1. Field of the Invention
This invention relates to monitoring of computer systems and more particularly to rebuilding the state of a computer system based on diagnostic data from the computer system.
2. Description of the Related Art
Computer systems, such as mainframes, minicomputers, workstations and personal computers, experience hardware and software failures that degrade system performance or render the system inoperative. To diagnose such failures, computer systems include diagnostic capabilities that provide various types of system diagnostic information.
Computer systems are typically serviced when a failure is noticed, either by system diagnostics or by users of the system when the system becomes partially or completely inoperative. Because computer systems are frequently located at some distance from the support engineers, a support engineer may, when problems occur, access the computer system remotely and interactively through a modem to evaluate the state of the computer system. This remote dial-in approach allows the support engineer to assist a remote customer without the delay of traveling to the computer system site. Once connected to the remote computer system, the support engineer can perform such tasks as analyzing hardware and software faults by checking patch status, analyzing message files, checking configurations of add-on hardware, unbundled software, and networking products, uploading patches to the customer system in emergency situations, helping with problematic installations of additional software, running on-line diagnostics to help analyze hardware failures, and copying files to or from the customer system as needed.
However, such support has limitations. For instance, the amount of data that can be transferred at the time of failure may be limited by factors such as modem speed, so a complete picture of the system may be unavailable. Running diagnostic software during the remote session, if necessary, may adversely impact system performance. Where a system is part of a network, which is commonplace today, running diagnostic tests may also impact network performance. Where computer systems are used in a production or other real-time environment, such degradation of system performance is obviously undesirable.
Further, historical data on system performance may not be available in such scenarios. It is therefore impossible to analyze trends or to compare system performance, e.g., before and after a hardware or software change is made to the system. The support engineer is limited to a snapshot of the system based on the diagnostic information available at the time the support engineer dials in to the system.
It would be advantageous if a support engineer had complete diagnostic information available rather than just a snapshot. However, system diagnostic tests typically generate a significant amount of data, and it can be difficult for a support engineer to analyze such data in raw form. Additionally, service centers typically support a number of different computer systems. Each computer system has its own hardware and software components and thus may have unique problems. For example, it is not uncommon for failures to be caused by incorrect or incompatible configuration of the various hardware and/or software components of a particular system. It would be advantageous to provide a remote monitoring diagnostic system that could process, present and manipulate diagnostic data in a structured and organized form, and that could also monitor a number of different computer systems without prior knowledge of the particular hardware or software configuration of each system being monitored. To provide better diagnostic support to computer systems, it would also be advantageous to provide the ability to detect problems in the diagnostic data and to monitor the diagnostic data proactively in order to better detect and/or predict system problems.