1. Field of the Invention
The present invention relates to a technique for collecting error information (dump information) when an error occurs in a computer system having a CPU in a computer and various process cards.
2. Description of the Related Art
In a computer system having a CPU in a computer (main board, etc.) and various process cards, the configuration of the device control functions, which are executed as firmware, programs, etc. by the CPU in the computer and by the process cards, may depend on the performance required of the system.
That is, a specific device control function is performed either by the CPU of the computer or by a process card, depending on the functions required of the system.
In this case, if an error (or an abnormality) occurs in any device control function, or if the device control function terminates abnormally, then the device control function itself, or a function that controls it, detects the error, and an error message is transmitted to a display device, etc. through a monitor routine (or an operating system). When an ordinary user receives the error message, he or she normally asks a system engineer or an operator to resolve the error condition.
Based on the error message received, the operator is normally able to identify the cause of the error by collecting the operating conditions of the device control function at the time the error occurred. In practice, the operator collects dump information, which is an execution image of the memory space of the device control function, by issuing a dump collection command to the monitor routine, and then analyzes its contents sequentially. When a dump collection command is issued, it is normally necessary to specify the target of the dump collection, which in the above described example is either the CPU in the computer or one of the process cards. Here, the computer means the main body of the computer apparatus in which the process cards are provided or inserted.
The error message received normally contains the type (name, etc.) of the device control function in which an error has occurred, the contents of the error, the address at which the error has occurred, etc. However, the message does not contain information as to whether the device control function in which the error has occurred is performed by the CPU in the computer or by any of the process cards.
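The content of such an error message can be pictured with the following minimal sketch. It is an illustration only: the class name, field names, and sample values are hypothetical, and the point is that no field identifies the unit on which the device control function was running.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ErrorMessage:
    """Hypothetical model of a conventional error message."""
    function_name: str  # type (name, etc.) of the failing device control function
    description: str    # contents of the error
    address: int        # address at which the error occurred
    # Deliberately absent: any field saying whether the function ran on
    # the CPU in the computer or on one of the process cards.


# Sample values are invented for illustration.
msg = ErrorMessage("disk_ctl", "abnormal termination", 0x00401A2C)
field_names = [f.name for f in fields(ErrorMessage)]
```

An operator who receives `msg` knows which function failed and where in its memory space, but must determine the dump target by other means.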
Therefore, the operator conventionally has to determine whether the device control function in which the error has occurred is performed by the CPU in the computer or by one of the process cards, relying only on the limited information provided by the error message, the descriptions in manuals, specifications, etc., and the operator's own experience. The operator then issues a dump collection command to the CPU or process card so determined, and has to extract the portion corresponding to the device control function from the collected dump information.
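The conventional manual procedure can be sketched roughly as follows. All names, tables, and memory images below are hypothetical stand-ins: the lookup table represents knowledge the operator assembles from manuals and experience, and `collect_dump` stands in for a dump collection command issued to the monitor routine.

```python
# Hypothetical knowledge from manuals: device control function -> executing unit.
MANUAL_TABLE = {"disk_ctl": "card1", "net_ctl": "cpu"}

# Hypothetical memory layout: unit -> function -> (offset, length) within its dump.
LAYOUT = {
    "card1": {"disk_ctl": (4, 3)},
    "cpu": {"net_ctl": (0, 2)},
}


def collect_dump(target: str) -> bytes:
    """Stand-in for a dump collection command; returns the target's whole image."""
    images = {"cpu": bytes(range(6)), "card1": bytes(range(10, 20))}
    return images[target]


def conventional_collect(function_name: str):
    # Step 1: the operator decides which unit performs the function.
    target = MANUAL_TABLE[function_name]
    # Step 2: the operator dumps the entire memory image of that unit.
    image = collect_dump(target)
    # Step 3: the operator extracts the portion belonging to the function.
    offset, length = LAYOUT[target][function_name]
    return target, image[offset:offset + length]


target, portion = conventional_collect("disk_ctl")
```

Steps 1 and 3 both depend on information that is not in the error message, which is why the procedure calls for an experienced operator.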
However, identifying where the device control function was being performed when the error occurred, and extracting the corresponding portion of the dump, both require experience, so these processes cannot be performed by every user. Therefore, the above described conventional technology has the problem that only a limited group of operators can analyze an error and restore the system from it.
For the following reasons, it is not practical to include in the transmitted error message information about where the device control function is performed.
First, assigning the error detection routine the additional function of transmitting information about where the corresponding device control function is performed would complicate the routine, thereby over-utilizing system resources and lowering system performance.
Second, even if an error message contained information about where the corresponding device control function is performed, one of a plurality of dump collection commands would still have to be selected depending on whether the target of dump collection is the CPU in the computer or one of the process cards. As a result, an experienced operator is still required to collect appropriate dump data. If an inexperienced operator collects the dump data, then the only way to prevent the collection of inappropriate or insufficient dump information is to collect all of the dump information.
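The second point can be illustrated with a small sketch, again under stated assumptions: the command names `dump-cpu` and `dump-card` and the list of targets are invented for illustration and do not correspond to any actual monitor routine.

```python
from typing import List, Optional, Tuple

# Hypothetical set of dump targets in the system.
ALL_TARGETS = ["cpu", "card1", "card2"]


def command_for(target: str) -> str:
    """Different (hypothetical) commands apply to the CPU and to process cards."""
    return "dump-cpu" if target == "cpu" else "dump-card"


def plan_dump(target: Optional[str]) -> List[Tuple[str, str]]:
    """Return the (command, target) pairs to issue.

    An operator who knows the target issues one command; one who does not
    must fall back to dumping every unit to avoid missing the needed data.
    """
    if target in ALL_TARGETS:
        return [(command_for(target), target)]
    return [(command_for(t), t) for t in ALL_TARGETS]


known = plan_dump("card2")
unknown = plan_dump(None)
```

Here `known` contains a single command, while `unknown` covers every unit in the system, which corresponds to collecting all of the dump information.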
If an ordinary user, etc. could collect, with a simple command, the dump information that correctly corresponds to the device control function in which an error has occurred, then the collected dump information could, for example, be transmitted or transferred to a support center, and an experienced operator would not necessarily have to be dispatched. However, for the reasons described above, it is conventionally difficult to correctly specify the process card or the CPU in a computer by which a device control function is performed, and to collect only the dump information that corresponds to that device control function.