1) Field of the Invention
The present invention relates to an information processing apparatus having a plurality of computing units and an error detection method for the information processing apparatus, and more particularly, to an error detection method in a large-scale information processing apparatus.
2) Description of the Related Art
To meet the demand for higher performance in information processing apparatuses (such as computer systems), apparatuses in which a plurality of computing units cooperate with one another to carry out processing are currently in use. Examples include a computer system in which a plurality of function boards, each realizing a predetermined function, are connected to one another, and a computer system provided with a plurality of processors.
In such an information processing apparatus having a plurality of computing units, it is necessary that the other computing units be notified of an error when the error occurs, and that control be shifted to error analysis processing at an early stage.
For example, Japanese Patent Application Laid-Open Publication No. 1995-219812 discloses a failure monitoring system that gives notice of an error, without the use of an interrupt function, in a multiprocessor system in which a plurality of function boards are connected to one another by a system bus. Japanese Patent Application Laid-Open Publication No. 1993-224964 discloses a bus failure notifying system that gives notice of a bus abnormality occurring on a common bus.
Furthermore, Japanese Patent Application Laid-Open Publication No. 2002-91799 discloses a condition monitoring system provided with a board exclusive to monitoring of error in an information processing apparatus having a plurality of function boards, and Japanese Patent Application Laid-Open Publication No. 1985-63641 discloses an error processing circuit of a computer system.
Moreover, Japanese Patent Application Laid-Open Publication No. 1982-101954 discloses an error notifying system of a logical unit; Japanese Patent Application Laid-Open Publication No. 1995-200460 discloses a notifying system of interrupt at the time of error occurrence; and Japanese Patent Application Laid-Open Publication No. 1993-265812 discloses an information processing apparatus provided with a micro diagnostic device.
Japanese Patent Application Laid-Open Publication No. 2003-114811 discloses an automatic failure-recovery method and system, and an automatic failure-recovery apparatus and program that are provided with a board exclusive to monitoring of errors, and Japanese Patent Application Laid-Open Publication No. 1993-282167 discloses a processing method for failure occurring in an information processing apparatus.
Still further, Japanese Patent Application Laid-Open Publication No. 1989-295344 discloses a data collection method for failure occurring in an information processing apparatus, and Japanese Patent Application Laid-Open Publication No. 1987-1040 discloses failure analysis on a computer provided with a board exclusive to monitoring of errors.
Japanese Patent Application Laid-Open Publication No. 1998-91543 discloses a recording method for failure information in an information processing apparatus, and Japanese Patent Application Laid-Open Publication No. 1998-133963 discloses a failure detection method and a recovery method for an information processing apparatus. Still further, a failure recovery method for an information processing apparatus is mentioned in Japanese Patent Application Laid-Open Publication No. 1995-175765.
As shown in the above documents, when an error occurs in a conventional information processing apparatus (a computer system), the circuit that detects the error notifies all computing units (function boards and processors) in the system, and processing in the system pauses. Then, a representative computing unit (for example, a main board or a board exclusive to error analysis) among the computing units that have received the notice reads all error display registers in the system and carries out an error analysis.
However, when all the error display registers are read and the errors are analyzed in this manner, the number of registers to be read grows as the scale of the system grows, and as a result, the processing of programs slows down.
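The conventional flow described above, and the way its cost grows with system scale, can be illustrated with a minimal software sketch. This is only a model under stated assumptions, not an implementation from any of the cited publications: the names ComputingUnit, broadcast_error_notice, analyze_all_registers and simulate are hypothetical, each register is modeled as an integer in which 0 means no error is recorded, and every computing unit is assumed to hold the same number of error display registers.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ComputingUnit:
    """Hypothetical model of a function board or processor."""
    name: str
    # Error display registers; 0 is assumed to mean "no error recorded".
    error_registers: List[int] = field(default_factory=list)
    paused: bool = False


def broadcast_error_notice(units: List[ComputingUnit]) -> None:
    """Notify every unit of the error; each unit pauses its processing."""
    for unit in units:
        unit.paused = True


def analyze_all_registers(
    units: List[ComputingUnit],
) -> Tuple[List[Tuple[str, int, int]], int]:
    """Representative unit reads every register of every unit.

    Returns the non-zero registers found as (unit name, register index,
    value) and the total number of register reads performed.
    """
    reads = 0
    findings = []
    for unit in units:
        for index, value in enumerate(unit.error_registers):
            reads += 1  # one read per register, whether or not it holds an error
            if value != 0:
                findings.append((unit.name, index, value))
    return findings, reads


def simulate(num_units: int, registers_per_unit: int,
             faulty_unit: int, faulty_register: int):
    """Inject a single error, broadcast the notice, then analyze."""
    units = [
        ComputingUnit(f"unit{i}", [0] * registers_per_unit)
        for i in range(num_units)
    ]
    units[faulty_unit].error_registers[faulty_register] = 0x1
    broadcast_error_notice(units)
    return analyze_all_registers(units)
```

Under these assumptions, simulate(4, 8, 2, 3) performs 32 register reads to locate a single error, and simulate(16, 8, 2, 3) performs 128. The number of reads is the product of the number of units and the registers per unit, regardless of how many registers actually hold an error.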