Conventionally, in an exemplary computer system having a fault detecting function, a fault location algorithm is implemented in firmware. With such an algorithm, when a fault at one point spreads and causes error reports to be issued from a plurality of nodes to the firmware all at once, the fault that caused this situation is identified based on these plural error reports (refer to Japanese Laid-open Patent Publication No. 2001-166965).
In this system, an error at an output portion of one node and an error at an input portion of the input-destination node on the bus connected to that output portion may be detected simultaneously and reported individually to the firmware. In this case, a conceivable algorithm is one in which the firmware receiving the two error reports checks their contents and, when the two errors match each other, indicates only the output-side node as a suspicious component and ignores the input-side error as a spread error.
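The matching rule described above can be sketched as follows. This is only an illustrative sketch: the `ErrorReport` fields (`node`, `side`, `bus`, `kind`) and the function name `locate_suspects` are hypothetical names introduced here, and "match" is assumed to mean the same bus and the same error type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorReport:
    node: str  # reporting node, e.g. "A" (hypothetical field)
    side: str  # "output" or "input" portion of the node (hypothetical field)
    bus: str   # bus on which the error was observed (hypothetical field)
    kind: str  # error type, e.g. "parity" (hypothetical field)


def locate_suspects(reports):
    """Return the nodes to indicate as suspicious components.

    Assumption: an input-side error is treated as a spread error when an
    output-side error with the same bus and the same kind was also reported.
    """
    outputs = [r for r in reports if r.side == "output"]
    suspects = []
    for r in reports:
        is_spread = r.side == "input" and any(
            o.bus == r.bus and o.kind == r.kind for o in outputs
        )
        if is_spread:
            continue  # ignore the input-side report as a spread error
        if r.node not in suspects:
            suspects.append(r.node)
    return suspects


reports = [
    ErrorReport("A", "output", "bus0", "parity"),
    ErrorReport("B", "input", "bus0", "parity"),
]
print(locate_suspects(reports))  # ['A']
```

Under these assumptions, when only the input-side report is available the same function returns the input-side node, since there is no matching output-side report against which the error could be recognized as spread.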
However, the conventional technology has a problem in that a suspicious component cannot be correctly specified because of a time difference between the points at which the firmware clears the error information of the individual nodes.
Specifically, in the system explained above, the firmware clears error information node by node via a shared bus for system management. Therefore, a time difference between the clearing accesses necessarily occurs. For this reason, if a fault causes errors to occur successively at short intervals comparable to the access time, the firmware cannot specify only the output-side node as a suspicious component (error portion), and excessively specifies the input-side node as a suspicious component as well.
A case in which the firmware excessively specifies suspicious components is explained with reference to FIG. 9. As depicted in FIG. 9, when a first error occurs at the output-side node A and an invalid packet is transferred to the input-side node B, the nodes A and B each output an error interrupt to the firmware. The firmware temporarily masks subsequent error reports, first logs (records) and then clears the error information about the input-side node B, and then logs and clears the error information about the output-side node A. Here, to simplify firmware processing, the processing order of the nodes is fixed.
Then, it is assumed that a second error of the same type occurs between the clearing processes for the nodes B and A. At the output-side node A, when clearing the first error, the firmware inadvertently also clears the information about the second error, and therefore the second error information is not left in the log register. At the node B, however, the second error is detected after the clearing process has been performed, and therefore the second error information is logged and remains. As a result, after canceling the error interrupt mask to allow error interrupts to be accepted, the firmware receives an error interrupt only from the input-side node B. Thus, in error analysis, the firmware erroneously determines that the original error is the error at the input-side node B, and consequently specifies the input-side node as a suspicious component in excess.
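The sequence of FIG. 9 can be reproduced with a small simulation. This is a simplified sketch, not the actual firmware: each node's log register is assumed to be a single flag that a clear access resets regardless of how many errors set it, and the names `Node`, `log_and_clear`, and `run_scenario` are hypothetical.

```python
class Node:
    """A node with a one-bit error log register (simplifying assumption)."""

    def __init__(self, name):
        self.name = name
        self.error_logged = False

    def raise_error(self):
        # Latch an error; a second error on an already-set flag is not
        # distinguishable from the first in this simplified model.
        self.error_logged = True


def log_and_clear(node, firmware_log):
    """Firmware access: record the node if its flag is set, then clear it."""
    if node.error_logged:
        firmware_log.append(node.name)
    node.error_logged = False


def run_scenario():
    a, b = Node("A"), Node("B")  # A: output side, B: input side
    firmware_log = []

    # First error: A outputs an invalid packet, so both nodes latch an error.
    a.raise_error()
    b.raise_error()

    # Interrupts are masked; nodes are processed in the fixed order B, then A.
    log_and_clear(b, firmware_log)

    # Second error of the same type occurs between the two clear accesses.
    a.raise_error()
    b.raise_error()

    # Clearing A for the first error also wipes the second error at A.
    log_and_clear(a, firmware_log)

    # Mask is canceled: only nodes whose flag is still set raise an interrupt.
    pending = [n.name for n in (a, b) if n.error_logged]
    return firmware_log, pending


firmware_log, pending = run_scenario()
print(firmware_log)  # ['B', 'A']
print(pending)       # ['B']
```

After the mask is canceled, only node B still holds error information, so an analysis that sees this report alone has no output-side error to match it against and attributes the fault to the input-side node.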