1. Field of the Invention
The present invention relates to a processing technology for identifying a place suspected of being the cause of an anomaly (hereinafter referred to as a “suspected place”) when the anomaly has occurred in a computer system or the like. More particularly, the present invention relates to a processing technology for identifying the suspected place, or the place next estimated to be the suspected place (hereinafter referred to as a “second suspected place”), when a part at a suspected place that was identified statistically from certain error information has been replaced and the same error information is then notified again. The present invention is practiced, for example, as fault management in the RAS (Reliability, Availability and Serviceability) control of a computer system.
2. Description of the Related Art
In a computer system, if an anomaly occurs in bus communication, for example, it may not be possible to determine with certainty which of the units connected to the bus contains the part that caused it. A process of identifying the suspected place statistically from error information is therefore performed. In this process, a weighting is set for each type of anomaly content or for each part. Each time error information is notified, the predetermined weighting is added to every place related to the anomaly information contained in it, and a place whose accumulated weighting has exceeded a predetermined threshold is identified as the suspected place. The part at the suspected place is then isolated.
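The accumulate-and-threshold procedure described above can be sketched as follows. This is a minimal illustration only, assuming that weightings are keyed per place (the text also allows keying them per anomaly content) and that one notification lists the places related to the anomaly. The class name `SuspectedPlaceDeterminer`, the place labels, and the weight and threshold values are all hypothetical.

```python
from collections import defaultdict


class SuspectedPlaceDeterminer:
    """Hypothetical sketch: accumulate weighted points per place and flag
    any place whose total has exceeded the threshold."""

    def __init__(self, weights: dict[str, int], threshold: int) -> None:
        self.weights = weights          # hypothetical per-place weighting table
        self.threshold = threshold      # hypothetical threshold
        self.scores: dict[str, int] = defaultdict(int)

    def notify(self, related_places: list[str]) -> list[str]:
        """Process one error notification and return the places now suspected."""
        for place in related_places:
            # Add the predetermined weighting to each place related to the anomaly.
            self.scores[place] += self.weights.get(place, 0)
        return [p for p in related_places if self.scores[p] > self.threshold]


# Usage: a bus error naming a part module (PM), a control module (CM) and the bus.
det = SuspectedPlaceDeterminer({"PM": 3, "CM": 2, "BUS": 1}, threshold=5)
for _ in range(2):
    suspects = det.notify(["PM", "CM", "BUS"])
print(suspects)  # → ['PM']
```

After two identical notifications the part module has 6 points, the control module 4 and the bus 2, so only the part module has exceeded the threshold of 5.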
In addition, Patent Document 1 discloses a fault monitoring/notifying method for network fault management in which a predetermined threshold is provided for each fault content in the alarm information to be reported. A fault content that has occurred at least the number of times specified by its threshold is reported to an administrator, and a determination is made as to whether preventive maintenance should be implemented (Patent Document 1: Japanese Patent Laid-Open No. 6-175887).
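The per-content threshold check attributed to Patent Document 1 can be sketched as a simple occurrence count. This is a hedged illustration based only on the summary above, not the method as actually disclosed; the function name `faults_to_report` and the fault-content labels are hypothetical.

```python
from collections import Counter


def faults_to_report(alarms: list[str], thresholds: dict[str, int]) -> list[str]:
    """Hypothetical sketch: return the fault contents whose occurrence count
    is at least the threshold specified for that content."""
    counts = Counter(alarms)
    return sorted(
        fault for fault, n in counts.items()
        if fault in thresholds and n >= thresholds[fault]
    )


alarms = ["link_down", "crc_error", "crc_error", "link_down", "crc_error"]
print(faults_to_report(alarms, {"link_down": 3, "crc_error": 3}))  # → ['crc_error']
```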
However, the statistical identification process may identify a part other than the one in which the anomaly actually occurred. This problem is described below with reference to FIGS. 8A, 8B, and 8C.
As shown in FIG. 8A, in a computer system that includes a part module (part M) 901 and a part module (part M) 903, control modules (CM) 907a and 907b are provided on the two buses between the part modules 901 and 903, respectively, and RAS control and the like are performed. Assume that the control module 907a has detected an anomaly in bus communication between the part module 903 and the control module 907a, and that the cause of the anomaly actually lies on the control module 907a side.
Based on the error information notified by a communication driver, a suspected place determination function adds the predetermined weighting to each of the part module 903, the control module 907a, and the bus 905a, and identifies as the suspected place any part whose accumulated weighting has reached the predetermined threshold. For example, if the weighting of the part module 903 reaches the threshold, the part module 903 is identified as the suspected place.
Then, as shown in FIG. 8B, the part module 903 is isolated, and a part module 910, which is a new maintenance part, is incorporated in its place. However, because the part that actually caused the anomaly has not been removed, the same error information is notified again. The same statistical point-addition process is then performed, and points are added in the same way to the weightings of the control module 907a and the part module 910, which are related to the anomaly.
As a result, as shown in FIG. 8C, the same place is again identified as the suspected place, and the newly incorporated part module 910 becomes the target of the isolation process. Alternatively, the anomaly may be detected in an access to the control module 907a while the part module 910 is being incorporated, so that the incorporation process fails.
In this way, when the suspected place is identified statistically, the same predetermined weighting is added to the same place each time the same anomaly is detected, so the same place is identified as the suspected place again. The newly incorporated part module therefore becomes the target of isolation every time, and parts are replaced repeatedly at the same place while the actual cause remains.
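The repeated-replacement problem of FIGS. 8A to 8C can be reproduced with a small simulation. This is a hedged sketch under several assumptions that the text does not specify: the place label `PM_slot` stands for the position occupied first by the part module 903 and then by the part module 910; the weights are chosen so that the part module accumulates points fastest, matching the example in which the part module 903 reaches the threshold; and the accumulated scores are assumed to be cleared after each isolation. All names and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical weightings and threshold; PM_slot is the part module position.
WEIGHTS = {"PM_slot": 3, "CM_907a": 2, "BUS_905a": 1}
THRESHOLD = 5  # a place is suspected once its total exceeds this value
RELATED = ["PM_slot", "CM_907a", "BUS_905a"]  # places named in the recurring error


def simulate(rounds: int) -> list[str]:
    """Return the place identified as suspected in each replacement round."""
    identified = []
    for _ in range(rounds):
        scores = defaultdict(int)  # assumption: statistics cleared after each isolation
        over: list[str] = []
        while not over:
            # The real cause (CM 907a side) is never removed, so the same error recurs.
            for place in RELATED:
                scores[place] += WEIGHTS[place]
            over = [p for p in RELATED if scores[p] > THRESHOLD]
        identified.append(max(over, key=scores.__getitem__))
        # The part at the identified place is replaced; the faulty CM 907a stays in service.
    return identified


print(simulate(3))  # → ['PM_slot', 'PM_slot', 'PM_slot']
```

Because every recurrence of the error adds the same weightings, each round ends the same way: the part module position is identified and its newly incorporated part is isolated, while the control module 907a is never identified.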