With the recent rapid growth of Internet businesses, the loss of corporate credibility and of business opportunities caused by service stoppages due to system failures has become a major problem. For this reason, rapid recovery from a failure is desirable.
One system that supports the identification of a recovery method is, for example, the failure record database system disclosed in Patent Literature 1. The administrator of a monitoring target node (called the “system administrator” below) registers, as a failure record in this database system, a failure that has occurred in the monitoring target node together with the method actually used to recover from this failure. The database system maintains a plurality of failure records. In a case where a new failure occurs, the system administrator inputs a desired keyword. The database system retrieves, from the plurality of failure records, the failure record that conforms to the inputted keyword.
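The keyword retrieval described above can be pictured with a minimal sketch. It is not the implementation of Patent Literature 1: the `FailureRecord` type, the `search_records` function, the sample records, and the case-insensitive substring match are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureRecord:
    """Hypothetical failure record: a failure and the recovery actually used."""
    failure: str
    recovery_method: str


def search_records(records, keyword):
    """Return the records whose text contains the keyword (assumed matching rule)."""
    needle = keyword.casefold()
    return [
        record
        for record in records
        if needle in record.failure.casefold()
        or needle in record.recovery_method.casefold()
    ]


# Illustrative data only; these records are not taken from any real system.
records = [
    FailureRecord("I/O error on disk device",
                  "Replace the failed disk and rebuild the volume"),
    FailureRecord("Drop in processor throughput",
                  "Restart the runaway process"),
]

matches = search_records(records, "disk")
for record in matches:
    print(record.failure, "->", record.recovery_method)
```

A sketch of this kind also makes the limitation visible: the quality of the result depends entirely on the keyword the system administrator chooses.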
Meanwhile, there is a monitoring system for monitoring the operational status of the monitoring target node. The monitoring system receives a change in the operational status of the monitoring target node (for example, an input/output (I/O) error with respect to a disk device or a drop in processor throughput) as an event from this monitoring target node. The system administrator becomes aware of the nature of this event by being notified of it via a message or warning lamp. From the nature of this event, the system administrator learns about the failure (for example, a service stoppage or drop in performance) in this monitoring target node, and predicts the root cause of this failure.
Further, Root Cause Analysis (called RCA below) is a technique for predicting the root cause of a failure. In RCA, the monitoring system maintains combinations of event groups and root causes as rules and, when an event has been received, infers the root cause of this event from the rule that includes this event.
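The rule-based inference described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive RCA engine: the `Rule` type, the `infer_root_causes` function, and the event and root-cause names are hypothetical, and a rule is treated as applicable as soon as it contains the received event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a group of events paired with one root cause."""
    events: frozenset
    root_cause: str


def infer_root_causes(rules, received_event):
    """Return the root cause of every rule whose event group includes the event."""
    return [rule.root_cause for rule in rules if received_event in rule.events]


# Illustrative rules only; the names are assumptions, not part of the source.
rules = [
    Rule(frozenset({"disk_io_error", "volume_degraded"}), "disk device failure"),
    Rule(frozenset({"cpu_throughput_drop"}), "processor overload"),
]

candidates = infer_root_causes(rules, "disk_io_error")
print(candidates)
```

An event that appears in no rule yields an empty candidate list in this sketch, which leaves the root cause undetermined.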
According to Patent Literature 2, inconsistency is calculated for cases in which the event that occurred is a known event and cases in which it is an unknown event, and the calculated inconsistency is taken into account in inferring the root cause of the failure.
According to Patent Literature 3, information denoting an environmental relationship between monitoring target nodes is constructed. When inferring the root cause of a failure, the monitoring target node that will be affected by a failure that has occurred in a certain monitoring target node is identified on the basis of this information.
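One way to picture the identification of affected nodes is a traversal over relationship information. The sketch below is an assumption made for illustration and does not reproduce the method of Patent Literature 3: the relationship information is modeled as a mapping from each node to the nodes that depend on it, and the node names and the `affected_nodes` function are hypothetical.

```python
from collections import deque


def affected_nodes(dependents, failed_node):
    """Collect every node reachable from the failed node via breadth-first search.

    `dependents` maps a node to the nodes that depend on it (assumed model).
    """
    seen = set()
    queue = deque([failed_node])
    while queue:
        node = queue.popleft()
        for dependent in dependents.get(node, ()):
            if dependent != failed_node and dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Illustrative topology only: two servers use one storage device,
# and one application runs on the first server.
relationships = {
    "storage1": ["server1", "server2"],
    "server1": ["app1"],
}

impacted = affected_nodes(relationships, "storage1")
print(sorted(impacted))
```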