A system such as a data center includes a large number of computers and performs data communication in response to external access. The configuration of such a system includes a plurality of identical or similar partial configurations and is changed frequently, for example, when configuration equipment is replaced or an application program is revised. In a cloud environment, servers are also added and deleted.
There is a method for detecting a failure that occurs in configuration equipment by using a failure symptom pattern that indicates a symptom of the failure. On the basis of a log history and information on failures that have occurred, a combination of messages that has a high probability of co-occurring with a failure is extracted as a failure symptom pattern. Then, when the configuration equipment outputs the same combination of messages as the failure symptom pattern, it is determined that a failure is likely to occur.
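The extraction and matching steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method of the cited publications: the log history is assumed to be already split into windows, each being a set of message IDs plus a flag saying whether a failure followed; "co-occurrence probability" is taken as the fraction of windows containing a combination that were followed by a failure; and the names `extract_symptom_patterns`, `failure_likely`, `min_prob`, `min_count`, and `max_size` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def extract_symptom_patterns(windows, min_prob=0.8, min_count=2, max_size=2):
    """Return message combinations that co-occur with failures often enough.

    windows: list of (set_of_message_ids, failure_followed) pairs (assumed format).
    """
    total = Counter()         # how many windows contain each combination
    with_failure = Counter()  # how many of those windows were followed by a failure
    for messages, failed in windows:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(messages), size):
                total[combo] += 1
                if failed:
                    with_failure[combo] += 1
    # Keep combinations seen often enough whose failure co-occurrence ratio is high.
    return {
        combo
        for combo, n in total.items()
        if n >= min_count and with_failure[combo] / n >= min_prob
    }


def failure_likely(messages, patterns):
    """Flag a failure when the output messages contain any extracted pattern."""
    observed = set(messages)
    return any(set(pattern) <= observed for pattern in patterns)


windows = [
    ({"A", "B"}, True),
    ({"A", "B", "C"}, True),
    ({"A", "C"}, False),
    ({"C"}, False),
]
patterns = extract_symptom_patterns(windows)
print(sorted(patterns))                            # [('A', 'B'), ('B',)]
print(failure_likely(["A", "B", "D"], patterns))   # True
```

Because the patterns depend only on counts in past logs, nothing in this sketch records how a later configuration change affects them, which is the weakness discussed below.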
There is also a method for evaluating a failure that occurs in configuration equipment by generating, on the basis of analysis rules and information on the devices to be managed, a model consisting of pairs, each of which associates a combination of failure events with a candidate cause of the failure. When a combination of failure events is received, the corresponding model entry is obtained on the basis of that combination.
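As a rough sketch of how such a model might be built and queried, the code below assumes that an analysis rule is written generically for a device type (a set of event types plus a cause) and that the model is produced by instantiating each rule for every managed device of that type. The `Rule` class, the device-type mapping, and the functions `build_model` and `evaluate` are hypothetical illustrations, not the implementation of the cited publications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    device_type: str   # device type the rule applies to (assumed rule format)
    events: frozenset  # event types that together indicate the cause
    cause: str         # generic cause description


def build_model(rules, devices):
    """Expand generic rules over managed devices into
    (event combination -> cause candidates) pairs.

    devices: dict mapping device name to device type (assumed format).
    """
    model = {}
    for rule in rules:
        for name, dev_type in devices.items():
            if dev_type != rule.device_type:
                continue
            combo = frozenset((name, event) for event in rule.events)
            model.setdefault(combo, []).append(f"{name}: {rule.cause}")
    return model


def evaluate(model, received_events):
    """Look up the cause candidates for a received combination of events."""
    return model.get(frozenset(received_events), [])


rules = [
    Rule("server", frozenset({"disk_error", "io_timeout"}), "disk failure"),
    Rule("switch", frozenset({"link_down"}), "port failure"),
]
devices = {"srv1": "server", "srv2": "server", "sw1": "switch"}
model = build_model(rules, devices)
print(evaluate(model, [("srv2", "io_timeout"), ("srv2", "disk_error")]))
# ['srv2: disk failure']
```

Keying the model on a `frozenset` makes the lookup independent of the order in which the events arrive.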
Related techniques are disclosed in, for example, Japanese Laid-open Patent Publication No. 2006-146668, and Japanese Laid-open Patent Publication No. 2011-76293.
However, because the above-described technology generates the failure symptom pattern on the basis of the co-occurrence probability, the degree to which a change in the configuration equipment influences the pattern is not clear, and it is difficult to determine whether applying the failure symptom pattern is appropriate. For example, when the failure symptom pattern is relearned each time the configuration equipment changes, the learning period becomes short, which reduces the reliability of the generated failure symptom pattern.
There is another method for detecting a failure that occurs in configuration equipment by using a plurality of detection engines that detect failure symptoms. In this method, a message output from the configuration equipment is input to each of the detection engines. The results output from the detection engines are input to a majority circuit, and the output of the majority circuit is taken as the final output.
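The majority circuit can be modeled in software as a simple vote over the engine outputs. In this sketch, each detection engine is assumed to be a callable that takes a message and returns a label such as `"failure"` or `"normal"`; the functions `majority_vote` and `detect` and the toy engines are hypothetical stand-ins for real detection engines.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the label output by the most engines.

    Counter.most_common orders equal counts by first occurrence,
    so ties go to the label that appeared first.
    """
    label, _ = Counter(outputs).most_common(1)[0]
    return label


def detect(message, engines):
    """Feed the same message to every engine and return the majority result."""
    outputs = [engine(message) for engine in engines]
    return majority_vote(outputs)


# Two hypothetical engines that recognize "ERR" messages, and three that do not.
engines = [
    lambda m: "failure" if "ERR" in m else "normal",
    lambda m: "failure" if "ERR" in m else "normal",
    lambda m: "normal",
    lambda m: "normal",
    lambda m: "normal",
]
print(detect("disk ERR 0x1f", engines))  # normal
```

In this example, the two engines that recognize the message report a failure, but they are outvoted three to two, so the final output is `"normal"`. The vote counts outputs only and takes no account of which engines are suited to the given input.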
In the above-described technology using the plurality of detection engines and the majority circuit, the result output by the largest number of detection engines is taken as the final output. Thus, neither the characteristics of each detection engine with respect to an input nor the conditions associated with that input are considered. For example, when a certain input is received by the plurality of detection engines, the correct output from a small number of detection engines that are good at analyzing that input may be outvoted by a larger number of detection engines that produce a wrong output, and the wrong output is then taken as the final output.
In addition, by the nature of failures, commercial configuration equipment is expected to rarely fail. Failure cases therefore tend to be few, and some failure symptoms are difficult to catch from the output of any individual detection engine.