In recent years, information and communication technology systems (referred to simply as “systems” below) have come to include a large number of servers, and many types of processes are executed while the servers cooperate with each other. When a failure occurs in a system, an operator isolates the cause and carries out a primary treatment (such as disconnection). For example, the operator first identifies the server in which the abnormality occurred. The operator then analyzes the operating conditions of the identified server and investigates the cause of the abnormality.
When processing is executed by a plurality of cooperating servers, the server in which the abnormality was detected and the server that caused the abnormality may be different. If the operator can judge that the server in which the abnormality was detected is not the cause, the operator investigates the servers related to that server in order to identify the cause of the abnormality. Because one server is related to a large number of servers in a large-scale system, the number of servers related to the server in which the abnormality was detected is huge, and investigating the cause of the abnormality takes much time.
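The growth in the number of servers to investigate can be illustrated with a minimal sketch. It assumes, purely for illustration, that the relevancy between servers is available as an adjacency map; the server names, the `relations` map, and the `related_servers` function are hypothetical and do not appear in the cited publications.

```python
from collections import deque


def related_servers(relations, start, max_hops):
    """Return {server: hop count} for servers within max_hops of start.

    relations is a hypothetical adjacency map: server -> related servers.
    The start server itself is excluded from the result.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        server = queue.popleft()
        # Do not expand beyond the requested number of hops.
        if distance[server] == max_hops:
            continue
        for neighbor in relations.get(server, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[server] + 1
                queue.append(neighbor)
    del distance[start]
    return distance


# Hypothetical relevancy data for a very small system.
relations = {
    "web01": ["app01", "app02"],
    "app01": ["db01", "cache01"],
    "app02": ["db01", "queue01"],
    "db01": ["storage01"],
}

# Servers the operator would have to examine when the abnormality
# is detected in web01 and relations up to two hops are followed.
candidates = related_servers(relations, "web01", max_hops=2)
```

Even in this small example, an abnormality detected in one server yields five candidate servers within two hops. In a large-scale system each server is related to far more servers, so the candidate set grows accordingly.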
Moreover, due to the spread of outsourcing of system operation management, the number of cases in which a system is operated without detailed knowledge of the behavior of its configuration items is increasing. As a result, the black-box effect has become more and more apparent as a harmful effect in operation management. That is, there is very little information on the processing executed by the apparatuses in the system, and a longer amount of time is required to isolate the cause when a failure occurs.
Accordingly, various techniques for speeding up failure countermeasures and suppressing the occurrence of failures have been considered. For example, a failure location estimation system has been considered in which the range of the cause of an abnormality in a network is narrowed down to assist problem investigation in the system. Moreover, an incident management system has been considered in which, when a plurality of failure messages are output due to one system failure, the related failure messages can be collected and handled together. Further, an information processing apparatus has been considered that is capable of improving the filtering accuracy of failure messages. A detection device has also been considered that detects information useful for suppressing the occurrence of failures.
Japanese Laid-Open Patent Publication No. 2011-138405, Japanese Laid-Open Patent Publication No. 2012-94049, Japanese Laid-Open Patent Publication No. 2014-106851, and Japanese Laid-Open Patent Publication No. 2014-199579 are known as examples of the prior art.