1. Field of the Invention
The present invention relates to a message abnormality automatic detection device, method and program for detecting abnormalities in messages generated in a distributed system composed of a plurality of information processing devices and the like.
2. Description of the Related Art
Recent information processing systems are often implemented as distributed systems in which a plurality of information processing devices, software, and the like operate in concert with each other to provide predetermined functions.
In a distributed system, large numbers of various messages are output from the information processing devices constituting the distributed system and from the respective hardware and software constituting those devices. A feature is therefore provided for collecting these messages and displaying them on a single console.
However, even if the messages output from the distributed system are displayed on a single console, their sheer volume makes it difficult to know which of them are truly important.
Japanese Patent Laid-open Publication 2001-292143 discloses a failure detection system in which a monitoring unit has a pattern file in which the characteristics of failure messages are entered beforehand, and whether or not a message is a failure message is determined by comparing an operation state message with the individual patterns within the pattern file.
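By way of illustration only, comparison against a pattern file of this general kind might be sketched as follows. The patterns, the use of regular expressions, and the name `is_failure_message` are assumptions made for this sketch and are not taken from the cited publication.

```python
import re

# Hypothetical pattern file contents: one regular expression per
# failure-message characteristic, entered beforehand by an operator.
FAILURE_PATTERNS = [
    re.compile(r"abnormal return"),
    re.compile(r"communication delay"),
]


def is_failure_message(message: str) -> bool:
    """Return True if the message matches any pre-registered failure pattern."""
    return any(pattern.search(message) for pattern in FAILURE_PATTERNS)
```

In such an arrangement, every pattern must be prepared in advance, so a message whose characteristics were never registered is not detected.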
However, although current operation management tools have a feature for filtering out non-critical messages, the definition of which messages to output and which messages to suppress (the filtering definition) must be made manually for each message. When there are large numbers of message types, determining the importance of each message is difficult, and actually creating the definition is difficult as well.
In addition, although an importance level code indicating the level of importance (for example, “information level”, “warning level”, and “critical level” if there are three levels) is ordinarily attached to each message, there are cases in which the actual degree of importance differs with the system environment (system topology, operating conditions, etc.) even if the message is the same.
For example, an “information level” message stating “HTTP services have been terminated” is not a problem when it results from an intentional termination after business has closed for the night. However, if the same message is output during normal operations, it indicates a failure of some sort, such as an operation error, and is a critical message that requires an urgent response.
Furthermore, there are instances in which the true degree of importance cannot be known from a single message and must be determined from the pattern of plural messages.
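The dependence of importance on operating conditions in the example above can be sketched as follows. This is a minimal illustration, assuming hypothetical business hours of 09:00 to 18:00 and a hypothetical function `effective_level`; neither is specified in this description.

```python
from datetime import time

# Assumed business hours, chosen purely for illustration.
BUSINESS_OPEN = time(9, 0)
BUSINESS_CLOSE = time(18, 0)


def effective_level(text: str, attached_level: str, output_time: time) -> str:
    """Return the importance of a message, taking its output time into account."""
    during_business = BUSINESS_OPEN <= output_time < BUSINESS_CLOSE
    if text == "HTTP services have been terminated" and during_business:
        # The same message is critical when output during normal operations.
        return "critical"
    # Otherwise the attached importance level code is used unchanged.
    return attached_level
```

For instance, the message output at 23:00 keeps its attached “information level”, whereas the identical message output at 14:00 is treated as “critical level”.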
For example, with regard to the following three messages,
(A) “abnormal return of request to send”
(B) “successfully retransmitted”
(C) “network communication delay”,
if messages (A) and (B) are output in that order, there is no particular need for a response. However, if message (A) is followed by message (C) without message (B) being output, it is assumed that some sort of abnormality exists, and examination is needed. In addition, even with the (A)-(B) pattern, if the messages are output in large numbers over a short period of time, it is assumed that some sort of abnormality exists, and examination is needed.
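The three cases above can be illustrated with the following sketch, which scans a time-ordered list of messages. The function name `find_abnormalities`, the message codes "A", "B", and "C", and the thresholds `burst_limit` and `burst_window` are hypothetical and are introduced only to make the example concrete.

```python
def find_abnormalities(events, burst_limit=3, burst_window=60.0):
    """Scan (timestamp_seconds, code) pairs sorted by time; code is "A", "B" or "C".

    Returns a list of (timestamp, reason) tuples for patterns needing examination.
    """
    findings = []
    pending_a = False      # True while an (A) has not yet been followed by (B) or (C)
    recovered_times = []   # times at which an (A)-(B) pair completed
    for ts, code in events:
        if code == "A":
            pending_a = True
        elif code == "B" and pending_a:
            # (A)-(B): normal by itself, but counted for burst detection.
            pending_a = False
            recovered_times.append(ts)
            recovered_times = [t for t in recovered_times if ts - t <= burst_window]
            if len(recovered_times) > burst_limit:
                findings.append((ts, "burst of (A)-(B)"))
        elif code == "C" and pending_a:
            # (A) followed by (C) with no (B) in between.
            pending_a = False
            findings.append((ts, "(A)-(C) without (B)"))
    return findings
```

With these assumed thresholds, a single (A)-(B) pair yields no finding, an (A) followed directly by (C) yields one finding, and a fourth (A)-(B) pair within 60 seconds yields a burst finding.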