Heretofore, in a computer system whose operations are managed based on messages, the system outputs a message related to an abnormality when a device, an application, or the like in the system operates abnormally. A management apparatus that manages the operations monitors the messages output during abnormal operation and thereby detects an abnormality in the system.
Moreover, the management apparatus also monitors messages output by the device, the application, or the like during normal operation, separately from the messages output during abnormal operation, and detects an abnormality in the system from them. For this monitoring and detection, an administrator who manages the operations manually defines the messages to be monitored as operational rules, according to the device configuration of the system, the way the system is operated, and the like. The management apparatus can thus monitor the messages defined in the rules and detect an abnormality in the system based on those operational rules.
One known technique monitors messages to detect an abnormality. In this technique, a management apparatus stores normal patterns. Each normal pattern is a combination of one or more consecutive messages generated when a distributed system operates normally, and its elements are identifiers that uniquely identify the messages, together with the number of occurrences of the message indicated by each identifier. The management apparatus refers to the normal patterns and searches for an identifier that matches the identifier of a collected message; if a matching identifier exists, it counts the occurrences of the message indicated by that identifier. The management apparatus then determines that an abnormality has occurred when the counted number of occurrences is equal to or less than a predefined value.
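The normal-pattern check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the cited publication: the normal pattern is modeled as a mapping from message identifier to its occurrence number in normal operation, the "predefined value" is taken as a single `threshold` parameter (how it relates to the stored occurrence numbers is not specified), and an identifier from the pattern that never appears among the collected messages is counted as zero occurrences. The function name `find_abnormal_messages` is hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Mapping


def find_abnormal_messages(
    collected_ids: Iterable[str],
    normal_pattern: Mapping[str, int],
    threshold: int,
) -> dict[str, int]:
    """Return pattern identifiers whose observed count is at or below threshold.

    normal_pattern maps a message identifier to its occurrence number in
    normal operation (assumed representation; the values are kept for
    reference and are not used in this simplified check).
    """
    # Count only collected messages whose identifier matches the pattern.
    counts = Counter(mid for mid in collected_ids if mid in normal_pattern)
    # Counter returns 0 for identifiers that were never observed.
    return {mid: counts[mid] for mid in normal_pattern if counts[mid] <= threshold}


if __name__ == "__main__":
    pattern = {"MSG001": 3, "MSG002": 1}  # hypothetical identifiers
    collected = ["MSG001", "MSG001", "MSG003", "MSG001"]
    print(find_abnormal_messages(collected, pattern, threshold=0))
```

With a threshold of 0, only `MSG002`, which never arrived, is reported; the unrelated `MSG003` is ignored because it is not part of the normal pattern.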
Another known technique monitors events to detect the throughput of a CPU (Central Processing Unit). In this technique, when the destination of a packet is a different device, a device equipped with a relay function monitors the interval between occurrences of events that are generated on a regular basis, and judges the throughput of the CPU according to whether that interval exceeds a predetermined interval. When the interval between events exceeds the predetermined interval, the device detects that the CPU lacks sufficient spare processing capacity.
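The interval check in the technique above reduces to comparing gaps between consecutive periodic events with a limit. The sketch below assumes event timestamps in seconds, already sorted in occurrence order, and omits the condition that monitoring applies only while packets are being relayed to another device; the function name `cpu_lacks_headroom` is hypothetical.

```python
from typing import Sequence


def cpu_lacks_headroom(event_times: Sequence[float], max_interval: float) -> bool:
    """Return True if any gap between consecutive periodic events exceeds max_interval.

    event_times: occurrence times of the regularly generated events, in order
    (assumed to be seconds). A gap longer than max_interval is taken to mean
    the CPU has too little spare capacity to service the events on time.
    """
    return any(
        later - earlier > max_interval
        for earlier, later in zip(event_times, event_times[1:])
    )


if __name__ == "__main__":
    print(cpu_lacks_headroom([0.0, 1.0, 2.0, 3.5], max_interval=1.2))
```

Here the last gap is 1.5 s, which exceeds the 1.2 s limit, so the function reports insufficient headroom; fewer than two events yield no gaps and therefore `False`.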
A further known technique detects FCS (Frame Check Sequence) error frames (hereinafter, error frames) to find a fault in a network system. In this technique, a fault prediction device calculates the number of bits between error frames, that is, the total number of bits of the frames transmitted on a transmission line between one error frame and the next. The fault prediction device compares this number with a threshold calculated from the number of bits transmitted that corresponds to the statistical rate at which bit errors occur spontaneously on the transmission line, and determines that a fault has occurred in the network system when the calculated number of bits is smaller than the threshold.

Patent Literature 1: Japanese Laid-open Patent Publication No. 2007-96835
Patent Literature 2: Japanese Laid-open Patent Publication No. 11-224214
Patent Literature 3: Japanese Laid-open Patent Publication No. 08-139722
Patent Literature 4: Japanese Laid-open Patent Publication No. 2006-318071
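The FCS error-frame technique described above can be illustrated with the following sketch. Two points are assumptions not fixed by the text: the bit count for each pair of error frames includes every frame after the first error frame up to and including the next one, and the threshold is taken as `1 / bit_error_rate`, the expected number of bits per spontaneously occurring bit error. The names `bits_between_error_frames` and `fault_detected` are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# Each frame is (length_in_bits, has_fcs_error), in transmission order.
Frame = Tuple[int, bool]


def bits_between_error_frames(frames: Iterable[Frame]) -> List[int]:
    """Return one bit count per pair of consecutive error frames.

    Assumed boundary convention: count the frames after one error frame up
    to and including the next error frame.
    """
    gaps: List[int] = []
    running: Optional[int] = None  # None until the first error frame is seen
    for length, is_error in frames:
        if running is not None:
            running += length
        if is_error:
            if running is not None:
                gaps.append(running)
            running = 0  # start counting again after this error frame
    return gaps


def fault_detected(frames: Iterable[Frame], bit_error_rate: float) -> bool:
    """Report a fault if error frames arrive closer together than expected.

    Hypothetical threshold: bits expected per naturally occurring bit error.
    """
    threshold = 1.0 / bit_error_rate
    return any(gap < threshold for gap in bits_between_error_frames(frames))


if __name__ == "__main__":
    frames = [(1000, True), (1000, False), (1000, True)]
    print(bits_between_error_frames(frames))
    print(fault_detected(frames, bit_error_rate=1e-4))
```

With a bit error rate of 1e-4 the threshold is about 10,000 bits, so two error frames only 2,000 bits apart are reported as a fault; at 1e-3 the threshold drops to 1,000 bits and no fault is reported.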