1. Field of the Invention
The present invention relates to a monitoring simulation device, method, and program that use the configuration information of a computer system to simulate the relationship between failure points and the monitoring error messages they produce, and that create a complete list of failure points and their corresponding error messages.
2. Description of the Related Art
When a failure occurs in an operating computer system, the failure point must be detected as quickly as possible and restored to normal. To this end, a monitoring device is commonly incorporated into the network system, and the operating state of the system is monitored using a monitoring tool.
Various monitoring methods are available, including: a method in which a monitoring device detects a failure by checking whether a piece of equipment to be monitored can be reached from a monitoring point using the PING command or a protocol such as HTTP; a method in which a monitoring device periodically receives a KeepAlive message from a piece of equipment to be monitored and determines that a failure has occurred when no KeepAlive message arrives within a certain period; and a method in which each piece of equipment performs a self-check and, when it finds an abnormality, notifies the monitoring device of the abnormality through message communication. Since error messages are generally notified and monitored over IP communication in a network system, failure points and error messages do not always correspond one-to-one.
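The KeepAlive-based method can be sketched as follows. This is a minimal illustration, not the method of any particular monitoring tool: the class name `KeepAliveMonitor`, the equipment names, and the 30-second timeout are hypothetical, and the caller passes explicit timestamps so the behavior is easy to follow.

```python
import time
from typing import Dict, List, Optional


class KeepAliveMonitor:
    """Hypothetical monitor that flags equipment whose KeepAlive messages stop arriving."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s              # the "certain period" (assumed value)
        self.last_seen: Dict[str, float] = {}   # equipment name -> last KeepAlive time

    def receive(self, equipment: str, now: Optional[float] = None) -> None:
        # Record when a KeepAlive message arrived (defaults to the current monotonic time).
        self.last_seen[equipment] = time.monotonic() if now is None else now

    def failed(self, now: Optional[float] = None) -> List[str]:
        # Equipment with no KeepAlive for longer than the timeout is judged to have failed.
        # Only equipment that has sent at least one KeepAlive is tracked in this sketch.
        t = time.monotonic() if now is None else now
        return sorted(e for e, seen in self.last_seen.items() if t - seen > self.timeout_s)


monitor = KeepAliveMonitor(timeout_s=30.0)
monitor.receive("web01", now=0.0)
monitor.receive("db01", now=0.0)
monitor.receive("web01", now=25.0)
print(monitor.failed(now=40.0))  # ['db01']
```

Note that the monitor only learns that KeepAlive messages stopped arriving; it cannot tell whether the equipment itself failed or the network path to it did, which is one reason failure points and error messages need not correspond one-to-one.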
More specifically, in a computer system composed of a plurality of pieces of equipment, a failure in any one piece may trigger a chain reaction of error messages, output not only by the failed piece of equipment but also by other pieces of equipment that communicate with it. For this reason, it is very difficult for the operations manager of a computer system to identify the failed piece of equipment from the error messages alone.
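The chain reaction can be illustrated with a small dependency graph. In this sketch the topology and equipment names are hypothetical, and it assumes, for simplicity, that the failed piece of equipment and every piece that directly or indirectly communicates with it each produce an error message.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical configuration: each piece of equipment -> pieces it communicates with.
DEPENDS_ON: Dict[str, List[str]] = {
    "web01": ["app01"],
    "web02": ["app01"],
    "app01": ["db01"],
    "db01": [],
}


def affected_by(failed: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every piece of equipment assumed to report an error when `failed` fails."""
    # Invert the graph so we can walk from a piece of equipment to those that depend on it.
    dependents: Dict[str, List[str]] = {e: [] for e in depends_on}
    for e, deps in depends_on.items():
        for d in deps:
            dependents.setdefault(d, []).append(e)
    # Breadth-first search outward from the failed equipment.
    seen = {failed}
    queue = deque([failed])
    while queue:
        cur = queue.popleft()
        for nxt in dependents.get(cur, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(affected_by("db01", DEPENDS_ON)))   # ['app01', 'db01', 'web01', 'web02']
print(sorted(affected_by("web01", DEPENDS_ON)))  # ['web01']
```

A single failure in `db01` yields errors from four pieces of equipment, so an operator who sees messages from `web01` and `web02` first has no direct indication that `db01` is the root cause.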
To cope with this, a known technique creates a correspondence table between combinations of errors that may occur and the pieces of equipment that may have failed, uses the table to infer which piece of equipment has likely failed, and outputs that piece of equipment. One example of such a technique is the image processing system described in Patent Document 1.
In the image processing system described in Patent Document 1, so that the true cause among a plurality of associated errors can be identified quickly when those errors occur together, an information table is created in advance that stores, for each error pattern (a combination of a plurality of error states), an error message based on the true error. When errors occur, the generated error pattern is looked up in the information table, and the error message corresponding to that pattern is output.
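The table lookup described above can be sketched as a dictionary keyed by error pattern. This is an illustration under stated assumptions, not the implementation of Patent Document 1: the error codes, messages, and the function name `lookup` are hypothetical, and an error pattern is modeled as an unordered set of error states (a `frozenset`), so the order in which errors arrive does not matter.

```python
from typing import Dict, FrozenSet, Iterable, Optional

# Hypothetical information table prepared in advance by a designer:
# error pattern (set of simultaneous error states) -> message naming the true error.
INFO_TABLE: Dict[FrozenSet[str], str] = {
    frozenset({"E_DB_TIMEOUT", "E_APP_DB_UNREACHABLE", "E_WEB_502"}): "True error: db01 failure",
    frozenset({"E_APP_DOWN", "E_WEB_502"}): "True error: app01 failure",
}


def lookup(errors: Iterable[str]) -> Optional[str]:
    """Return the registered message for this error pattern, or None if unregistered."""
    return INFO_TABLE.get(frozenset(errors))


print(lookup({"E_WEB_502", "E_APP_DOWN"}))  # True error: app01 failure
print(lookup({"E_WEB_502"}))                # None
```

The second call shows the weakness of this approach: any error pattern that nobody registered in advance simply returns no answer.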
(Patent Document 1: Japanese Patent Laid-Open 2002-139807)
However, in conventional techniques, outputting information that identifies the true cause among a plurality of associated errors requires the designer of the computer system to create and prepare, in advance, an information table such as the one described in Patent Document 1.
In a system in which the number of pieces of equipment that may fail and the number of types of error messages are relatively small, creating the above-described information table by hand in advance is not very difficult. In a networked computer system, however, the monitoring device usually has to monitor a very large number of pieces of equipment, and the number of error messages is correspondingly large. Associating a failure in each piece of equipment with the combination of error messages it would produce, while referring to a network chart, is therefore very complicated work for a designer. Moreover, creating the information table manually risks omitting some combinations of error messages and increases the likelihood of errors in the information that identifies failure points.
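To give a sense of scale (an illustrative calculation, not taken from the cited documents): if the monitoring device can receive n distinct types of error message, the number of non-empty combinations of those messages is

```latex
N(n) = \sum_{k=1}^{n} \binom{n}{k} = 2^{n} - 1,
\qquad
N(20) = 2^{20} - 1 = 1\,048\,575 .
```

Only some of these combinations actually arise from individual failures, but determining which ones do is precisely the manual association work that makes omissions likely.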
Furthermore, in the technique of Patent Document 1 described above, an operator must register an appropriate error message in the information table whenever a new error pattern appears. Conventional techniques can therefore quickly identify the true cause of a known error pattern, but have difficulty identifying the true cause of an error pattern that is not registered in the information table.