(1) Field of the Invention
The present invention relates to a fault information collection program and apparatus for monitoring network environments, and more particularly, to a fault information collection program and apparatus capable of receiving network state-indicative messages in a redundant configuration.
(2) Description of the Related Art
With the recent progress in the computerization of business transactions, many corporations have their own in-house computer networks. Documents created for business transactions, for example, are therefore generally distributed through the network to the persons concerned by means of functions such as electronic mail and groupware.
Since daily transactions thus depend heavily on the network, a fault in the network causes greater damage to those transactions than ever before. Accordingly, there has been a demand that the network state be monitored and that, when a fault has occurred or an event that could cause a fault is detected, measures be taken promptly.
When a large-scale network made up of a plurality of small-scale networks is monitored by assigning network management engineers to each small-scale network, a large number of engineers is required, which is inefficient. A method has therefore been adopted in which a designated server collects information about faults occurring on the networks and analyzes those faults. This permits the networks to be monitored efficiently.
When a fault has occurred on the networks, no measures can be taken unless information indicating the fault reaches the server. Accordingly, where remote networks are monitored by the server, information about faults on the individual networks must reach the server without fail.
Redundant paths are therefore generally provided for transferring information from a target network to be monitored to the management server. That is, a plurality of communication paths are provided for sending fault information about the target network to the management server. Fault information is normally transferred to the management server through one path, and if a fault occurs in that path, the information is transferred through another path. In this manner, fault information reaches the management server without fail.
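The failover scheme just described can be sketched as follows. This is a minimal illustration only, assuming that each communication path is modeled as a hypothetical callable that delivers a message to the management server and raises `ConnectionError` when the path is down; the names `Path` and `send_with_failover` are illustrative and do not come from any particular system. The sender moves to the next path only after the current one has been found to fail.

```python
from typing import Callable, Sequence

# Hypothetical model: a path is a callable that delivers a message to the
# management server and raises ConnectionError if the path has failed.
Path = Callable[[str], None]


def send_with_failover(message: str, paths: Sequence[Path]) -> int:
    """Send over the first working path and return the index of that path."""
    for index, send in enumerate(paths):
        try:
            send(message)
            return index
        except ConnectionError:
            continue  # failure confirmed on this path; switch to the next one
    raise ConnectionError("all paths to the management server failed")
```

In this sketch the failure is confirmed instantly by an exception; in a real network, confirmation usually requires a timeout or retries, and that delay is the window during which fault information does not reach the server.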
In a system that switches paths upon the occurrence of a fault, however, no fault information reaches the server during the period from the occurrence of a fault in the original path until the paths are switched after the fault has been confirmed.
To eliminate this drawback, a method may be adopted in which the same fault information is always sent to the server in parallel through different paths. With this method, even if a fault occurs in one communication path, the fault information passing through another communication path can still reach the server.
With this method, however, the management server is unable to determine whether an identical fault has occurred a plurality of times or a single fault has been notified in parallel. For certain types of network fault, different measures must be taken depending on how often the fault occurs, and such parallel notification can therefore confuse the system administrator.
As a technique for preventing such parallel notification, a mutual supervisory control system is disclosed, for example, in Unexamined Japanese Patent Publication No. 1-123355. According to that invention, the transmitting side transmits an identical telegram in parallel over a plurality of physical communication paths, and the receiving side accepts the telegram that arrives first and discards the telegrams that arrive later. Thus, even if a fault occurs in one physical communication path, the telegram passing through another physical communication path can reach the receiving side. Consequently, a telegram notifying of a fault can be transferred to another computer without fail.
In the invention disclosed in the above publication, the telegrams output in parallel from a host computer carry identical identification information (a time identifier and a serial number). The host computer on the receiving side compares the arrival times of telegrams carrying the same identification information and discards those of the parallel telegrams that arrive later.
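The receiving-side rule described above can be illustrated with a minimal sketch. It assumes that telegrams are handed to the filter in order of arrival, so that "first seen" coincides with "earliest arrival", and that the identification key is the pair (time identifier, serial number). The `Telegram` and `FirstArrivalFilter` names are hypothetical; this is not the implementation disclosed in the cited publication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Telegram:
    time_id: str  # time identifier affixed by the sending host computer
    serial: int   # serial number affixed by the sending host computer
    path: str     # physical communication path the telegram arrived on
    body: str


class FirstArrivalFilter:
    """Accept the first telegram for each (time_id, serial); discard later copies.

    Assumes telegrams are passed to accept() in order of arrival.
    """

    def __init__(self) -> None:
        self._seen: set = set()

    def accept(self, telegram: Telegram) -> bool:
        key = (telegram.time_id, telegram.serial)
        if key in self._seen:
            return False  # an identical telegram already arrived via another path
        self._seen.add(key)
        return True
```

The filter works only because one host affixes the same key to every parallel copy; the set of seen keys also grows without bound here, whereas a practical receiver would expire old keys.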
However, when telegrams originate from a single host computer, as in the invention disclosed in the above publication, transmission of the telegrams stops if a fault occurs in that host computer, even though a plurality of physical communication paths are provided. The state of one network may therefore be monitored by a plurality of computers (supervisory servers), so that the redundant configuration includes the supervisory servers themselves. In this case, however, it is difficult to affix identical identification information to telegrams output in parallel from different supervisory servers.
For example, when the network state is monitored by polling, the supervisory servers poll at different timings, so a fault caused by a single event is detected by each server at a different time. As a result, the receiving side cannot determine whether received messages are based on the same fault (that is, whether a received message is redundant) merely by checking the fault detection time.
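The effect of differing polling timings can be shown with a small worked example. It assumes two hypothetical supervisory servers that share a 60-second polling interval but start polling at offsets of 0 and 25 seconds, and a fault that occurs at t = 100 s; all of these values, and the `first_detection` helper, are illustrative assumptions. Each server detects the fault at its first poll at or after the moment of occurrence.

```python
def first_detection(fault_time: int, poll_offset: int, interval: int) -> int:
    """Return the time of the first poll at or after fault_time for a server
    that polls at poll_offset, poll_offset + interval, poll_offset + 2*interval, ..."""
    if fault_time <= poll_offset:
        return poll_offset
    # Integer ceiling division: number of intervals needed to reach fault_time.
    polls_needed = -(-(fault_time - poll_offset) // interval)
    return poll_offset + polls_needed * interval


FAULT_TIME = 100  # hypothetical moment the fault occurs (seconds)
INTERVAL = 60     # hypothetical polling interval shared by both servers

detected_by_a = first_detection(FAULT_TIME, poll_offset=0, interval=INTERVAL)
detected_by_b = first_detection(FAULT_TIME, poll_offset=25, interval=INTERVAL)

# A receiver that identifies a fault by (detection time, event) sees two
# distinct keys for what is actually a single event.
keys = {(detected_by_a, "link down"), (detected_by_b, "link down")}
print(detected_by_a, detected_by_b, len(keys))  # 120 145 2
```

Because the two detection times differ by 25 seconds, a receiver keyed on detection time would count one fault twice, which is the ambiguity described above.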