1. Field of the Invention
The present invention relates to a computer monitoring system, a computer monitoring method and a computer monitoring program, and more particularly to a computer monitoring system, a computer monitoring method and a computer monitoring program permitting mutual monitoring of computers and dynamic reconfiguration of computers even in a large-scale system in which hundreds to thousands of computers are involved.
2. Description of the Related Art
One example of a computer monitoring system according to the prior art is described in Japanese Patent Application Laid-open No. Sho 63-4366. This prior art system consists of a plurality of computers and a communication path connecting these computers, and comprises: a means for transmitting, when given the right to transmit health notices, health notices to all the other computers; a means for transmitting a response notice to a health notice from another computer; a means for receiving the response notice; and a means for determining any trouble according to the contents of the response notice.
The prior art computer monitoring system having such a configuration operates in the following manner.
Specifically, the right to transmit health notices is transferred among the plurality of computers in a prescribed order of precedence. The computer holding the right to transmit sends health notices to the other computers and receives their response notices. If the right to transmit is not transferred in the prescribed order of precedence, or if no response notice is received, the occurrence of trouble can be detected.
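The operation described above can be sketched as a small simulation. This is only an illustrative sketch, not the implementation of the cited application: the names `unresponsive_peers` and `monitor_cycle`, and the use of a set `alive` to stand in for computers that answer health notices, are assumptions introduced here.

```python
from typing import Dict, List, Set, Tuple


def unresponsive_peers(holder: str, order: List[str], alive: Set[str]) -> List[str]:
    """The holder of the right to transmit sends a health notice to every
    other computer; return those from which no response notice comes back.
    (Hypothetical helper: `alive` models which computers would respond.)"""
    return [c for c in order if c != holder and c not in alive]


def monitor_cycle(order: List[str], alive: Set[str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Pass the right to transmit once around the prescribed order.

    Returns:
      reports: for each computer that held the right, the peers that did not respond.
      skipped: computers that never took the right, i.e. points where the
               transfer did not follow the prescribed order of precedence.
    """
    reports: Dict[str, List[str]] = {}
    skipped: List[str] = []
    for holder in order:
        if holder not in alive:
            # A stopped computer cannot accept the right to transmit.
            skipped.append(holder)
            continue
        reports[holder] = unresponsive_peers(holder, order, alive)
    return reports, skipped
```

For example, with the prescribed order `["A", "B", "C", "D"]` and only A, B and D running, every holder reports C as unresponsive, and C also appears in `skipped` because the right to transmit could not be handed to it.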
This example of the prior art, however, involves the following problems.
A first problem is that the system cannot be extended as the number of computers increases.
Because the computer holding the right to transmit health notices communicates with the other computers on a one-to-one basis, only a limited number of health notices can be transmitted and response notices received within the interval between health notice transmissions. If the number of computers reaches hundreds or even thousands, mutual monitoring becomes impossible.
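The limit can be made concrete with simple arithmetic. The sketch below assumes that the one-to-one exchanges are performed one after another and that each exchange (health notice plus response notice) takes a fixed time; both assumptions, the function names, and the figures of 1000 ms and 5 ms used afterwards are hypothetical and are not taken from the cited application.

```python
def max_peers_per_interval(interval_ms: int, exchange_ms: int) -> int:
    """Largest number of one-to-one exchanges that fit into one
    health notice transmission interval (assumed sequential exchanges)."""
    if interval_ms < 0 or exchange_ms <= 0:
        raise ValueError("interval_ms must be >= 0 and exchange_ms must be > 0")
    return interval_ms // exchange_ms


def round_duration_ms(n_computers: int, exchange_ms: int) -> int:
    """Time the holder of the right to transmit needs to check all
    other computers once, under the same sequential assumption."""
    return max(n_computers - 1, 0) * exchange_ms
```

Under these assumed figures, `max_peers_per_interval(1000, 5)` gives 200 peers per interval, whereas `round_duration_ms(1000, 5)` gives 4995 ms for a system of 1000 computers, about five times the assumed interval.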
A second problem is that the system cannot cope with dynamic changes in the configuration of the mutually monitoring computers.
In a system consisting of a plurality of computers, the number of computers may be increased or decreased according to the processing load, some computers may be stopped for regular maintenance, or computers under no load may be automatically suspended to save power. Since mutual monitoring of computers is performed on the basis of predetermined information, such as a list of the computers to be mutually monitored and the order of precedence in transferring the right to transmit health notices, the system cannot adequately cope with such dynamic changes in configuration during operation.
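The effect of relying on predetermined information can be illustrated with a short sketch. It assumes, for illustration only, that the monitoring logic knows nothing beyond a fixed list of computer names; the function name `compare_with_static_list` and the example names are hypothetical.

```python
from typing import Iterable, List, Tuple


def compare_with_static_list(
    static_list: List[str], running: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Compare a predetermined monitoring list with the computers actually running.

    Returns:
      flagged:     listed computers that are not running. With only a static
                   list, a planned stop (maintenance, power saving) cannot be
                   told apart from real trouble.
      unmonitored: running computers absent from the list, e.g. computers
                   added under load, which are never checked at all.
    """
    running_set = set(running)
    listed = set(static_list)
    flagged = [c for c in static_list if c not in running_set]
    unmonitored = sorted(running_set - listed)
    return flagged, unmonitored
```

With the list `["A", "B", "C"]` and A, C and D actually running, B is flagged as trouble even if it was suspended deliberately, and the newly added D is not monitored.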