Cluster systems comprising a plurality of nodes, each formed by an individual computer, are often used for software that is intended to have high availability. For this purpose, the cluster system has monitoring and control software, also referred to as a reliant management service (RMS), which monitors the high-availability software running on the cluster. The high-availability software itself runs at one node in the cluster or is distributed among several nodes. In general, the monitoring software RMS may also be distributed among different nodes, i.e., it may be decentralized.
If error-free running of the high-availability software, or of a part of it, can no longer be ensured at one node in the cluster, the monitoring software RMS ends the application, or the affected part of it, and restarts it at a different node. The high-availability application, or a part of it, is monitored by so-called monitoring detectors, which are controlled by the RMS. Each detector monitors a specific part of the application, referred to as a resource, and reports the status of that resource back to the monitoring software RMS.
One example of this can be seen in FIG. 6, which shows a node C that is part of a cluster system. The node C contains the reliant management service RMS as monitoring software. Furthermore, the high-availability application APL runs at the node C and interchanges data with a memory management system M1 via the link N1. The monitoring software RMS starts the individual monitoring detectors D1, D2 and D3 in order to monitor the application APL. Each of these detectors is specifically designed to monitor one specific resource of the high-availability software APL. For example, the detector D3 monitors the communication link N1 between the application APL and the memory management system M1. Another detector, D2, continuously checks whether the high-availability application APL is still running and reports the result back. The third detector, D1, checks, for example, the available temporary memory that is required for the high-availability application APL.
The monitoring software RMS uses the continuous status messages from the individual monitoring detectors to take suitable measures if an individual monitored resource fails or if other problems occur. For example, it can end the high-availability software and restart it at a second node, which is not illustrated.
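The interplay described above for FIG. 6 can be illustrated with a minimal sketch. It is not taken from any actual RMS implementation: the class names `Detector` and `MonitoringSoftware`, the second node name `C2`, and the simulated resource state are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Detector:
    """Monitors one resource and reports its status (hypothetical structure)."""
    name: str
    resource: str
    probe: Callable[[], bool]  # returns True while the resource is healthy

    def report(self) -> bool:
        return self.probe()


@dataclass
class MonitoringSoftware:
    """Stands in for the RMS: polls detectors and relocates the application."""
    nodes: List[str]
    detectors: List[Detector]
    active_node: str = ""
    log: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The application initially runs at the first configured node.
        self.active_node = self.nodes[0]

    def poll_once(self) -> Dict[str, bool]:
        """Collect one status message per detector and react to failures."""
        status = {d.name: d.report() for d in self.detectors}
        failed = [name for name, ok in status.items() if not ok]
        if failed:
            self.fail_over(failed)
        return status

    def fail_over(self, failed: List[str]) -> None:
        """End the application at the current node and restart it at the next."""
        idx = self.nodes.index(self.active_node)
        target = self.nodes[(idx + 1) % len(self.nodes)]
        self.log.append(
            f"stop APL on {self.active_node} (failed: {', '.join(failed)})"
        )
        self.log.append(f"start APL on {target}")
        self.active_node = target


# Simulated resource state for the three detectors of FIG. 6 (assumed values).
state = {"temp_memory": True, "apl_process": True, "link_N1": True}

rms = MonitoringSoftware(
    nodes=["C", "C2"],  # "C2" is a hypothetical name for the second node
    detectors=[
        Detector("D1", "temporary memory", lambda: state["temp_memory"]),
        Detector("D2", "application APL", lambda: state["apl_process"]),
        Detector("D3", "link N1", lambda: state["link_N1"]),
    ],
)

rms.poll_once()            # all resources healthy, APL stays at node C
state["link_N1"] = False   # simulate a failure of the link N1
rms.poll_once()            # D3 reports the failure, APL is moved
print(rms.active_node)
print(rms.log)
```

In this sketch a single failed status message is enough to trigger relocation; a real monitoring service would presumably apply more differentiated policies.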
The individual monitoring detectors are started independently of one another by the monitoring software RMS. However, this leads to a high system load on the node, since each detector consumes its own share of memory space and computation capacity. In the worst case, a poor configuration or a very large number of monitored resources within one node can result in the monitoring detectors consuming the majority of the available computation capacity. Too little capacity is then available for the actual application. Furthermore, the monitoring software receives status messages from monitoring detectors that do not need to be running, or to be monitoring their resource, at the present time. Processing all the status messages that are fed back likewise increases the computation time and unnecessarily loads the monitoring software.
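The load problem can be made concrete with a small tally. The following sketch is purely illustrative: the figures (40 detectors, 2 percent of computation capacity each, half of them currently needed, 10 polling cycles) and the names `DetectorLoad` and `summarize` are assumptions, not measurements from any real system.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DetectorLoad:
    name: str
    cpu_percent: int   # assumed share of node computation capacity, in percent
    needed_now: bool   # whether its resource currently needs monitoring


def summarize(detectors: List[DetectorLoad], polls: int) -> Dict[str, int]:
    """Tally what independently started detectors cost over `polls` cycles."""
    total_cpu = sum(d.cpu_percent for d in detectors)
    unneeded = sum(1 for d in detectors if not d.needed_now)
    return {
        "detector_cpu_percent": total_cpu,
        "cpu_percent_left_for_application": max(0, 100 - total_cpu),
        # Every running detector sends one status message per polling cycle.
        "status_messages": len(detectors) * polls,
        "unneeded_messages": unneeded * polls,
    }


# Assumed configuration: every second detector is not needed at present.
detectors = [
    DetectorLoad(f"D{i}", cpu_percent=2, needed_now=(i % 2 == 0))
    for i in range(1, 41)
]
result = summarize(detectors, polls=10)
print(result)
```

Under these assumed numbers the detectors take 80 percent of the computation capacity, and half of the 400 status messages processed by the monitoring software come from detectors that are not required at the time.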