The application relates to the monitoring and/or checking (referred to below simply as monitoring) of a system condition of a system with distributed components.
In such systems (distributed systems for short), for example mobile or stationary radio and/or communication networks, every component of the distributed system is as a rule required to know the status or condition of every other component in the system (monitoring).
If, for example, a component in a distributed system fails because it is or becomes inoperative and/or offline, it is advantageous for every other component in the system to receive this information.
Various approaches to monitoring a system condition of a distributed system are known from the related art in which the monitoring is implemented by a so-called Ping-Pong mechanism using so-called Ping-Pong messages.
In such a mechanism based on Ping-Pong messages, a system component periodically sends a Ping message over the distributed system to a component being monitored, which responds with a Pong response, a so-called Pong Acknowledgment.
If no Pong response is received from a component being monitored, that component is classified, as a rule by the component that sent the Ping message, as offline or generally as inoperative.
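The Ping-Pong classification described above can be illustrated with a minimal in-memory sketch. It is not taken from the related art itself: the `Component` class, the `handle_ping` method and the `check` function are hypothetical names, and a direct method call stands in for the network round trip and its timeout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Component:
    """Hypothetical system component that may or may not answer Pings."""
    name: str
    online: bool = True

    def handle_ping(self) -> Optional[str]:
        # An operative component answers with a Pong; a failed one stays silent.
        return "PONG" if self.online else None


def check(monitored: Component) -> str:
    """Send one Ping and classify the target by whether a Pong comes back."""
    # Assumption: a missing reply (None) models a Pong that never arrives.
    reply = monitored.handle_ping()
    return "online" if reply == "PONG" else "offline"


server_a = Component("A")
server_b = Component("B", online=False)
status = {c.name: check(c) for c in (server_a, server_b)}
print(status)
```

In a real system the check would be repeated periodically and the absence of a Pong would be detected by a timeout rather than by a return value.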
In order to enable a component of a distributed system to ascertain or check the status or condition of every other component of the distributed system, that component queries all the other components using the Ping-Pong mechanism.
In a known first approach to monitoring a system condition of a distributed system based on the Ping-Pong mechanism, each component of a distributed system monitors every other component of this system and, for this purpose, queries information concerning the status or condition of each of the other components.
To this end, each component of the distributed system sends every other component of the system a Ping message and receives corresponding Pong responses back from those components that are in an operative or online condition.
FIG. 3 illustrates this known first approach. It shows a distributed communication system 300, here a HiPath IP telephony system 300, with a plurality of communication servers 301 to 306 associated in a communication network. Each of these communication servers 301 to 306 needs to know about the failure of any other communication server 301 to 306 in the system 300. To this end, each communication server 301 to 306 sends a Ping message 310 to each of the other communication servers 301 to 306 in the HiPath IP telephony system 300 and receives corresponding Pong responses 311 from those servers that are in an operative or online condition.
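The all-to-all monitoring of FIG. 3 can be sketched as a small simulation for one monitoring round. The failure of server 304 is a hypothetical example, and the dictionary-based bookkeeping is an assumption made for illustration only; no real network traffic is involved.

```python
# Server reference numerals 301..306 mapped to an online flag.
servers = {sid: True for sid in range(301, 307)}
servers[304] = False  # hypothetical failure, chosen for illustration

pings = pongs = 0
views = {}  # what each online server concludes about every other server

for sender, sender_online in servers.items():
    if not sender_online:
        continue  # a failed server sends no Ping messages
    view = {}
    for target, target_online in servers.items():
        if target == sender:
            continue
        pings += 1  # Ping message 310
        if target_online:
            pongs += 1  # Pong response 311
            view[target] = "online"
        else:
            view[target] = "offline"  # no Pong received
    views[sender] = view

print(pings, pongs)
print(views[301])
```

Under these assumptions, every server that is still operative learns of the failure independently, at the cost of one Ping per ordered pair of servers.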
The disadvantage of this known first approach is that monitoring a system condition of a distributed system in this way generates a high message volume of the order O(n^2) (n: number of system components), which may restrict the performance or capacity and/or the error detection capability/speed of the system.
Thus, for example, in the HiPath IP telephony system 300 according to FIG. 3, Ping-Pong messages (310, 311) are sent every 60 seconds. With the 6 communication servers 301 to 306 represented, this produces (6*5*2)/60 s = 1 message per second; with 30 servers, it would produce (30*29*2)/60 s = 29 messages per second.
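The arithmetic above follows from n*(n-1) ordered pairs of components, each exchanging one Ping and one Pong per period. A short sketch reproduces the two figures; the function name `ping_pong_messages_per_second` is hypothetical.

```python
def ping_pong_messages_per_second(n: int, period_s: float = 60.0) -> float:
    """Message rate when each of n components pings every other one per period."""
    # n * (n - 1) ordered pairs, 2 messages each (Ping and Pong).
    return n * (n - 1) * 2 / period_s


for n in (6, 30):
    print(n, ping_pong_messages_per_second(n))
```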
In a further known approach to monitoring a system condition of a distributed system based on the Ping-Pong mechanism, a central coordinator checks the components of a distributed system, registers inoperative or failed components, and propagates corresponding information to all the components in the system.
The message volume generated here is of the order O(n).
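The difference between the two orders of growth can be made concrete with a rough count per monitoring period. The exact coordinator protocol is not specified here, so the sketch assumes a coordinator that is separate from the n components and that sends one Ping, receives at most one Pong, and sends one status notification per component; both function names are hypothetical.

```python
def full_mesh_messages(n: int) -> int:
    """Messages per period when every component pings every other one."""
    return 2 * n * (n - 1)  # one Ping and one Pong per ordered pair


def coordinator_messages(n: int) -> int:
    """Upper bound per period for a separate central coordinator.

    Assumption: n Pings + up to n Pongs + n status notifications.
    """
    return 3 * n


for n in (6, 30):
    print(n, full_mesh_messages(n), coordinator_messages(n))
```

Under these assumptions, 30 components require 1740 messages per period in the all-to-all approach but at most 90 with a coordinator.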
The disadvantage of this further known approach is that it is difficult to implement robustly, because the central coordinator of the distributed system must be maintained in redundant form.