1. Field of the Invention
The present invention is related to node status monitoring in distributed computing systems, and more specifically to a scheme of node status sharing by gossiping among the nodes.
2. Description of Related Art
In large-scale distributed computer systems, such as those using distributed software models to perform tasks, multiple nodes provide independent execution of sub-tasks. To keep such a system operational, and to provide for proper operation of distributed applications that use the multiple nodes to perform various tasks, the status of the nodes is tracked. In particular, in order to assign tasks to nodes, and in order to ensure that a node is available for the communication needed to perform a task, the operational status of the nodes and their ability to communicate with the other nodes must be monitored.
Communications and status monitoring are typically centralized, with a monitoring application providing information about node and interface status. The monitoring application may use distributed agents to perform the monitoring on each node. Heartbeat messages are typically sent from the nodes to a centralized manager that maintains a record of the status of each node.
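By way of illustration only, the centralized heartbeat arrangement described above might be sketched as follows. The class name CentralManager, the method names, the timeout value, and the status strings "up", "down" and "unknown" are hypothetical and are not taken from any particular system; the sketch assumes that a node is treated as failed once no heartbeat has been received within a fixed timeout.

```python
import time
from typing import Dict, Optional


class CentralManager:
    """Hypothetical centralized manager that records node heartbeats."""

    def __init__(self, timeout_s: float = 5.0) -> None:
        # Assumed policy: a node is "down" if silent for longer than timeout_s.
        self.timeout_s = timeout_s
        # Record of the last time each node was heard from.
        self.last_seen: Dict[str, float] = {}

    def receive_heartbeat(self, node_id: str, now: Optional[float] = None) -> None:
        # Store the arrival time of the heartbeat for this node.
        self.last_seen[node_id] = time.monotonic() if now is None else now

    def status(self, node_id: str, now: Optional[float] = None) -> str:
        # Report the status the manager currently holds for the node.
        now = time.monotonic() if now is None else now
        seen = self.last_seen.get(node_id)
        if seen is None:
            return "unknown"  # no heartbeat has ever been recorded
        return "up" if now - seen <= self.timeout_s else "down"
```

In such a sketch, every node reports to the single manager, so the status record exists in one place only, and other nodes would obtain status by querying the manager.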