1. Field of the Invention
The present invention is related to node status monitoring in distributed computing systems, and more specifically to a scheme for dynamically controlling heartbeat rate and node status thresholds.
2. Description of Related Art
In large-scale distributed computer systems, such as those using distributed software models to perform tasks, multiple nodes provide independent execution of sub-tasks. To keep such a system operational, and to provide for proper operation of distributed applications that use the multiple nodes to perform various tasks, the status of the nodes is tracked. In particular, in order to assign tasks to nodes and to ensure that a node is available for the communication needed to perform a task, the operational status of the nodes and their ability to communicate with the other nodes must be monitored.
Communications and status monitoring may be performed according to a heartbeat-driven messaging scheme. Heartbeat messages are typically sent from the nodes to a centralized manager that maintains a record of the status of each node. The heartbeat rate and the parameters for determining whether nodes and their connections are operational are typically fixed, so that uniformity in determining node status can be presumed.
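By way of illustration only, the conventional fixed-parameter arrangement described above may be sketched as follows. The sketch is not drawn from any particular system: the class name CentralManager, the one-second heartbeat interval, and the three-missed-heartbeat threshold are all assumptions chosen for the example, and timestamps are passed in explicitly so that the behavior is deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict

# Assumed values for illustration; a real system would choose its own.
HEARTBEAT_INTERVAL_S = 1.0    # fixed period at which each node sends a heartbeat
MISSED_BEATS_THRESHOLD = 3    # fixed number of missed heartbeats tolerated


@dataclass
class CentralManager:
    """Hypothetical centralized manager that records the status of each node."""

    interval_s: float = HEARTBEAT_INTERVAL_S
    missed_threshold: int = MISSED_BEATS_THRESHOLD
    # Maps a node identifier to the time its last heartbeat was received.
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record_heartbeat(self, node_id: str, now: float) -> None:
        """Record that a heartbeat from node_id arrived at time now."""
        self.last_seen[node_id] = now

    def is_operational(self, node_id: str, now: float) -> bool:
        """Apply the same fixed threshold to every node."""
        last = self.last_seen.get(node_id)
        if last is None:
            # No heartbeat has ever been received from this node.
            return False
        return (now - last) <= self.interval_s * self.missed_threshold


manager = CentralManager()
manager.record_heartbeat("node-a", now=0.0)
status_early = manager.is_operational("node-a", now=2.5)  # within 3 intervals
status_late = manager.is_operational("node-a", now=3.5)   # beyond 3 intervals
```

Because the interval and the threshold are the same for every node, each node is judged by an identical rule, which is the uniformity referred to above.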