Typically, a distributed computing environment or computing system includes a number of processors or nodes interconnected to one another via one or more links to form a system of networks. This network of nodes can then process tasks concurrently, and more effectively than the nodes could if each processed tasks individually.
In order to allow the nodes of these computing systems to process tasks in such a manner, monitoring systems are implemented within the computing systems to monitor the status of the nodes and their network adapters. These monitoring systems typically monitor the computing system for the failure or activation of nodes within the system. Thus, if a node or network adapter in such a computing system were to fail, the monitoring system would be responsible for identifying the failed node and for informing the remainder of the nodes of the failure.
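The monitoring role described above can be sketched as follows. This is a minimal illustration only: the text does not say how failures are detected, so the heartbeat-and-timeout approach, the class name `HeartbeatMonitor`, and all method names are assumptions introduced here rather than the mechanism of any particular monitoring system.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical monitor: a node is presumed failed when it stays silent too long."""

    timeout: float                                  # seconds of silence tolerated (assumed)
    last_seen: dict = field(default_factory=dict)   # node name -> time of last heartbeat
    failed: set = field(default_factory=set)        # nodes currently considered failed

    def record_heartbeat(self, node, now):
        """Record a heartbeat; return True if this reactivates a failed node."""
        self.last_seen[node] = now
        reactivated = node in self.failed
        self.failed.discard(node)
        return reactivated

    def detect_failures(self, now):
        """Return the nodes that have newly exceeded the silence timeout."""
        newly_failed = sorted(
            node
            for node, seen in self.last_seen.items()
            if node not in self.failed and now - seen > self.timeout
        )
        self.failed.update(newly_failed)
        return newly_failed

    def notify_survivors(self, failed_node):
        """Build one failure notice for each node still considered alive."""
        return {
            node: f"node {failed_node} failed"
            for node in sorted(self.last_seen)
            if node not in self.failed
        }


monitor = HeartbeatMonitor(timeout=5.0)
for name in ("a", "b", "c"):
    monitor.record_heartbeat(name, now=0.0)
monitor.record_heartbeat("a", now=4.0)
monitor.record_heartbeat("b", now=4.0)

newly_failed = monitor.detect_failures(now=6.0)   # "c" has been silent for 6 seconds
notices = monitor.notify_survivors("c")           # notices addressed to "a" and "b"
```

Timestamps are passed in explicitly rather than read from a clock, which keeps the sketch deterministic. It covers both events named above: `detect_failures` reports a failure, and `record_heartbeat` reports the activation of a previously failed node.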
From time to time, configuration changes must be implemented in computing systems operating under these monitoring systems, for example because computing resources (either individual nodes or entire networks of nodes) are added or deleted, or because the addresses of network adapters change. One possible procedure for implementing configuration changes requires the monitoring system to be deactivated and restarted only after the new configuration has been implemented. However, deactivating the monitoring system greatly inconveniences the subsystems that rely on it.
An alternative procedure is to implement configuration changes by performing a global synchronization. However, a global synchronization requires each node in the computing system to be directly connected to each of the other nodes. Additionally, if the nodes in the computing system belong to different networks, multiple-hop communication is required for messages between some of the nodes. Furthermore, global synchronization detrimentally interrupts any protocols that are running when reconfiguration is initiated.
As yet another alternative, nodes operating under the monitoring system may be reconfigured individually without deactivating the entire system. However, this procedure creates the danger that messages will be transmitted from a node operating under one configuration to nodes operating under another configuration. Because the contents of some messages are valid only when exchanged between nodes having the same view of the system, this procedure oftentimes leads to disastrous results.
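The hazard of exchanging messages between nodes holding different views can be illustrated with a small sketch. The message format here is purely hypothetical: it assumes a failure report that names a node by its position in the sender's membership list, which is one plausible example of content that is valid only under a shared view. The names `OLD_CONFIG`, `NEW_CONFIG`, `encode_failure_report`, and `decode_failure_report` are introduced only for illustration.

```python
# Two views of the same system (assumed example data): node-b has been
# removed and node-d added, but only some nodes have adopted the new view.
OLD_CONFIG = ["node-a", "node-b", "node-c"]
NEW_CONFIG = ["node-a", "node-c", "node-d"]


def encode_failure_report(config, failed_node):
    """Sender side: refer to the failed node by its index in the sender's view."""
    return {"failed_index": config.index(failed_node)}


def decode_failure_report(config, message):
    """Receiver side: resolve the index against the receiver's own view."""
    return config[message["failed_index"]]


# A reconfigured node reports that node-c has failed.
report = encode_failure_report(NEW_CONFIG, "node-c")

# A receiver with the same view recovers the intended node.
same_view = decode_failure_report(NEW_CONFIG, report)

# A receiver still operating under the old view resolves the same index
# to a different node and would act on the wrong failure.
stale_view = decode_failure_report(OLD_CONFIG, report)
```

In this sketch `same_view` is `"node-c"` while `stale_view` is `"node-b"`, so the stale receiver would treat the wrong node as failed. The message itself is well formed in both cases; only the mismatch between the two views makes it harmful.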
In high availability systems, the above-mentioned disadvantages are unacceptable. Thus, a need exists for a reconfiguration protocol which allows reconfiguration without interruption to executing protocols. In addition, a further need exists for a reconfiguration protocol which implements a new configuration without requiring global synchronization.