In a distributed computing environment, a computer system contains a plurality of computer nodes which process data in parallel. In many such systems, the nodes are organized into partitions, with each partition having one or more domains. Each domain may run a different computer program. In many of these systems, a single node controls each domain. As each node becomes available by coming on line or coming up, the node must be assigned to a domain, and the other nodes in the domain must become aware of the new node so that tasks of the program running in the partition may be shared. In some systems this is done by each node sending a message to every other node in the system to determine such information as which node is controlling the domain and which tasks are assigned to which nodes.
In high availability applications in which a recovery process recovers a failed node, the recovering node must find out the same information as a new node coming on line for the first time. There is a further requirement: if the failed node is the node that controls the domain, a new node must be appointed to take over control of the domain within the partition. Often this is done by sending a message from each node to every other node in the domain to find out which nodes are still available and to attempt to assign a new control node. The result of such schemes is a heavy volume of message traffic each time a node comes up or is recovered.
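To give a sense of the scale of this traffic, the following minimal sketch counts the messages in one round of an every-node-to-every-other-node exchange. It is illustrative only and assumes exactly one message per ordered pair of nodes, with no replies, retries, or acknowledgements counted; the function names all_to_all_messages and message_count are hypothetical and are not taken from any particular system.

```python
from itertools import permutations


def all_to_all_messages(node_ids):
    """Return the (sender, receiver) pairs for one all-to-all round.

    Assumption: every node sends exactly one message to every other node.
    """
    return list(permutations(node_ids, 2))


def message_count(n):
    """Closed-form count for the same round: n senders times (n - 1) receivers."""
    return n * (n - 1)


if __name__ == "__main__":
    for n in (4, 16, 64):
        pairs = all_to_all_messages(range(n))
        # The enumerated pairs must agree with the closed-form count.
        assert len(pairs) == message_count(n)
        print(f"{n} nodes -> {len(pairs)} messages per round")
```

Under these assumptions the count grows roughly with the square of the number of nodes: 4 nodes exchange 12 messages per round, 16 nodes exchange 240, and 64 nodes exchange 4032.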