1. Field of the Invention
The present invention relates to coordinating activities between nodes in a distributed computing system. More specifically, the present invention relates to a method and an apparatus for reaching agreement between nodes in the distributed computing system regarding a node to function as a primary provider for a service.
2. Related Art
As computer networks are increasingly used to link computer systems together, distributed computing systems have been developed to control interactions between computer systems. Some distributed computing systems allow client computer systems to access resources on server computer systems. For example, a client computer system may be able to access information contained in a database on a server computer system.
When a server computer system fails, it is desirable for the distributed computing system to automatically recover from this failure. Distributed computing systems that are able to recover from such server failures are referred to as “highly available systems.”
For a highly available system to function properly, it must be able to detect a server failure and reconfigure itself so that accesses to the failed server are redirected to a secondary (backup) server.
One problem in designing such a highly available system is that some distributed computing system functions must be centralized in order to operate efficiently. For example, it is desirable to centralize an arbiter that keeps track of where primary and secondary copies of a server are located in a distributed computing system. However, a node that hosts such a centralized arbiter may itself fail. Hence, it is necessary to provide a mechanism to select a new node to host the centralized arbiter.
Moreover, this selection mechanism must operate in a distributed fashion because, for the reasons stated above, no centralized mechanism is certain to continue functioning. Furthermore, it is necessary for the node selection process to operate so that the nodes that remain functioning in the distributed computing system agree on the same node to host the centralized arbiter. For efficiency reasons, it is also desirable for the node selection mechanism not to move the centralized arbiter unless it is necessary to do so.
Hence, what is needed is a method and an apparatus that operate in a distributed manner to select a node to host a primary server for a service.