1. Field of the Invention
The present invention generally relates to clustered computer systems, and in particular, to the starting of node operations.
2. Description of the Related Art
“Clustering” generally refers to a computer system organization where multiple computers or nodes are networked together to cooperatively perform computer tasks. An important aspect of a computer cluster is that all of the nodes in the cluster present a single system image—that is, from the perspective of a user, the nodes in a cluster appear collectively as a single computer, or entity.
Clustering is often used in relatively large multi-user computer systems where high performance and reliability are of concern. For example, clustering may be used to provide redundancy, or fault tolerance, so that, should any node in a cluster fail, the operations previously performed by that node will be handled by other nodes in the cluster. Clustering is also used to increase overall performance, since multiple nodes can often handle a larger number of tasks in parallel than a single computer otherwise could. Load balancing is often also used to distribute tasks fairly among nodes, which prevents individual nodes from becoming overloaded and thereby maximizes overall system performance. One specific application of clustering, for example, is in providing multi-user access to a shared resource such as a database or a storage device, since multiple nodes can handle a comparatively large number of user access requests, and since the shared resource is typically still available to users even upon the failure of any given node in the cluster.
Clusters typically handle computer tasks through the performance of “jobs” or “processes” within individual nodes. In some instances, jobs being performed by different nodes cooperate with one another to handle a computer task. Such cooperative jobs are typically capable of communicating with one another, and are typically managed in a cluster using a logical entity known as a “group.” A group is typically assigned some form of identifier, and each job in the group is tagged with that identifier to indicate its membership in the group.
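By way of illustration only, the tagging of jobs with a group identifier might be sketched as follows. The names Job, group_id, and members_of are hypothetical and are not taken from any particular clustering product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """A job running on one node, tagged with the identifier of its group."""
    name: str
    node: str
    group_id: str  # hypothetical tag indicating group membership


def members_of(jobs, group_id):
    """Return the jobs whose tag matches the given group identifier."""
    return [job for job in jobs if job.group_id == group_id]


jobs = [
    Job("db-writer", node="node1", group_id="G1"),
    Job("db-writer", node="node2", group_id="G1"),
    Job("backup", node="node2", group_id="G2"),
]
group_g1 = members_of(jobs, "G1")
```

In this sketch, the two jobs tagged "G1" run on different nodes yet belong to the same group, which is the relationship the identifier is meant to capture.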
Member jobs in a group typically communicate with one another using an ordered message-based scheme, where the specific ordering of messages sent between group members is maintained so that every member sees messages sent by other members in the same order as every other member, thus ensuring synchronization between nodes. Requests for operations to be performed by the members of a group are often referred to as “protocols,” and it is typically through the use of one or more protocols that tasks are cooperatively performed by the members of a group. One example of a protocol utilized by many clusters is a membership change protocol, which permits member jobs to be added to or removed from a group.
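One possible way to obtain such ordering, offered here only as a minimal sketch, is to stamp every message with a sequence number at a single ordering point and have each member deliver messages strictly in sequence. The Sequencer and Member classes are assumptions made for illustration; actual clusters may use other ordering mechanisms.

```python
import itertools


class Sequencer:
    """Hypothetical ordering point that stamps each message with a sequence number."""

    def __init__(self):
        self._counter = itertools.count(1)

    def stamp(self, sender, payload):
        return (next(self._counter), sender, payload)


class Member:
    """A group member that delivers messages only in sequence-number order."""

    def __init__(self, name):
        self.name = name
        self.next_seq = 1      # next sequence number eligible for delivery
        self.pending = {}      # messages that arrived ahead of their turn
        self.delivered = []    # messages in the order the member saw them

    def receive(self, message):
        seq, sender, payload = message
        self.pending[seq] = (sender, payload)
        # Deliver every buffered message whose turn has come.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


sequencer = Sequencer()
messages = [
    sequencer.stamp("A", "add member X"),
    sequencer.stamp("B", "remove member Y"),
    sequencer.stamp("A", "add member Z"),
]

first, second = Member("first"), Member("second")
for m in messages:
    first.receive(m)
for m in reversed(messages):   # simulate out-of-order arrival
    second.receive(m)
```

Although the second member receives the messages in reverse, both members end up with identical delivery histories, which is the property the ordered scheme is intended to guarantee.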
Protocols are also used at the node level. For example, a node start protocol enables inactive or offline nodes to join a cluster. The manner in which nodes are added to a cluster depends upon whether the cluster is centralized or decentralized. In a centralized clustered computer system, a centralized or shared registry stores the cluster membership information. Accordingly, starting a node in such a system requires accessing the centralized or shared registry and updating the registry with the node information.
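The centralized case can be sketched, under simplifying assumptions, as a single shared object that every starting node updates. SharedRegistry and start_node_centralized are hypothetical names, and a real registry would also need concurrency control and persistence, which are omitted here.

```python
class SharedRegistry:
    """Hypothetical centralized registry holding the cluster membership."""

    def __init__(self):
        self.members = set()

    def register(self, node_name):
        self.members.add(node_name)


def start_node_centralized(node_name, registry):
    """Start a node by recording it in the shared registry.

    Returns a copy of the membership as seen after the update.
    """
    registry.register(node_name)
    return set(registry.members)


registry = SharedRegistry()
start_node_centralized("node1", registry)
view_after_node2 = start_node_centralized("node2", registry)
```

Because all nodes consult the same registry, no node needs assistance from an existing member in order to join.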
In a decentralized clustered system, on the other hand, no centralized or shared registry of cluster membership exists. Instead, the cluster membership is stored in each node residing in the system. Thus, starting an inactive or offline node in such a system requires the sponsorship of another node that is already a member of the cluster. A disadvantage of this arrangement is that the sponsor node has no knowledge of when the particular node is ready to be sponsored. The sponsor node typically becomes aware of this information through manual intervention by a system administrator, which is prone to human error. A further problem arises when no member node exists in the cluster to serve as a sponsor. In that case, each starting node would form its own one-node cluster, resulting in multiple disjoint one-node clusters.
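The sponsorship requirement, and the failure mode that arises without a sponsor, can be illustrated with the following minimal sketch. The Node class and start_node function are hypothetical, and message passing between nodes is reduced to direct assignment for brevity.

```python
class Node:
    """A node that keeps its own local copy of the cluster membership."""

    def __init__(self, name):
        self.name = name
        self.members = set()   # Node objects in this node's view; empty while offline


def start_node(node, sponsor=None):
    """Start a node, optionally sponsored by an existing cluster member."""
    if sponsor is None:
        # No sponsor available: the node can only form a cluster of itself.
        node.members = {node}
        return
    # The sponsor extends its view and every member adopts the new view.
    new_view = sponsor.members | {node}
    for member in new_view:
        member.members = set(new_view)


def view(node):
    """Return the names in a node's local membership view, sorted."""
    return sorted(m.name for m in node.members)


# Sponsored start: B joins the cluster that A already belongs to.
a, b = Node("A"), Node("B")
start_node(a)
start_node(b, sponsor=a)

# Unsponsored starts: C and D each end up in a separate one-node cluster.
c, d = Node("C"), Node("D")
start_node(c)
start_node(d)
```

In this sketch, nodes A and B share one membership view, whereas nodes C and D, started without a sponsor, hold views that contain only themselves and therefore form disjoint clusters.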
Therefore, a significant need exists in the art for an improved method of starting a node in a decentralized clustered system without the problems and disadvantages of the current art.