“Clustering” generally refers to a computer system organization where multiple computers, or nodes, are networked together to cooperatively perform computer tasks. An important aspect of a computer cluster is that all of the nodes in the cluster present a single system image—that is, from the perspective of a user, the nodes in a cluster appear collectively as a single computer, or entity.
Clustering is often used in relatively large multi-user computer systems where high performance and reliability are of concern. For example, clustering may be used to provide redundancy, or fault tolerance, so that, should any node in a cluster fail, the operations previously performed by that node will be handled by other nodes in the cluster. Clustering is also used to increase overall performance, since multiple nodes can often handle a larger number of tasks in parallel than a single computer otherwise could. Often, load balancing can also be used to ensure that tasks are distributed fairly among nodes to prevent individual nodes from becoming overloaded and therefore maximize overall system performance. One specific application of clustering, for example, is in providing multi-user access to a shared resource such as a database or a storage device, since multiple nodes can handle a comparatively large number of user access requests, and since the shared resource is typically still available to users even upon the failure of any given node in the cluster.
Clusters typically handle computer tasks through the performance of “jobs” or “processes” within individual nodes. In some instances, jobs being performed by different nodes cooperate with one another to handle a computer task. Such cooperative jobs are typically capable of communicating with one another, and are typically managed in a cluster using a logical entity known as a “group.” A group is typically assigned some form of identifier, and each job in the group is tagged with that identifier to indicate its membership in the group.
A primary-backup group is a group in which one group member is designated as the primary, and the other members are backups. Primary-backup groups are often used when the primary member has connectivity with a resource, or “owns” a resource, such as a disk, tape or other storage unit, a printer or other imaging device, or another type of switchable hardware component or system. In a primary-backup group, only one primary member is defined, and there can never be two primary members at the same time.
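By way of illustration only, the single-primary property of a primary-backup group may be sketched as follows. The class name, the node names, and the rule that the first backup is promoted when the primary leaves are assumptions made for this sketch and are not drawn from any particular clustering implementation.

```python
class PrimaryBackupGroup:
    """Hypothetical model of a primary-backup group (all names are illustrative)."""

    def __init__(self, group_id, primary, backups):
        self.group_id = group_id      # identifier tagged on every member job
        self.primary = primary        # a single slot, so two primaries cannot coexist
        self.backups = list(backups)  # remaining members, all of them backups

    def members(self):
        """Return every member, primary first."""
        return [self.primary] + self.backups

    def fail_over(self):
        """Assumed policy: promote the first backup and drop the old primary."""
        if not self.backups:
            raise RuntimeError("no backup available to take over")
        old_primary = self.primary
        self.primary = self.backups.pop(0)
        return old_primary


group = PrimaryBackupGroup("G1", "nodeA", ["nodeB", "nodeC"])
failed = group.fail_over()
```

Because the primary is held in a single attribute rather than as a per-member flag, this sketch cannot represent a state with two primaries at once.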
Member jobs in a group typically communicate with one another using an ordered message-based scheme, where the specific ordering of messages sent between group members is maintained so that every member sees messages sent by other members in the same order as every other member, thus ensuring synchronization between nodes. Requests for operations to be performed by the members of a group are often referred to as “protocols,” and it is typically through the use of one or more protocols that tasks are cooperatively performed by the members of a group.
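One possible way to obtain such an ordering, offered here only as a minimal sketch, is to stamp each message with a sequence number and have each member deliver messages strictly in sequence, holding back any message that arrives early. The `Sequencer` and `Member` classes below are hypothetical; actual clustering systems may use quite different ordering mechanisms.

```python
class Sequencer:
    """Hypothetical component that stamps each message with a global sequence number."""

    def __init__(self):
        self._next = 0

    def stamp(self, payload):
        seq = self._next
        self._next += 1
        return seq, payload


class Member:
    """Delivers messages in sequence order, buffering any that arrive early."""

    def __init__(self, name):
        self.name = name
        self.next_seq = 0    # sequence number this member expects next
        self.pending = {}    # early arrivals, keyed by sequence number
        self.delivered = []  # payloads in the order they were delivered

    def receive(self, seq, payload):
        self.pending[seq] = payload
        # Deliver as many consecutive messages as are now available.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


sequencer = Sequencer()
m1, m2 = Member("m1"), Member("m2")
messages = [sequencer.stamp(p) for p in ("join", "update", "leave")]

for seq, payload in messages:            # m1 receives the messages in order
    m1.receive(seq, payload)
for seq, payload in reversed(messages):  # m2 receives them in reverse order
    m2.receive(seq, payload)
```

Although the two members receive the messages in different network orders, both deliver them in the same order, which is the property the ordered messaging scheme is meant to provide.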
Clusters often support changes in group membership through the use of group organizational operations such as membership change protocols, e.g., if a member job needs to be added to or removed from a group. In some clustered systems, a membership change protocol is implemented as a type of peer protocol, where all members receive a message and each member is required to locally determine how to process the protocol and return an acknowledgment indicating whether the message was successfully processed by that member. Typically, with a peer protocol, members are prohibited from proceeding with other work until acknowledgments from all members have been received. In other systems, membership change protocols may be handled as master-slave protocols, where one of the members is elected as a leader and controls the other members so as to ensure proper handling of the protocol.
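The acknowledgment step of a peer protocol may be sketched, under simplifying assumptions, as shown below. The function `run_peer_protocol`, the message layout, and the per-member handlers are hypothetical, and the sketch runs the members sequentially in one process rather than across networked nodes.

```python
def run_peer_protocol(members, message):
    """Send `message` to every member and collect one acknowledgment from each.

    `members` maps a member name to a handler that processes the message locally
    and returns True on success. The result is only known once every member has
    answered, mirroring the rule that no member proceeds until all acks are in.
    """
    acks = {}
    for name, handler in members.items():
        try:
            acks[name] = bool(handler(message))
        except Exception:
            acks[name] = False  # a failure to process counts as a negative ack
    return all(acks.values()), acks


# Hypothetical handlers: nodeB refuses to process a "remove" request.
members = {
    "nodeA": lambda msg: True,
    "nodeB": lambda msg: msg["op"] != "remove",
}

added_ok, add_acks = run_peer_protocol(members, {"op": "add", "member": "nodeC"})
removed_ok, remove_acks = run_peer_protocol(members, {"op": "remove", "member": "nodeA"})
```

In this sketch each member decides locally how to handle the request, and no leader is involved, in contrast to the master-slave arrangement.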
In many clustering environments, members may from time to time leave a group, e.g., due to a failure, node maintenance, etc. Later, it may be desirable for these members to rejoin the group. Such a member is referred to as a “rejoining member.” In this situation, information about the group, as well as the rejoining member's perception of the group, has a direct bearing on the terms under which a member rejoins the group. This information is referred to as “group state data.” Group state data is typically distributed, or replicated, among all group members.
In some environments, all of the group state data is replicated on each member of a group. In other environments, some of the group state data may be stored globally, e.g., in a global file system accessible to all members. However, even in the latter environments, some portion of the relevant group state data is typically replicated on each member of a group.
In the case of a primary-backup group, the distributed group state data held by members may include information that indicates which member is the primary member, the order of backup, e.g., first backup, second backup, etc., and the resources that a primary member needs in order to be active, e.g., necessary files, IP addresses, disk units, etc.
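For illustration, the kinds of information listed above might be gathered into a single record such as the following. The class name `GroupStateData`, its field names, and the sample values are assumptions made for this sketch. A real implementation could hold additional fields or split the data between local and global storage.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GroupStateData:
    """Hypothetical record of the state replicated on each group member."""

    primary: str             # which member is currently the primary
    backup_order: List[str]  # first backup, second backup, ...
    # Resources the primary needs in order to be active, grouped by kind.
    required_resources: Dict[str, List[str]] = field(default_factory=dict)

    def role_of(self, member: str) -> str:
        """Describe the role this state data assigns to `member`."""
        if member == self.primary:
            return "primary"
        if member in self.backup_order:
            return f"backup {self.backup_order.index(member) + 1}"
        return "not a member"


state = GroupStateData(
    primary="nodeA",
    backup_order=["nodeB", "nodeC"],
    required_resources={"ip_addresses": ["192.0.2.10"], "disk_units": ["disk1"]},
)
```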
A problem that arises when a member leaves and then rejoins a group is that of synchronizing the group state data between the rejoining member and the existing members, because the group state data held by a member that left the group may not be the same as the group state data held by the other members of the group when the member rejoins. For example, the replicated group state data may have changed while the rejoining member was not a member of the group, or conversely, the replicated group state data held by existing members of the group may be outdated and the rejoining member may have the most current group state data. An example of the latter instance is when the primary member is rejoining, as only the primary member may have data regarding the current condition of group-related resources.
One conventional method by which this problem is addressed in existing clustering implementations is to view the replicated group state held by existing members, or the “existing group state data,” as a protocol. When this is done, a rejoining member typically sends its view of the replicated group state data to all existing members. The group then attempts to reach a consensus as to which group state data, or perhaps, which parts of various group state data, will then be replicated among all members.
One trouble with the aforementioned conventional method is reconciling inconsistencies in various group state data when a member rejoins. Oftentimes, the existing group state data or the rejoining member's group state data is used as a first guess at the appropriate group state. As a second guess, some combination of the existing group state data and the rejoining member's group state data may be used as the appropriate group state. Beyond this, and as is often the case, manual intervention is required.
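The escalation described above, from a first guess to a second guess to manual intervention, may be sketched roughly as follows. The `reconcile` function, the `is_valid` check, and the rule for combining the two sets of state data (values from the rejoining member override those of the existing members) are all assumptions made for illustration. The conventional method does not prescribe any particular validity test or combination rule.

```python
class ManualInterventionRequired(Exception):
    """Raised when neither guess yields usable group state data."""


def reconcile(existing, rejoining, is_valid):
    """Choose group state data when a member rejoins (illustrative only).

    `existing` and `rejoining` are dicts of group state data, and `is_valid`
    is a caller-supplied check for whether a candidate state is usable.
    """
    # First guess: take one side's group state data as it stands.
    for candidate in (existing, rejoining):
        if is_valid(candidate):
            return dict(candidate)
    # Second guess: combine the two (assumed rule: rejoining values win).
    combined = {**existing, **rejoining}
    if is_valid(combined):
        return combined
    # Beyond this, an administrator has to step in.
    raise ManualInterventionRequired("group state data could not be reconciled")


def has_primary_and_resources(state):
    """Hypothetical validity check: both keys must be present."""
    return "primary" in state and "resources" in state


existing_state = {"primary": "nodeB"}      # existing members lack resource data
rejoining_state = {"resources": ["disk1"]}  # rejoining member lacks primary data
agreed = reconcile(existing_state, rejoining_state, has_primary_and_resources)
```

When neither side's data nor the combination passes the check, the sketch raises `ManualInterventionRequired`, which corresponds to the fallback the conventional method relies on.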
Manual intervention requires an administrator to re-form the group with new group state data, using commands to adjust the data of the existing members and the rejoining member. Typically, the administrator selects a group member and uses the selected member as a “master” to which all other members are synchronized. Manual intervention suffers from, among other things, the introduction of errors and delays.
Therefore, a significant need exists in the art for an improved manner of synchronizing group state data in connection with rejoining a member to a primary-backup group in a clustered computer system.