A server cluster is a group of at least two independent servers connected by a network and managed as a single system. The clustering of servers provides a number of benefits over independent servers. One important benefit is that cluster software, which is run on each of the servers in a cluster, automatically detects application failures or the failure of another server in the cluster. Upon detection of such failures, failed applications and the like can be quickly restarted on a surviving server, with no substantial reduction in service. Indeed, clients of a Windows NT cluster believe they are connecting with a physical system, but are actually connecting to a service which may be provided by one of several systems. To this end, clients create a TCP/IP session with a service in the cluster using a known IP address. This address appears to the cluster software as a resource in the same group (i.e., a collection of resources managed as a single unit) as the application providing the service. In the event of a failure the cluster service "moves" the entire group to another system.
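By way of illustration only, the movement of a group upon a failure may be sketched as follows. The `ResourceGroup` class, the `fail_over` function, the system names and the IP address are hypothetical and are not taken from any actual cluster software. The round-robin placement of groups on the surviving systems is likewise an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ResourceGroup:
    """A collection of resources managed as a single unit (hypothetical model)."""
    name: str
    resources: List[str]   # e.g. the known IP address and the application
    owner: str             # system currently hosting the whole group


def fail_over(groups: List[ResourceGroup], failed: str,
              survivors: List[str]) -> Dict[str, str]:
    """Move every group owned by the failed system to a surviving system.

    The group moves as a unit, so the IP address and the application that
    clients reach through it always end up on the same system.
    Placement is simple round-robin (an assumption of this sketch).
    """
    if not survivors:
        raise ValueError("no surviving system to host the groups")
    moves: Dict[str, str] = {}
    affected = [g for g in groups if g.owner == failed]
    for i, group in enumerate(affected):
        group.owner = survivors[i % len(survivors)]
        moves[group.name] = group.owner
    return moves


if __name__ == "__main__":
    db = ResourceGroup("database", ["ip:10.0.0.5", "app:database"], "node1")
    print(fail_over([db], failed="node1", survivors=["node2"]))
```

Because the address travels with the group, a client that reconnects to the same known IP address reaches the service on whichever system now owns the group.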
Other benefits include the ability for administrators to inspect the status of cluster resources, and accordingly balance workloads among different servers in the cluster to improve performance. Dynamic load balancing is also available. Such manageability also provides administrators with the ability to update one server in a cluster without taking important data and applications offline. As can be appreciated, server clusters are used in critical database management, file and intranet data sharing, messaging, general business applications and the like.
While clustering is thus desirable for many applications, problems arise when the systems in a cluster stop communicating with one another, a condition known as a partition. This typically occurs, for example, when there is a break in the communications link between systems or when one of the systems crashes. When partitioned, the systems may separate into two or more distinct member sets, with the systems in each member set communicating among themselves, but with no member of any set communicating with members of any other set. Thus, a first problem is determining how to handle the split. One proposed solution is to allow each member set to continue as its own, independent cluster. However, one main difficulty with this approach is that the configuration data (i.e., the state of the cluster) that is shared by all cluster members, and which is critical to cluster operation, may become different in each of the multiple clusters. Subsequently reuniting the sets into a common cluster presumes that the data can later be reconciled; however, such reconciliation has been found to be an extremely complex and undesirable undertaking.
A simpler solution is to allow only one set to survive and continue as the cluster; however, this requires that some determination be made as to which set to select. The known way to make this determination is based on determining which set, if any, has a simple majority of the total possible systems therein, since there can be only one such set.
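The majority determination may be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular cluster product: the function name `surviving_set` and the representation of member sets as Python sets of system names are hypothetical.

```python
from typing import AbstractSet, Iterable, Optional


def surviving_set(member_sets: Iterable[AbstractSet[str]],
                  total_systems: int) -> Optional[AbstractSet[str]]:
    """Return the member set holding a simple majority, or None.

    total_systems is the total number of systems possible in the cluster,
    not merely the number currently running. At most one set can contain
    more than half of that total, so the first match is the only match.
    """
    for members in member_sets:
        if 2 * len(members) > total_systems:
            return members
    return None


if __name__ == "__main__":
    # A five-system cluster partitioned into sets of three and two.
    print(surviving_set([{"A", "B", "C"}, {"D", "E"}], total_systems=5))
    # A four-system cluster split evenly: neither set has a majority.
    print(surviving_set([{"A", "B"}, {"C", "D"}], total_systems=4))
```

Note that an even split leaves no set with a majority, in which case no set survives as the cluster.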
However, if a cluster shuts down and a new cluster is later formed with no members common to the previous cluster, a situation known as a temporal partition, a problem exists because no new member possesses the state information of the previous cluster. Thus, in addition to selecting the surviving set according to which set has the most systems, the majority solution further requires that more than half of the total possible systems in a cluster (i.e., a quorum) be communicating within a single member set. This ensures that at least one system is common to any two permutations of systems that form a cluster, thereby guaranteeing that the state of the cluster is persisted across the temporal partition as new clusters having different permutations of systems form from time to time.
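The overlap property follows from simple counting: two sets that each contain more than half of the total possible systems cannot be disjoint. The following sketch checks this exhaustively for a small cluster. The function names `quorum_size` and `all_sets_overlap` are hypothetical and are introduced only for illustration.

```python
from itertools import combinations
from typing import Iterable


def quorum_size(total_systems: int) -> int:
    """Smallest number of systems that is more than half of the total."""
    return total_systems // 2 + 1


def all_sets_overlap(systems: Iterable[str], size: int) -> bool:
    """True if every pair of member sets of the given size shares a system.

    Checking the smallest allowed size is sufficient, because any larger
    member set contains a set of that size.
    """
    candidates = [set(c) for c in combinations(list(systems), size)]
    return all(a & b for a, b in combinations(candidates, 2))


if __name__ == "__main__":
    systems = ["A", "B", "C", "D", "E"]
    q = quorum_size(len(systems))
    print(q, all_sets_overlap(systems, q))          # quorum-sized sets
    print(q - 1, all_sets_overlap(systems, q - 1))  # sets one system short
```

Sets that are one system short of a quorum can be disjoint, which is why a cluster formed from fewer systems may share no member, and hence no state, with its predecessor.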
A problem with the simple majority/quorum solution is that there is no surviving cluster unless more than half of the systems are operational in a single member set. As a result, a minority member set that otherwise would be capable of operating as a cluster to adequately service clients is not allowed to do so. A related problem arises when forming a cluster for the first time after a total system outage. Upon restart, no one system can form a cluster and allow other systems to join it over time because by itself, that system cannot constitute a quorum. Consequently, intervention by an administrator or a special programmatic process is required to restart the cluster.
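The restart difficulty can be seen in a short simulation of systems coming back online one at a time after a total outage. The function `restart_timeline` and the system names are hypothetical. The sketch assumes that every system that has restarted can communicate with every other restarted system.

```python
from typing import List, Sequence, Tuple


def restart_timeline(total_systems: int,
                     boot_order: Sequence[str]) -> List[Tuple[str, bool]]:
    """For each system that restarts, report whether a cluster can exist yet.

    Under the majority/quorum rule a cluster can form only once more than
    half of the total possible systems are up and communicating.
    """
    needed = total_systems // 2 + 1
    running: List[str] = []
    timeline: List[Tuple[str, bool]] = []
    for system in boot_order:
        running.append(system)
        timeline.append((system, len(running) >= needed))
    return timeline


if __name__ == "__main__":
    # Five possible systems; only three restart after the outage.
    for system, can_form in restart_timeline(5, ["A", "B", "C"]):
        print(system, "cluster can form" if can_form else "no quorum yet")
```

In this example no cluster exists until the third system has restarted, and if only two of the five systems ever return, no cluster forms at all even though those two might adequately service clients.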