A server cluster is ordinarily a group of at least two independent servers connected by a network and operated as a single system. Clustering servers provides a number of benefits over independent servers. One important benefit is that cluster software, which runs on each of the servers in a cluster, automatically detects application failures or the failure of another server in the cluster. Upon detection of such a failure, the failed applications and the like can be terminated and restarted on a surviving server.
Other benefits of clusters include the ability for administrators to inspect the status of cluster resources, and accordingly balance workloads among different servers in the cluster to improve performance. Such manageability also provides administrators with the ability to update one server in a cluster without taking important data and applications offline for the duration of the maintenance activity. As can be appreciated, server clusters are used in critical database management, file and intranet data sharing, messaging, general business applications and the like.
When operating a server cluster, the cluster operational data (i.e., state) of any prior incarnation of a cluster needs to be known to the subsequent incarnation of the cluster; otherwise, critical data may be lost. For example, if a bank's financial transaction data are recorded in one cluster, but a new cluster starts up without the previous cluster's operational data, the financial transactions may be lost. To avoid this, prior clustering technology required that each node (server) of a cluster possess its own replica of the cluster operational data on a private storage device thereof, and that a majority of the possible nodes of a cluster (along with their private storage devices) be operational in order to start and maintain a cluster.
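The majority requirement described above can be illustrated with a minimal sketch. The function name `has_majority` and the representation of nodes as sets of string identifiers are assumptions made for illustration only; they are not taken from any particular clustering product.

```python
def has_majority(operational_nodes, possible_nodes):
    """Return True when strictly more than half of the possible nodes are operational.

    Hypothetical helper: both arguments are iterables of node identifiers.
    Only nodes that belong to the set of possible nodes are counted.
    """
    possible = set(possible_nodes)
    operational = set(operational_nodes) & possible
    # Integer comparison avoids floating-point division: up * 2 > total.
    return len(operational) * 2 > len(possible)


# A three-node cluster can start with two nodes up, but not with one.
can_start_with_two = has_majority({"node1", "node2"}, {"node1", "node2", "node3"})
can_start_with_one = has_majority({"node1"}, {"node1", "node2", "node3"})
```

Under this rule, any two majorities of the same set of possible nodes share at least one node, which is why at least one replica of the operational data from the prior incarnation is present in the subsequent one.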
However, requiring a quorum of nodes has the drawback that a majority of the possible nodes of a cluster must be operational in order to have a cluster. A recent improvement described in U.S. patent application Ser. No. 08/963,050, entitled “Method and System for Quorum Resource Arbitration in a Server Cluster,” assigned to the assignee of the present invention, provides the cluster operational data on a single quorum device, typically a storage device, for which cluster nodes arbitrate to obtain exclusive ownership. Because the correct cluster operational data is on the quorum device, a cluster may be formed as long as a node of that cluster has ownership of the quorum device. This also ensures that only one unique incarnation of a cluster can exist at any given time, since only one node can exclusively own the quorum device. The single quorum device solution increases cluster availability, since at a minimum, only one node and the quorum device are needed to have an operational cluster. While this is a significant improvement over requiring a majority of nodes to have a cluster, a single quorum device is inherently not reliable, and thus to increase cluster availability, expensive hardware-based solutions are presently employed to provide a highly reliable single quorum device for storage of the operational data. The cost of the highly reliable storage device is a major portion of the cluster expense.
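The single quorum device approach can be sketched in the same way. This is an in-memory illustration only, not the arbitration mechanism of the referenced application: the class `QuorumDevice`, its methods, and the function `try_form_cluster` are hypothetical names, and a lock inside one process stands in for whatever exclusive-ownership primitive a real storage device would provide.

```python
import threading


class QuorumDevice:
    """Hypothetical stand-in for a quorum device holding the cluster operational data."""

    def __init__(self, operational_data=None):
        self._lock = threading.Lock()
        self._owner = None  # identifier of the node that currently owns the device
        self.operational_data = dict(operational_data or {})

    def try_acquire(self, node_id):
        """Attempt to take exclusive ownership; return True if node_id is the owner."""
        with self._lock:
            if self._owner is None:
                self._owner = node_id
            return self._owner == node_id

    def release(self, node_id):
        """Give up ownership, but only if node_id actually holds it."""
        with self._lock:
            if self._owner == node_id:
                self._owner = None


def try_form_cluster(node_id, device):
    """Return the operational data if node_id wins arbitration, otherwise None."""
    if not device.try_acquire(node_id):
        return None
    return device.operational_data


device = QuorumDevice({"epoch": 7})
first = try_form_cluster("node1", device)   # node1 wins ownership and may form the cluster
second = try_form_cluster("node2", device)  # node2 loses arbitration while node1 owns the device
```

The sketch also makes the drawback visible: everything depends on the one `QuorumDevice` instance, so if it is lost, no node can form a cluster regardless of how many nodes are operational.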