A high-availability cluster typically refers to a service delivery platform that includes a tightly coupled group of servers (i.e., nodes), storage devices, and software. Each node in the cluster is interconnected to all other nodes in the cluster. The nodes in the cluster are configured such that the cluster as a whole provides the ability to run failover, parallel, and/or scalable resources. Thus, high-availability clusters are useful for industries that require high availability of applications and services (e.g., telecommunications industry, banking industry, internal information technology, etc.).
Further, each node is associated with a cluster and is configured to join that cluster when the node is booted. However, if the cluster that the node is configured to join is not present when the node is booted, then the node may attempt to create an instance of the cluster. In some situations, the cluster is present but, due to a communications failure (e.g., a network partition) between the node and the cluster, the node is not able to join the cluster and thus attempts to create an instance of the cluster. In this situation, the cluster may become partitioned, resulting in multiple instances of the same cluster being created and executed. The operation of two instances of a cluster is commonly referred to as “split-brain” and may result in data corruption or data loss. Further, if the two instances operate at staggered intervals, one of the instances may be created and proceed to operate with out-dated configuration information, a condition commonly referred to as “amnesia.”
To address the aforementioned issues, a node may create a cluster only if the node obtains enough quorum votes to reach a quorum. The quorum refers to the minimum number of quorum votes required to create an instance of the cluster, which is typically half the number of nodes in the cluster plus one. Further, each node in the cluster typically has one quorum vote. Thus, if a node attempting to form the cluster is connected to at least half of the other nodes in the cluster, then a quorum is reached and the instance of the cluster is created.
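The vote-counting rule described above can be sketched as follows. This is a minimal illustration and not an implementation taken from any particular cluster product: the function names are hypothetical, and rounding the half down (integer division) for odd-sized clusters is an assumption that the text does not state.

```python
def quorum_votes_required(total_nodes: int) -> int:
    """Votes needed to form a cluster: half the nodes plus one.

    Assumption: the half is rounded down for odd-sized clusters.
    """
    if total_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return total_nodes // 2 + 1


def can_form_cluster(reachable_peers: int, total_nodes: int) -> bool:
    """Return True if this node's vote plus its reachable peers' votes reach quorum.

    Each node, including the one forming the cluster, holds one quorum vote.
    """
    votes_held = 1 + reachable_peers
    return votes_held >= quorum_votes_required(total_nodes)
```

For example, in a five-node cluster a node that can reach two peers holds three votes and may form the cluster, whereas a node that can reach only one peer may not.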
While the aforementioned scheme is adequate for clusters containing a relatively large number of nodes, the scheme is not appropriate for two-node clusters. For a two-node cluster, the number of quorum votes required is 2 (i.e., 2 (the number of nodes in the cluster)/2 + 1). Therefore, in the case of two-node clusters, if one node fails, then the remaining operational node is not able to create a cluster because the remaining operational node will never be able to obtain a quorum of 2.
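The arithmetic behind this limitation can be checked with a short sketch. The helper name below is hypothetical, and rounding the half down for odd-sized clusters is an assumption; the calculation simply subtracts the required quorum from the total number of votes to show how many node failures each cluster size can survive.

```python
def survivable_failures(total_nodes: int) -> int:
    """Number of nodes that can fail while the survivors still hold a quorum.

    Assumes one vote per node and a quorum of half the nodes (rounded down) plus one.
    """
    quorum = total_nodes // 2 + 1
    return total_nodes - quorum


# Cluster size mapped to the number of failures it can survive.
SURVIVABLE_BY_SIZE = {n: survivable_failures(n) for n in (2, 3, 4, 5)}
```

Under these assumptions a two-node cluster survives zero failures, while three-node and four-node clusters survive one and a five-node cluster survives two.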
The aforementioned scheme has been modified to address two-node clusters. Specifically, a quorum device is connected to the cluster such that each node in the cluster is able to communicate with the quorum device. The purpose of the quorum device is to provide an additional quorum vote. Thus, the quorum vote provided by the quorum device allows a single node in the two-node cluster to create a cluster in the event that the other node is not operational or is experiencing communication difficulties. More specifically, each node in the two-node cluster includes functionality to reserve the quorum device, and thereby obtain the quorum vote associated with the quorum device. The ability to reserve the quorum device also provides a means for indicating, to the other node in the two-node cluster, that the quorum vote associated with the quorum device is in use, thereby preventing the node that does not hold that quorum vote from creating an instance of the cluster.
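The reservation behavior described above can be sketched as follows. This is an illustrative, in-memory stand-in only: the `QuorumDevice` class, its methods, and the node identifiers are hypothetical, and an actual quorum device would rely on a reservation mechanism provided by the device itself rather than a process-local lock.

```python
import threading
from typing import Optional

QUORUM_VOTES_REQUIRED = 2  # two-node cluster: 2 / 2 + 1, as stated in the text


class QuorumDevice:
    """Hypothetical stand-in for a quorum device that carries one quorum vote."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._owner: Optional[str] = None

    def reserve(self, node_id: str) -> bool:
        """Atomically reserve the device; fail if another node already holds it."""
        with self._lock:
            if self._owner is None or self._owner == node_id:
                self._owner = node_id
                return True
            return False

    def owner(self) -> Optional[str]:
        """Return the node currently holding the reservation, if any."""
        with self._lock:
            return self._owner


def can_create_cluster(node_id: str, peer_reachable: bool, device: QuorumDevice) -> bool:
    """Count votes for a node in a two-node cluster with a quorum device."""
    votes = 1  # the node's own vote
    if peer_reachable:
        votes += 1  # the other node's vote
    elif device.reserve(node_id):
        votes += 1  # the quorum device's vote, obtained by reservation
    return votes >= QUORUM_VOTES_REQUIRED
```

In this sketch, if both nodes lose contact with each other, the first node to reserve the device reaches two votes and may create the cluster, while the second node's reservation attempt fails and it remains at one vote, which avoids the split-brain condition.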
Quorum devices are typically shared storage devices (such as SCSI disks) and are referred to as quorum disks. The quorum disk is connected to all nodes that have the potential to join the cluster. The use of a quorum disk typically requires that the nodes in the cluster have the appropriate hardware and software for interacting with the quorum disk. Quorum devices may also be networked storage devices (such as iSCSI-attached storage), software applications running on a networked system, etc.