A cluster typically refers to a service delivery platform that includes a tightly coupled group of servers (i.e., nodes), storage devices, and software. Each node in the cluster is connected to at least one other node in the cluster. The nodes are configured such that the cluster as a whole provides the ability to run failover, parallel, and/or scalable resources. Thus, clusters are useful for industries that require high availability of applications and services (e.g., the telecommunications industry).
Further, each node is associated with a cluster and is configured to join that cluster when the node is booted. However, if the cluster that the node is configured to join is not present when the node is booted, then the node may attempt to create that cluster. In some situations, the cluster is present but, due to a communications failure between the node and the cluster, the node is not able to join the cluster and, thus, attempts to create it. In this situation, the cluster may become partitioned, resulting in multiple instances of the same cluster being created and executed. The operation of two instances of the same cluster is commonly referred to as “split-brain” and may result in data corruption, among other problems.
To address the aforementioned problem, a node may create a cluster only if the node obtains enough quorum votes to reach a quorum. The quorum refers to the minimum number of quorum votes required to create a cluster, which is typically half the number of nodes in the cluster plus one. Further, each node in the cluster typically has one quorum vote. Thus, if a node attempting to form the cluster is connected to at least half of the other nodes in the cluster, then a quorum is reached and the cluster is created.
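The quorum rule described above can be illustrated with a minimal sketch. The function names quorum_threshold and can_create_cluster are hypothetical, and the use of integer (rounded-down) division for clusters with an odd number of nodes is an assumption, since the text only states "half the number of nodes plus one".

```python
def quorum_threshold(total_nodes: int) -> int:
    """Minimum quorum votes needed to create a cluster.

    Assumption: "half the number of nodes plus one" uses integer
    (rounded-down) division when the node count is odd.
    """
    if total_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return total_nodes // 2 + 1


def can_create_cluster(total_nodes: int, reachable_peers: int) -> bool:
    """Return True if this node, together with the peers it can reach,
    holds enough votes for a quorum (one vote per node, as in the text)."""
    if not 0 <= reachable_peers <= total_nodes - 1:
        raise ValueError("reachable_peers must be between 0 and total_nodes - 1")
    votes_held = 1 + reachable_peers  # this node's own vote plus one per reachable peer
    return votes_held >= quorum_threshold(total_nodes)
```

For example, under these assumptions a node in a five-node cluster that reaches two peers holds three votes and may create the cluster, whereas a node that reaches only one peer may not.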
While the aforementioned scheme is adequate for clusters containing a relatively large number of nodes, the scheme is not appropriate for two-node clusters or clusters that may easily degenerate into two-node clusters. For a two-node cluster, the number of quorum votes required is 2 (i.e., 2/2 + 1, where 2 is the number of nodes in the cluster). Therefore, in the case of a two-node cluster, if one node fails, then the remaining operational node is not able to create a cluster because it holds only its own quorum vote and will never be able to obtain a quorum of 2.
The aforementioned scheme has been modified to address two-node clusters. Specifically, a quorum device is connected to the cluster such that each node in the cluster is able to communicate with the quorum device. The purpose of the quorum device is to provide an additional quorum vote. Thus, the quorum vote provided by the quorum device allows a single node in the two-node cluster to create a cluster in the event that the other node is not operational or is experiencing communication difficulty. More specifically, each node in the two-node cluster includes functionality to reserve the quorum device, and thereby obtain the quorum vote associated with the quorum device. The ability to reserve the quorum device also provides a means for indicating, to the other node in the two-node cluster, that the quorum vote associated with the quorum device is in use, thereby preventing the node that does not hold that quorum vote from creating a new cluster.
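The reservation behavior described above might be sketched as follows. This is an illustration only: QuorumDevice is a hypothetical in-memory stand-in for a real quorum device, the names reserve, owner, and can_create_with_device are assumptions, and the threshold reuses the "half the nodes plus one" rule from the text for the two-node case.

```python
import threading
from typing import Optional


class QuorumDevice:
    """Hypothetical in-memory stand-in for a quorum device holding one vote."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._owner: Optional[str] = None

    @property
    def owner(self) -> Optional[str]:
        """Identifier of the node currently holding the reservation, if any."""
        return self._owner

    def reserve(self, node_id: str) -> bool:
        """Atomically reserve the device.

        Succeeds if the device is free or already reserved by node_id;
        fails if another node holds the reservation.
        """
        with self._lock:
            if self._owner is None or self._owner == node_id:
                self._owner = node_id
                return True
            return False


def can_create_with_device(node_id: str, reachable_peers: int,
                           device: QuorumDevice, total_nodes: int = 2) -> bool:
    """Decide whether node_id may create the cluster.

    The node counts its own vote plus one per reachable peer and, only if
    that is not enough, tries to reserve the quorum device for one more vote.
    """
    needed = total_nodes // 2 + 1          # 2 for a two-node cluster
    votes = 1 + reachable_peers
    if votes < needed and device.reserve(node_id):
        votes += 1                         # vote associated with the quorum device
    return votes >= needed
```

In this sketch, if two partitioned nodes both call can_create_with_device with no reachable peers, only the node that reserves the device first reaches two votes; the other node sees the reservation and cannot create a second instance of the cluster.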
Quorum devices are typically shared storage devices (such as SCSI disks) and are referred to as quorum disks. The quorum disk is connected to all nodes that have the potential to join the cluster. The use of a quorum disk typically requires that the nodes in the cluster have the appropriate hardware and software for interacting with the quorum disk.