In a distributed file storage system, servers may be organized as one or more clusters of cooperating nodes. In one type of cluster organization, called “shared data clustering,” the nodes of a cluster (each of which may correspond to a separate physical server) share access to data storage devices. For example, the shared data storage devices may be accessible to each node of a cluster over a storage area network (SAN) implemented using a combination of Fibre Channel over Ethernet (FCoE) and other storage interconnects, such as various forms of SCSI (Small Computer System Interface), including iSCSI (Internet SCSI) and other Internet Protocol-based (IP-based) storage protocols. Shared data clustering may allow each node to access large amounts of data, and may allow data to be accessed even in the event of one or more node failures. For example, surviving nodes may take over the functions originally performed by failed nodes, and may provide access to data previously accessed through the failed nodes.
In the event of a failure of communication between the nodes of a cluster, a cluster may become partitioned. That is, instead of all the nodes in the cluster being able to communicate with each other, each node in the cluster may only be able to communicate with a particular subset (or none) of the other nodes. Another scenario arises when node A can communicate with node B, and node B with node C, but node A cannot communicate with node C. In the latter case, a well-known protocol trims the cluster to a fully connected subset.
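The trimming step mentioned above can be illustrated with a minimal sketch. The text does not identify the protocol, so the greedy rule used here (repeatedly drop the node that reaches the fewest peers) is only an assumption chosen for illustration, and the function name `trim_to_fully_connected` is hypothetical.

```python
from itertools import combinations


def trim_to_fully_connected(nodes, links):
    """Greedily drop nodes until every remaining pair can communicate.

    Illustrative only: the greedy rule is an assumption, not the
    protocol referred to in the text.
    """
    # Treat links as unordered pairs so (a, b) and (b, a) are equivalent.
    link_set = {frozenset(pair) for pair in links}
    remaining = set(nodes)

    def connected(a, b):
        return frozenset((a, b)) in link_set

    while True:
        # Collect every pair of remaining nodes that cannot communicate.
        broken = [
            (a, b)
            for a, b in combinations(sorted(remaining), 2)
            if not connected(a, b)
        ]
        if not broken:
            return remaining
        # Count how many remaining peers each node can reach.
        degree = {
            n: sum(connected(n, m) for m in remaining if m != n)
            for n in remaining
        }
        # Drop the least-connected node; ties break by name for determinism.
        victim = min(sorted(remaining), key=lambda n: degree[n])
        remaining.remove(victim)


# Node A reaches B, and B reaches C, but A cannot reach C.
survivors = trim_to_fully_connected(["A", "B", "C"], [("A", "B"), ("B", "C")])
print(sorted(survivors))  # ['B', 'C']
```

In this example node A is dropped only because of the alphabetical tie-break; dropping C instead would leave an equally valid fully connected subset.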
Thus, nodes may form isolated sub-clusters, where a given node can communicate only with other nodes in its own sub-cluster, but not with a node in any other sub-cluster. Under some circumstances, more than one sub-cluster or node may assume that it is the only surviving and valid sub-cluster or node, and may attempt to access shared data as though no other node or sub-cluster remains operational. Such a scenario, in which multiple nodes or sets of nodes form independent sub-clusters, is termed a “split-brain” scenario.
In a split-brain scenario, more than one sub-cluster or node may attempt to access and update shared storage in an uncoordinated manner, thereby potentially causing data corruption. Data corruption can be avoided by shutting down all but one of the sub-clusters and guaranteeing that the remaining nodes form a healthy, communicating cluster.
Conventional split-brain solutions use dedicated coordination points (known as quorum disks) as a method of communication when network connectivity is lost. These solutions rely on one or more of the quorum disks being up and available. Because all nodes are expected to be able to communicate with the quorum disks even during a split-brain scenario, the various sub-clusters can coordinate with each other and reach an agreement as to which sub-cluster is to survive and which sub-clusters are to be shut down. However, the quorum disks themselves may be unavailable, partially unavailable, or available to some sub-clusters but not to others, which undermines this coordination.
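As a rough illustration of quorum-disk arbitration, the sketch below models one simple policy: each sub-cluster tries to reserve every quorum disk it can reach, the first claimant keeps a disk, and a sub-cluster survives only if it holds a strict majority of the configured quorum disks. The majority rule, the sequential claim order, and the names `arbitrate`, `reachable`, and `claim_order` are all assumptions made for this example, not details taken from any particular product.

```python
def arbitrate(quorum_disks, reachable, claim_order):
    """Return the surviving sub-cluster, or None if none wins a majority.

    quorum_disks: list of configured quorum disk names.
    reachable:    dict mapping sub-cluster name -> disks it can reach.
    claim_order:  order in which sub-clusters attempt their reservations
                  (a stand-in for a real race; hypothetical).
    """
    owner = {}  # quorum disk -> sub-cluster holding its reservation
    for sub in claim_order:
        for disk in reachable.get(sub, ()):
            if disk in quorum_disks:
                # The first claimant keeps the reservation.
                owner.setdefault(disk, sub)

    # A strict majority of all configured quorum disks is required.
    needed = len(quorum_disks) // 2 + 1
    for sub in claim_order:
        held = sum(1 for holder in owner.values() if holder == sub)
        if held >= needed:
            return sub
    return None


disks = ["q1", "q2", "q3"]

# Both sub-clusters reach every quorum disk: the first claimant survives.
print(arbitrate(disks, {"X": disks, "Y": disks}, ["X", "Y"]))  # X

# Quorum disks are differently available: nobody reaches a majority.
print(arbitrate(disks, {"X": ["q1"], "Y": ["q2"]}, ["X", "Y"]))  # None
```

The second call shows the failure mode described above: when the quorum disks are differently available to different sub-clusters, no sub-cluster can establish a majority under this policy.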
The need to dedicate extra and fixed resources as quorum disks can increase costs. In addition, the dedicated quorum disks are typically configured and managed by an administrator, who also has the responsibility of detecting and replacing each failed quorum disk. The quorum disks need to be continually monitored to make sure they are functioning properly and can be reached by all the nodes in the server cluster; however, the quorum disks are otherwise idle and under-utilized. Thus, conventional implementations increase the amount of administrative overhead, further increasing costs.