Technical Field
The present disclosure relates to storage systems and, more specifically, to establishment of a quorum in a cluster of storage systems.
Background Information
A storage system typically includes one or more storage devices, such as solid state drives (SSDs) embodied as flash storage devices, into which information (i.e., data) may be entered, and from which data may be obtained, as desired. The storage system (i.e., node) may logically organize the data stored on the devices as storage containers, such as files and/or logical units (LUNs). To improve the performance and availability of the data contained in the storage containers, a plurality of nodes may be interconnected as a cluster configured to provide storage service relating to the organization of the storage containers, with the property that when one node fails, another node (i.e., the surviving node) may service data access requests, i.e., operations, directed to the failed node's storage containers. However, more than one surviving node (which may include a node that the cluster had incorrectly determined as failed) may attempt to service the data access requests directed to the failed node's storage containers, which may result in data corruption. Typically, this problem may be solved using a quorum; however, for small clusters, this approach is inefficient (or even fails), as losing a single node may result in an inability to establish a quorum so as to continue to serve data.
Typically, failover in a cluster depends on a quorum to guarantee that no two disjoint sets of nodes within the cluster each attempt to make progress (e.g., write to the storage devices) on their own, potentially leading to data corruption. The quorum may be implemented as a voting scheme, where each node in the cluster is granted a number of votes (e.g., one) and, as long as a majority of the votes allocated across the cluster is cast among non-failing nodes (surviving nodes), the surviving nodes may continue to operate as the cluster and make progress. However, for a small, e.g., two-node, cluster where each node has a single quorum vote, a failure of one node leaves the surviving node with only one of the two votes, which is not a majority, thus preventing proper operation of the cluster.
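By way of illustration only, the voting scheme described above may be sketched as follows. The function name has_quorum, the node names, and the dictionary representation of vote allocation are hypothetical and chosen solely for this example; the sketch assumes that "majority" means strictly more than half of all votes allocated across the cluster.

```python
from typing import Dict, Iterable


def has_quorum(votes: Dict[str, int], surviving: Iterable[str]) -> bool:
    """Return True if the surviving nodes hold a strict majority of all votes.

    `votes` maps each node in the cluster (hypothetical names) to its
    allocated number of quorum votes; `surviving` lists the non-failing nodes.
    """
    total = sum(votes.values())
    held = sum(votes[node] for node in set(surviving))
    # Strict majority: more than half of all allocated votes.
    return 2 * held > total


# Hypothetical clusters in which each node is granted a single vote.
two_node = {"node_a": 1, "node_b": 1}
three_node = {"node_a": 1, "node_b": 1, "node_c": 1}

print(has_quorum(two_node, ["node_a", "node_b"]))    # True: 2 of 2 votes
print(has_quorum(two_node, ["node_a"]))              # False: 1 of 2 votes
print(has_quorum(three_node, ["node_a", "node_b"]))  # True: 2 of 3 votes
```

In the two-node case, the single surviving node holds exactly half of the votes and therefore cannot establish a quorum, whereas a three-node cluster under the same allocation tolerates the loss of one node.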