Distributed computer systems have the capability of sharing resources. “Clustering” generally refers to a computer system organization in which multiple computing platforms, or nodes, are networked together to cooperatively perform computing tasks. Different methods are used to handle network events or network failures between nodes in the cluster. Such a network event may be, for example, a total network partition, which results when two or more disjoint sets of nodes are unable to communicate with each other while the nodes within each set can still communicate with one another, or a partial failure that breaks only specific network links between certain nodes.
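By way of illustration only, the distinction between a total network partition and a partial failure may be sketched as follows. The sketch assumes that every pair of nodes is expected to have a direct working link (a full mesh), which the foregoing description does not require; the names `partition_sets` and `classify` are hypothetical and chosen for this example.

```python
from itertools import combinations


def partition_sets(nodes, links):
    """Group nodes into sets whose members can still reach one another."""
    neighbors = {n: set() for n in nodes}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, groups = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first walk over working links, starting from an unvisited node.
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(group)
    return groups


def classify(nodes, links):
    """Return 'partition', 'partial', or 'healthy' for the observed links."""
    if len(partition_sets(nodes, links)) > 1:
        return "partition"  # two or more disjoint sets of nodes
    # Assumption: a healthy cluster has a direct link between every node pair.
    working = {frozenset(link) for link in links}
    expected = {frozenset(pair) for pair in combinations(nodes, 2)}
    return "healthy" if expected <= working else "partial"
```

For example, with nodes `a` through `d` and only the links `a-b` and `c-d` working, `partition_sets` yields two disjoint sets and `classify` reports a partition. If only the single link `a-d` is broken, all nodes remain mutually reachable through other nodes and the result is a partial failure.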
These types of network failures are generally handled by ensuring that one subset of the cluster nodes can safely continue operation while the remaining nodes are excluded from the cluster through a containment methodology or protocol. For example, one containment strategy is to allow only the subset of nodes that forms a majority to operate as cluster members, while the minority subset of nodes discontinues operation (a majority quorum strategy). Variations of this strategy may involve a third-party arbitrator in deciding which nodes should be excluded from the cluster. Cluster systems that use a storage area network (SAN) utilize the SAN as a secondary control network and have additional flexibility in determining which nodes to exclude from the cluster, or contain, by ensuring that the contained nodes cannot access the SAN. Containment of nodes over a storage area network commonly uses persistent reserve or disk fencing, in which the contained nodes are disallowed disk access, or alternatively a voting type of algorithm (e.g., the Paxos algorithm) over the SAN that forces the selected nodes to discontinue utilizing the shared storage. Handling of a network failure may also include re-routing communications in an attempt to remedy the failure. The selection of nodes to exclude from the cluster (or contain), together with ensuring that those nodes are disallowed access to shared resources, may be referred to as an “expel protocol,” in which the nodes are essentially “expelled” from the cluster.
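As a non-limiting sketch, a majority quorum strategy with an optional third-party arbitrator may be expressed as follows. The function name `majority_quorum`, its parameters, and the choice to stop every node when no set holds a majority and no arbitrator is available are assumptions made for this example; `cluster_size` is assumed to be the configured number of cluster members, including any that are unreachable.

```python
def majority_quorum(groups, cluster_size, arbitrator=None):
    """Return (survivors, expelled) for the given disjoint sets of nodes.

    groups       -- disjoint sets of nodes that can still communicate internally
    cluster_size -- configured cluster membership count (assumed known to all)
    arbitrator   -- hypothetical callable that picks one set when no set has
                    a strict majority
    """
    for group in groups:
        # A strict majority: more than half of the configured membership.
        if 2 * len(group) > cluster_size:
            survivors = set(group)
            break
    else:
        # No majority exists. Defer to the arbitrator if one is supplied;
        # otherwise (assumption) no node is allowed to continue.
        survivors = set(arbitrator(groups)) if arbitrator else set()

    expelled = set().union(*groups) - survivors
    return survivors, expelled
```

In a five-node cluster split into sets of three and two, the three-node set continues and the two-node set is expelled. In a four-node cluster split evenly, neither set holds a majority, so the outcome depends on the arbitrator. In either case, the expelled nodes would still need to be contained, for example by disk fencing, which this sketch does not model.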