Service availability within clusters involves the allocation of resources and services, including failover planning for alternate machines. The cluster infrastructure is monitored, and when a failure is identified, some part of the cluster is fenced, or isolated from the remainder, to prevent manifestations of the failure from propagating throughout the cluster. Thus, fencing forms part of the corrective action used to reduce the harm that hardware and software failures might cause within a cluster of nodes.
Node fencing can be achieved using various techniques known to those of ordinary skill in the art, such as Small Computer System Interface-3 (SCSI-3) Persistent Group Reservation (PGR), SCSI-Reserve/Reset/Release, exclusive volume access, marking a pre-defined location on the disk for node status and action, etc. Each technique has some mechanism to determine which nodes are to be fenced, and to initiate system abort activity on those nodes.
In the case of split-brain or network-partitioned cluster operation, fencing operations typically determine survival based on retaining the largest number of operational nodes. In the instance of a pure split brain, the fenced side may simply be the side that loses the race for survival. Unfortunately, the fenced side may also be the side providing the most important services within the cluster. Fencing that side can add to service downtime, which in turn may result in lost business opportunities.
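The node-count rule described above can be illustrated with a minimal sketch. It is not taken from any particular cluster product: the function name `pick_surviving_partition`, the `race_winner` parameter, and the node names are hypothetical, and the sketch assumes that a pure split brain means the partitions are of equal size, so that only the outcome of the race can decide between them.

```python
from typing import List, Optional, Set


def pick_surviving_partition(partitions: List[Set[str]],
                             race_winner: Optional[int] = None) -> int:
    """Return the index of the partition that survives; all others are fenced.

    partitions  -- disjoint sets of node names, one set per side of the split
    race_winner -- hypothetical input: index of the side that won the race
                   for survival; consulted only when sizes are tied
    """
    if not partitions:
        raise ValueError("at least one partition is required")

    # Typical rule: keep the side with the largest number of operational nodes.
    largest = max(len(p) for p in partitions)
    candidates = [i for i, p in enumerate(partitions) if len(p) == largest]
    if len(candidates) == 1:
        return candidates[0]

    # Assumed meaning of a pure split brain: equal-sized sides, so the
    # node count cannot decide and the race winner survives.
    if race_winner in candidates:
        return race_winner
    raise ValueError("equal-sized partitions and no valid race winner")


# Illustrative node names only: three application nodes versus two nodes
# that might host the most important service in the cluster.
sides = [{"app1", "app2", "app3"}, {"db1", "db2"}]
survivor = pick_surviving_partition(sides)
fenced = [sorted(p) for i, p in enumerate(sides) if i != survivor]
```

In this sketch the decision depends only on how many nodes each side has, or on the race when the sides are equal. Nothing in it considers which services each side provides, which is how the side hosting the most important services can end up fenced.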