Distributed (or cross-host) locks can be used to synchronize operations of multiple nodes (i.e., hosts). Distributed locks may be implemented using a network lock manager (NLM) or, in systems where a more reliable storage area network (SAN) is available, using on-disk lease-based locks.
Networks are prone to failure. When the IP network connecting the hosts is not working, or the host managing a lock crashes, a new manager for the lock must be “elected”, which raises many complicated implementation issues. These systems must also recover from network partitioning, which may give rise to the so-called “split-brain” problem, wherein the cluster splits into two or more sub-clusters, each of which may lay exclusive claim to critical on-disk resources. As a result, NLMs have limits on the maximum number of node failures they can tolerate. An NLM may choose to heartbeat to a “quorum” disk as an extra way to help determine whether a node is down; clustering software such as Red Hat™ Cluster Suite and VERITAS™ Cluster Service has used the notion of a “quorum” disk in the past, although it does not specifically implement locks.
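The limit on tolerated node failures can be illustrated with a minimal sketch. It assumes a strict-majority quorum rule, which is one common way to avoid split-brain but is not stated in the text above; the function names has_quorum and max_tolerated_failures are hypothetical.

```python
def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """Return True if a sub-cluster holds a strict majority of the cluster.

    Assumption: a strict-majority rule. At most one sub-cluster can satisfy
    it after a partition, so at most one may claim the shared resources.
    """
    return reachable_nodes > cluster_size // 2


def max_tolerated_failures(cluster_size: int) -> int:
    """Largest number of node failures that still leaves a strict majority."""
    return (cluster_size - 1) // 2
```

Under this assumed rule, a four-node cluster that partitions evenly into two pairs leaves neither side with quorum, which is the kind of case where a tie-breaker such as a “quorum” disk becomes useful.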
On-disk lease-based locks rely on mutual exclusion primitives such as SCSI-2 reserve/release or SCSI-3 persistent reservation/release that are supported by most disk array controllers. The disk array controller manages the physical disk drives and exposes them to connected computer systems as logical data storage units (DSU), each identified by a logical unit number (LUN). The performance of on-disk lease-based locks depends on the efficacy of the mutual exclusion primitives provided by these storage systems.
In particular, SCSI-2 reservations may be expensive because they operate at LUN granularity and limit access to the DSU to the node holding the reservation. Consequently, no other computer system attached to the DSU can perform I/O to the LUN until the reservation ends. SCSI-3 based reservations, meanwhile, allow I/O from nodes/initiators other than the node holding the reservation, but the overhead of a SCSI reservation/release may still be prohibitive.
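The general shape of an on-disk lease-based lock can be sketched as follows. This is only an in-memory simulation under stated assumptions: the SimulatedLun and LockRecord classes are hypothetical, a threading.Lock stands in for a LUN-wide reserve/release, and the lease-expiry rule shown is one plausible policy rather than the behavior of any particular storage system.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class LockRecord:
    owner: Optional[str] = None   # node currently holding the lease, if any
    expires_at: float = 0.0       # time at which the lease becomes stale


class SimulatedLun:
    """In-memory stand-in for a LUN; it issues no real SCSI commands."""

    def __init__(self) -> None:
        # Models a LUN-wide reservation: one holder at a time.
        self._reservation = threading.Lock()
        self.record = LockRecord()

    def try_acquire(self, node: str, now: float, lease_secs: float) -> bool:
        # Entering the with-block models "reserve"; leaving it models "release".
        with self._reservation:
            rec = self.record
            free = rec.owner is None or rec.expires_at <= now
            if free or rec.owner == node:
                # Take the lock, or renew a lease this node already holds.
                self.record = LockRecord(node, now + lease_secs)
                return True
            return False

    def release(self, node: str) -> bool:
        with self._reservation:
            if self.record.owner != node:
                return False      # only the current owner may clear the record
            self.record = LockRecord()
            return True
```

Every acquire or release in this sketch serializes on the single reservation, which mirrors why the cost of the underlying mutual exclusion primitive bounds the performance of the lock. The time value is passed in explicitly so the behavior is deterministic; a real system would also have to account for clock differences between nodes.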