As computer systems scale to enterprise levels, particularly to support large-scale data centers, the underlying data storage systems frequently adopt storage area networks (SANs). As is well understood, SANs provide a number of technical capabilities and operational benefits, fundamentally including virtualization of data storage devices, redundancy of physical devices with transparent fault-tolerant fail-over and fail-safe controls, geographically distributed and replicated storage, and centralized oversight and storage configuration management decoupled from client-centric computer systems management.
Architecturally, a SAN storage subsystem is characteristically implemented as a large array of Small Computer System Interface (SCSI) protocol-based storage devices. One or more physical SCSI controllers operate as externally-accessible targets for data storage commands and data transfer operations. The target controllers internally support bus connections to the data storage devices, identified as logical units (LUNs). The storage array is collectively managed internally by a storage system manager to virtualize the physical data storage devices. The storage system manager is thus able to aggregate the physical devices present in the storage array into one or more logical storage containers. Virtualized segments of these containers can then be allocated by the storage system manager as externally visible and accessible LUNs with uniquely identifiable target identifiers. A SAN storage subsystem thus presents the appearance of simply constituting a set of SCSI targets hosting respective sets of LUNs. While specific storage system manager implementation details differ among SAN storage device manufacturers, the desired consistent result is that the externally visible SAN targets and LUNs fully implement the expected SCSI semantics necessary to respond to and complete initiated transactions against the managed container.
A SAN storage subsystem is typically accessed by a server computer system implementing a physical host bus adapter (HBA) that connects to the SAN through network connections. Within the server and above the host bus adapter, storage access abstractions are characteristically implemented through a series of software layers, beginning with a low-level SCSI driver layer and ending in an operating system specific file system layer. The driver layer, which enables basic access to the target ports and LUNs, is typically specific to the communication protocol used by the SAN storage subsystem. A data access layer may be implemented above the device driver to support multipath consolidation of the LUNs visible through the host bus adapter and other data access control and management functions. A logical volume manager (LVM), typically implemented between the driver and conventional operating system file system layers, supports volume-oriented virtualization and management of the LUNs that are accessible through the host bus adapter. Multiple LUNs can be gathered and managed together as a volume under the control of the logical volume manager for presentation to and use by the file system layer as a logical device.
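The gathering of multiple LUNs into a single logical device can be illustrated with a minimal sketch. It assumes the simplest possible layout, plain concatenation of the LUNs in order; the names `Lun`, `ConcatenatedVolume`, and `resolve` are hypothetical and do not correspond to any particular logical volume manager, which may stripe, mirror, or otherwise remap blocks.

```python
from bisect import bisect_right
from dataclasses import dataclass
from itertools import accumulate


@dataclass(frozen=True)
class Lun:
    """A LUN visible through the host bus adapter (hypothetical model)."""
    target_id: str
    lun_id: int
    num_blocks: int


class ConcatenatedVolume:
    """Presents several LUNs to the file system layer as one logical device.

    Assumption: LUNs are simply concatenated in the order given.
    """

    def __init__(self, luns):
        if not luns:
            raise ValueError("a volume needs at least one LUN")
        self.luns = list(luns)
        # _ends[i] is the first volume block address past LUN i.
        self._ends = list(accumulate(lun.num_blocks for lun in self.luns))

    @property
    def num_blocks(self):
        return self._ends[-1]

    def resolve(self, volume_block):
        """Map a volume block address to (LUN, block offset within that LUN)."""
        if not 0 <= volume_block < self.num_blocks:
            raise IndexError(f"block {volume_block} is outside the volume")
        index = bisect_right(self._ends, volume_block)
        start = self._ends[index - 1] if index else 0
        return self.luns[index], volume_block - start


volume = ConcatenatedVolume([Lun("target0", 0, 100), Lun("target0", 1, 50)])
lun, offset = volume.resolve(120)  # falls in the second LUN, 20 blocks in
```

In this sketch the file system layer sees a single device of 150 blocks and never needs to know which LUN backs a given block.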
In typical implementations, a SAN storage subsystem connects with upper tiers of client and server computer systems through a communications matrix that is frequently implemented using the Internet Small Computer System Interface (iSCSI) standard. When multiple upper-tier client and server computer systems (referred to herein as “nodes”) access the SAN storage subsystem, two or more nodes may access the same system resource within the SAN storage subsystem. In such a scenario, a locking mechanism is needed to synchronize the memory operations of the multiple nodes within the computer system. More specifically, a lock is a mechanism utilized by a node to gain access to a system resource and to handle competing requests among multiple nodes in an orderly and efficient manner.
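The role of a lock in ordering competing requests can be sketched as follows. This is an in-process illustration only, assuming an exclusive lock with first-come, first-served waiters; the class `ResourceLock` and its methods are hypothetical, and a real SAN-based lock would have to be held on shared storage or coordinated over the network rather than in one program's memory.

```python
from collections import deque


class ResourceLock:
    """Exclusive lock on one shared resource with FIFO waiters (illustrative)."""

    def __init__(self):
        self.owner = None       # node currently holding the lock, if any
        self.waiters = deque()  # nodes waiting, in arrival order

    def acquire(self, node):
        """Return True if `node` holds the lock, False if it was queued."""
        if self.owner is None:
            self.owner = node
            return True
        if node != self.owner and node not in self.waiters:
            self.waiters.append(node)
        return node == self.owner

    def release(self, node):
        """Release the lock and hand it to the longest-waiting node, if any."""
        if node != self.owner:
            raise RuntimeError(f"{node} does not hold the lock")
        self.owner = self.waiters.popleft() if self.waiters else None
        return self.owner


lock = ResourceLock()
lock.acquire("node-A")            # granted immediately
lock.acquire("node-B")            # queued behind node-A
next_owner = lock.release("node-A")  # ownership passes to node-B
```

Granting the lock in arrival order is one possible policy; it keeps the example deterministic but is not the only way to handle competing requests.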
In clustered SAN-based file systems, every node in the file system has symmetric access to the SAN. Thus, each node can access and modify a resource in the file system, such as a file or directory, at any time. In certain clustered SAN-based file systems, upon opening a resource, the node reads the metadata of the resource into its local cache and relies on the cached copy of the metadata for operations until it closes the resource. When the resource is simultaneously opened by several nodes in multi-writer or similar modes, where multiple nodes can write to the files, and changes to the metadata of the resource are to be allowed, metadata cache coherence across the nodes must be ensured. In currently available clustered SAN-based file systems, metadata cache coherence is provided using complex networking setups, which add to the implementation complexity of the file system and can introduce correctness issues or performance and scaling bottlenecks. Moreover, such approaches require an “asymmetric” file system implementation, in that some nodes are designated as “metadata servers” and shoulder the responsibility for all locking and ownership grants to other nodes in the system. These metadata servers may become performance and availability bottlenecks.
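The coherence problem described above can be made concrete with a small simulation. It is a sketch under stated assumptions: `SharedDisk` stands in for metadata stored on the SAN, each `Node` caches metadata at open time and writes its cached copy back on every change, and no coherence mechanism exists. All names are hypothetical and no real file system works exactly this way.

```python
class SharedDisk:
    """Stands in for on-SAN metadata that every node can read and write."""

    def __init__(self):
        self.metadata = {}


class Node:
    """A node that caches resource metadata from open until close."""

    def __init__(self, name, disk):
        self.name = name
        self.disk = disk
        self.cache = {}

    def open(self, path):
        # Read the metadata once and rely on the cached copy afterwards.
        self.cache[path] = dict(self.disk.metadata[path])

    def append(self, path, nbytes):
        # Update the cached copy, then write that copy back to shared storage.
        self.cache[path]["length"] += nbytes
        self.disk.metadata[path] = dict(self.cache[path])

    def cached_length(self, path):
        return self.cache[path]["length"]

    def close(self, path):
        del self.cache[path]


disk = SharedDisk()
disk.metadata["/shared/file"] = {"length": 0}
node_a, node_b = Node("A", disk), Node("B", disk)

node_a.open("/shared/file")
node_b.open("/shared/file")
node_a.append("/shared/file", 4096)

# Node B still believes the file is empty: its cached metadata is stale.
stale_length = node_b.cached_length("/shared/file")

# Node B appends using its stale cache and overwrites node A's update.
node_b.append("/shared/file", 512)
final_length = disk.metadata["/shared/file"]["length"]
```

After both appends the shared metadata records a length of 512 bytes instead of 4608, which is the kind of lost update that metadata cache coherence is meant to prevent.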