A shared file system typically refers to an enterprise storage file system that is concurrently shared (i.e., accessed for reading and writing) by multiple computer systems. One example of such a shared file system is VMware's VMFS (Virtual Machine File System), which enables multiple virtual machines that are instantiated on one or more physical servers to operate under a common file system whose data storage is implemented on a shared data storage system. An example of such a shared data storage system is a disk array accessible through a storage area network (SAN). A typical SAN provides access to a number of data storage systems that are physically independent enclosures containing a storage system manager (e.g., a disk array controller), a disk cache and multiple physical data storage units (e.g., disk drives). The storage system manager manages the physical data storage units and exposes them to the connected computer systems as logical data storage units, each identified by a logical unit number (LUN), enabling storage operations to be carried out on the LUNs using storage hardware.
Shared file systems need to implement concurrency control mechanisms to prevent multiple contexts (i.e., processes running on the connected computer systems) from simultaneously accessing the same file system resources, which would result in data corruption and unintended data loss. One such concurrency control mechanism relies on acquiring locks corresponding to file system resources (e.g., files, file descriptors, data block bitmaps, etc.) prior to acting upon such file system resources.
The acquisition of a lock itself involves “reserving” the data storage unit (e.g., LUN) upon which the lock (and corresponding file system resource and/or data) resides, such that the context desiring to acquire the lock has exclusive access to the data storage unit. After acquiring the desired lock, the context releases its reservation, freeing the data storage unit to service other contexts. In an architecture where the computer systems are connected to a SAN by a SCSI (Small Computer System Interface) interface, one example of such a reservation system is the conventional SCSI reservation command that can be issued by a file system to a LUN in the SAN on behalf of a context running on a connected computer system.
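The reserve, acquire, and release sequence described above can be illustrated with a minimal in-memory sketch. It is not an actual SCSI or VMFS implementation: the class `DataStorageUnit`, the function `acquire_lock`, the lock name, and the context identifiers are all hypothetical, and a `threading.Lock` merely stands in for the reservation on the data storage unit.

```python
import threading


class DataStorageUnit:
    """Toy stand-in for a LUN (hypothetical; no real SCSI commands are issued)."""

    def __init__(self):
        # Models the reservation: only one context may hold it at a time.
        self._reservation = threading.Lock()
        # Maps a lock name to the identifier in its ownership field (None = free).
        self.locks = {}

    def reserve(self):
        self._reservation.acquire()

    def release(self):
        self._reservation.release()


def acquire_lock(unit, lock_name, context_id):
    """Reserve the unit, test and set the ownership field, then release."""
    unit.reserve()
    try:
        if unit.locks.get(lock_name) is None:
            unit.locks[lock_name] = context_id  # write identifier to ownership field
            return True
        return False  # lock already owned by another context
    finally:
        # The reservation is always released so other contexts can proceed.
        unit.release()


unit = DataStorageUnit()
first = acquire_lock(unit, "file_descriptor_7", "context-A")
second = acquire_lock(unit, "file_descriptor_7", "context-B")
print(first, second, unit.locks["file_descriptor_7"])  # True False context-A
```

Note that, in this sketch, the reservation covers the entire unit rather than the single lock being acquired, so any other context wanting a different lock on the same unit would also have to wait.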
Reserving the data storage unit to acquire a desired lock prevents multiple contexts from simultaneously trying to acquire the same lock. Specifically, without reserving the data storage unit, two competing contexts could both read a lock simultaneously, determine that the lock is free, and then both write the lock to acquire it (i.e., write an identifier value to an ownership field in the lock). Each context would conclude that it had successfully acquired the lock and access the lock's corresponding file system resource or data, causing data loss and corruption. However, reserving the data storage unit to acquire locks in this manner can be a significant bottleneck because it prevents other contexts from accessing unrelated resources and data that coincidentally reside on the same data storage unit until the reservation is released. This bottleneck is exacerbated because typical actions performed by contexts on file system resources require the acquisition of multiple locks, thereby increasing the number and duration of reservations.
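The race between two competing contexts can be made concrete with a small sketch that replays the problematic interleaving step by step, with no reservation in place. The dictionary `lock_record`, the helper functions, and the context identifiers are hypothetical and only model a single ownership field in memory.

```python
FREE = None

# Toy model of one on-disk lock with a single ownership field (hypothetical).
lock_record = {"owner": FREE}


def read_owner():
    return lock_record["owner"]


def write_owner(context_id):
    lock_record["owner"] = context_id


# Step 1: both contexts read the lock before either one has written to it.
seen_by_a = read_owner()
seen_by_b = read_owner()

# Step 2: each context determines that the lock is free and writes its identifier.
a_believes_it_owns = seen_by_a is FREE
if a_believes_it_owns:
    write_owner("context-A")

b_believes_it_owns = seen_by_b is FREE
if b_believes_it_owns:
    write_owner("context-B")  # silently overwrites context-A's identifier

print(a_believes_it_owns, b_believes_it_owns, lock_record["owner"])
# True True context-B
```

Both contexts conclude that they hold the lock, although the ownership field records only the last writer. Making the read and the write a single exclusive step, which is what the reservation provides, rules out this interleaving.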