Clustered file systems include enterprise storage file systems that are shared (i.e., accessible for reading and writing) by multiple computer systems, often referred to as hosts. One example of such a clustered file system is VMware's Virtual Machine File System (“VMFS”). The VMFS enables multiple applications (e.g., virtual machines, database instances, etc.) instantiated on one or more physical servers or hosts to mount and use a common file system where data storage is implemented on a shared data storage system. An example of a shared data storage system is a disk array accessible through a storage area network (“SAN”). A typical data storage system is a physically independent enclosure containing a storage system manager (e.g., a disk array controller), a disk cache (e.g., a non-volatile RAM-based cache), and multiple physical data storage units (e.g., disk drives). The storage system manager manages the physical data storage units and exposes them to the hosts as logical data storage units, each identified by a logical unit number (“LUN”), enabling storage operations to be carried out on the LUNs using storage hardware.
Clustered file systems provide a desirable multi-host input/output (“IO”) architecture because they can service multiple parallel IO streams from multiple hosts directly to the same shared file system volume on shared storage. However, many file operations on clustered file systems are costlier than they would be on local non-clustered systems. Many file operations require manipulation of file system metadata. When performed in clustered file systems, such manipulations require concurrency control mechanisms that provide some form of notification of the events to other participant hosts in the cluster. These mechanisms prevent multiple hosts accessing the shared storage system from simultaneously modifying the same file system resources, which would cause data corruption and unintended data loss. These notifications incur IO-class latencies, and therefore the file operations are costlier than those on local non-clustered file systems that do not require such cross-host notifications.
One such concurrency control mechanism uses the notion of acquiring locks corresponding to file system resources (e.g., directory contents, file descriptors, data block bitmaps, etc.) prior to acting upon such file system resources.
One example of a method for acquiring locks involves the host “reserving” the data storage unit (e.g., LUN) upon which a special data structure known as a lock, and the corresponding file system resource governed by the lock, reside, such that only said host has exclusive read and write access to the data storage unit. After acquiring the desired lock via a combination of read and write operations, said host releases its reservation, thereby freeing the data storage unit to service other hosts sharing the data storage unit. In an architecture where the computer systems are connected to a SAN by a Small Computer System Interface (“SCSI”) and execute IO operations to the LUN using SCSI commands, one example of such a reservation system is the conventional SCSI reservation command that can be issued by a file system to a LUN in the SAN on behalf of a process running on a connected computer system, as described in application Ser. No. 10/773,613 ('613 application).
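The reserve, read, write, release sequence described above can be sketched as follows. This is a minimal in-memory illustration, not the mechanism of the '613 application and not actual SCSI command handling: the class `SimulatedLun`, the function `acquire_lock`, the constant `FREE`, and the integer host identifiers are all hypothetical names introduced for this example.

```python
import threading

FREE = 0  # hypothetical value meaning "the lock has no owner"


class SimulatedLun:
    """In-memory stand-in for a shared data storage unit (illustrative only)."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the reservation field
        self._reserved_by = None        # host currently holding the reservation
        self.lock_owner = FREE          # ownership field of the on-disk lock

    def reserve(self, host_id):
        """Give host_id exclusive access; return False if another host has it."""
        with self._guard:
            if self._reserved_by not in (None, host_id):
                return False
            self._reserved_by = host_id
            return True

    def release(self, host_id):
        """Drop the reservation if host_id is the holder."""
        with self._guard:
            if self._reserved_by == host_id:
                self._reserved_by = None

    def _check_access(self, host_id):
        # IO from any host other than the reservation holder is rejected.
        if self._reserved_by not in (None, host_id):
            raise PermissionError("reservation conflict")

    def read_owner(self, host_id):
        self._check_access(host_id)
        return self.lock_owner

    def write_owner(self, host_id, value):
        self._check_access(host_id)
        self.lock_owner = value


def acquire_lock(lun, host_id):
    """Reserve the unit, read the lock, write it if free, then release."""
    if not lun.reserve(host_id):
        return False  # another host holds the reservation; caller may retry
    try:
        if lun.read_owner(host_id) != FREE:
            return False  # lock is already owned
        lun.write_owner(host_id, host_id)
        return True
    finally:
        lun.release(host_id)  # free the unit to service other hosts
```

In this sketch, a first call such as `acquire_lock(lun, 1)` on a fresh `SimulatedLun` succeeds, and a later `acquire_lock(lun, 2)` fails because the ownership field is no longer free. The reservation is held only for the duration of the read and write, which mirrors the text's point that the unit is freed as soon as the lock is acquired.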
Reserving the data storage unit to acquire a desired lock prevents multiple hosts from simultaneously trying to acquire the same lock. Specifically, without reserving the data storage unit, two competing hosts could both read a lock simultaneously, determine that the lock is free, and then both write the lock to acquire it (e.g., write a unique host identifier value to an ownership field in the lock). Each host would conclude that it had successfully acquired the lock and access the lock's corresponding file system resource or data, causing data loss and corruption. Thus, this locking system prevents multiple processes from modifying data concurrently and causing data loss and corruption. Other cluster file system locks, such as network-based locks and locks that include a combination of network and on-disk locks, also prevent multiple hosts from concurrently modifying data and causing data loss and corruption. However, acquiring locks can be a significant bottleneck when it is performed for each file open and each IO to small files hosted on such a clustered file system. When a system, for example a virtual machine, is powering on or performing other power state change operations, there are numerous small files that need to be opened and read. Many of the file open and IO requests are for read-only access to data. It would be useful to have a system that would reduce clustered file system locking overhead for common file system operations that may be performed safely without acquiring a lock, such as opening files, read-only IO to small files, and closing files.
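To make concrete the race that reservation prevents, in which two hosts each read a free lock before either writes it, the following sketch replays that interleaving deterministically. It is an illustration under stated assumptions only: the function `unreserved_race`, the constant `FREE`, and the host names `host-A` and `host-B` are hypothetical, and a dictionary stands in for the on-disk lock.

```python
FREE = 0  # hypothetical value meaning "the lock has no owner"


def unreserved_race():
    """Replay the interleaving in which both hosts read before either writes."""
    disk = {"owner": FREE}  # ownership field of a single on-disk lock
    believed_owners = []    # hosts that conclude they acquired the lock

    # Step 1: with no reservation in place, both hosts read the lock.
    seen_by_a = disk["owner"]
    seen_by_b = disk["owner"]

    # Step 2: each host saw a free lock, so each writes its own identifier.
    if seen_by_a == FREE:
        disk["owner"] = "host-A"
        believed_owners.append("host-A")
    if seen_by_b == FREE:
        disk["owner"] = "host-B"  # silently overwrites host-A's claim
        believed_owners.append("host-B")

    return believed_owners, disk["owner"]
```

Calling `unreserved_race()` returns `(["host-A", "host-B"], "host-B")`: both hosts believe they own the lock, while the ownership field records only the last writer. Serializing the read and write behind a reservation removes this interleaving.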