In modern computer systems, large collections of data are usually organized on storage disks as files. If a large number of files exist, they may be distributed over multiple disks and/or computer systems. A file system is used to control access to the files by communicating with the various disk drives. Computer programs access the files by requesting file services from one or more file systems. Some file systems also assist with error recovery functions.
In most file systems, the files contain user data and metadata. Metadata is information required to manage the user data, such as names, locations, dates, file sizes, access protection, and so forth. The organization of the user data is usually managed by the user programs.
A cluster file system is one form of distributed file system, where the file servers share access to disk storage devices, communicating with them through a shared disk server layer. The file system data and metadata are stored on the shared disk storage devices. The file servers use a lock protocol to manage accesses to shared files, whereby certain locks must be obtained from a shared pool of locks maintained by a lock server before read, write, or other accesses are allowed to files and file system metadata. To store a larger number of files, additional disks and servers must be added. To simplify the organization of files, groups of files or “volumes” are often manually assigned to particular disks. Then, the files can be manually moved or replicated when components fill up, fail, or become throughput-bound. A cluster file system can reduce management complexity, increase scalability by allowing more servers and shared storage devices to be incrementally added to the cluster, and increase availability by allowing any file server in the cluster to access all files on the shared disks, even in the presence of hardware failures.
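The lock protocol described above can be illustrated with a minimal sketch. It assumes a single in-process lock server and a dictionary standing in for the shared disk; the names `LockServer`, `FileServer`, and the "shared"/"exclusive" lock modes are hypothetical and are not taken from GFS or any other cluster file system.

```python
import threading


class LockServer:
    """Hypothetical stand-in for a cluster lock server (not a real API)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._holders = {}  # resource name -> (mode, set of node ids)

    def acquire(self, node, resource, mode):
        """Grant a 'shared' or 'exclusive' lock; return False on conflict."""
        with self._guard:
            held = self._holders.get(resource)
            if held is None:
                self._holders[resource] = (mode, {node})
                return True
            held_mode, nodes = held
            # Only shared locks are compatible with one another.
            if mode == "shared" and held_mode == "shared":
                nodes.add(node)
                return True
            return False

    def release(self, node, resource):
        with self._guard:
            held = self._holders.get(resource)
            if held is None:
                return
            _, nodes = held
            nodes.discard(node)
            if not nodes:
                del self._holders[resource]


class FileServer:
    """A file server that takes a lock before every access to shared storage."""

    def __init__(self, node, lock_server, shared_disk):
        self.node = node
        self.locks = lock_server
        self.disk = shared_disk  # dict shared by all file servers (assumption)

    def read(self, name):
        if not self.locks.acquire(self.node, name, "shared"):
            raise BlockingIOError(f"{name} is locked exclusively")
        try:
            return self.disk.get(name)
        finally:
            self.locks.release(self.node, name)

    def write(self, name, data):
        if not self.locks.acquire(self.node, name, "exclusive"):
            raise BlockingIOError(f"{name} is locked by another node")
        try:
            self.disk[name] = data
        finally:
            self.locks.release(self.node, name)
```

In this sketch any file server can reach any file on the shared storage, but every read or write first costs one request to the lock server. A real lock server would also queue waiting requests and recover locks held by failed nodes, which the sketch omits.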
One such cluster file system has been developed by Sistina Software, Inc. and is called the Global File System (GFS). This system has been described in a number of publications including: “The Global File System,” Proceedings of the Fifth NASA Goddard Conference on Mass Storage Systems, by Steven R. Soltis, Thomas M. Ruwart, et al. (1996); and “A 64-bit, Shared Disk File System for Linux,” Sixteenth IEEE Mass Storage Systems Symposium held jointly with the Seventh NASA Goddard Conference on Mass Storage Systems & Technologies, Mar. 15-18, 1999, San Diego, Calif., by Kenneth W. Preslan, Andrew P. Barry, et al., all of which are hereby incorporated by reference. In addition, U.S. Pat. No. 6,493,804 issued to Soltis et al., herein incorporated by reference, describes some of the internal metadata structure and operations of GFS.
The Global File System (GFS) is a shared-storage-device, cluster file system. GFS supports simultaneous reads and writes to the file system from different file servers, journaling of read and write operations where each computer (also known as a “node”) has its own separate journal, and rapid recovery from node failures. Nodes within a GFS cluster physically share the same storage by means of Fibre Channel (FC), shared SCSI devices, iSCSI, or network block devices (which commonly work over IP networks). The file system is configured so that it appears to reside on each node, and it synchronizes file accesses across the cluster. All GFS nodes can access the shared storage devices, and the user data and file system metadata that reside on these shared devices, by obtaining and releasing locks associated with these files and metadata. GFS uses read and write caching while maintaining full POSIX file system semantics.
Other cluster file systems are known to those skilled in the art. For example, U.S. Pat. No. 6,173,293, issued Jan. 9, 2001 to Thekkath et al., herein incorporated by reference, discloses a file system that is distributed over a plurality of computers connected to each other by a network. The plurality of computers execute user programs, and the user programs access files stored on a plurality of physical disks connected to the plurality of computers. The file system includes a plurality of file servers executing on the plurality of computers as a single distributed file server layer. In addition, the file system includes a plurality of disk servers executing on the plurality of computers as a single distributed disk server layer, and a plurality of lock servers executing on the plurality of computers as a single distributed lock server to coordinate the operation of the distributed file and disk server layers so that the user programs can coherently access the files on the plurality of physical disks. Each of the plurality of file servers executes independently on a different one of the plurality of computers, and the file servers communicate only with the plurality of disk servers and the plurality of lock servers, and not with each other. Furthermore, the disk server layer organizes the plurality of physical disks as a single virtual disk having a single address space for the files.
Cluster file systems provide important benefits including scalability, simplified management, and availability. However, from the standpoint of user programs, the work of obtaining and releasing locks when accessing files and metadata is pure overhead. This overhead is not seen in traditional non-distributed, local file systems that run on a single computer with only a single file server, where no locking is required between separate computers. In addition, the need to obtain a lock for read and write accesses to files makes it difficult to directly employ some standard performance-enhancing techniques used in non-distributed, local file systems. For example, when files and directories are accessed in a local file system, it is useful to read ahead and access more file system blocks than the current request requires, because a common access pattern is to access a sequence of files in a directory, or a sequence of data blocks associated with the same file. This technique cannot be directly used in a cluster file system because locks must first be taken out on the file metadata being accessed.
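The difference between local and cluster read-ahead can be sketched as follows. The example assumes, purely for illustration, that each file's metadata is protected by its own lock and that every lock acquisition costs one round trip to a lock server; `CountingLockServer`, `local_read_ahead`, and `cluster_read_ahead` are hypothetical names.

```python
class CountingLockServer:
    """Hypothetical lock server that only counts acquisition round trips."""

    def __init__(self):
        self.round_trips = 0

    def acquire(self, resource):
        self.round_trips += 1  # one network round trip per lock (assumption)

    def release(self, resource):
        pass  # releases are not counted in this sketch


def local_read_ahead(inodes, start, window):
    """Local file system: fetch the requested entry plus `window` more.

    No locks are needed because only one file server touches the disk.
    """
    return inodes[start:start + 1 + window]


def cluster_read_ahead(inodes, start, window, lock_server):
    """Cluster file system: each prefetched entry must be locked first."""
    fetched = []
    end = min(start + 1 + window, len(inodes))
    for number in range(start, end):
        resource = ("inode", number)
        lock_server.acquire(resource)
        try:
            fetched.append(inodes[number])
        finally:
            lock_server.release(resource)
    return fetched
```

Both functions return the same entries, but under these assumptions the cluster version pays one lock request per prefetched entry, so a read-ahead window of three entries turns one access into four lock round trips.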
Another problem in cluster file systems concerns global state. Local file systems concentrate global state (like quota files) and global statistics (like the results from a file system “df” operation) into one data structure for fast access and update. In a cluster, however, this centralized state information creates a bottleneck, because every node must contend for the same data structure. For example, if the quota file is kept perfectly up to date each time a file is changed, it will be a significant performance and scalability bottleneck.
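The quota example can be made concrete with a small model. It assumes that keeping the quota file perfectly up to date means taking one exclusive cluster-wide lock on that file for every change; `SharedQuotaFile` and `simulate` are hypothetical names, and the loop runs sequentially rather than modeling real contention.

```python
from collections import Counter


class SharedQuotaFile:
    """Hypothetical single quota file shared by every node in the cluster."""

    def __init__(self):
        self.usage = Counter()  # user -> blocks charged
        self.exclusive_acquisitions = 0

    def charge(self, node, user, blocks):
        # Assumption: each update takes the one exclusive lock on the quota
        # file, so updates from all nodes are serialized through it.
        self.exclusive_acquisitions += 1
        self.usage[user] += blocks


def simulate(nodes, writes_per_node, blocks_per_write=1):
    """Charge every write from every node against the same quota file."""
    quota = SharedQuotaFile()
    for node in range(nodes):
        for _ in range(writes_per_node):
            quota.charge(node, "alice", blocks_per_write)
    return quota
```

In this model the number of exclusive acquisitions on the single quota file equals the total number of writes in the cluster, so it grows in proportion to the node count: adding nodes adds contention on the quota file rather than throughput.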
Extra locking overhead is inherent in cluster file system accesses. If the locking operations are not optimized properly, it will be difficult for a cluster file system to achieve the performance of a local file system.