Network data storage is most economically provided by an array of low-cost disk drives integrated with a large semiconductor cache memory. A number of data mover computers are used to interface the cached disk array to the network. The data mover computers perform file locking management and mapping of the network files to logical block addresses of storage in the cached disk array, and move data between network clients and the storage in the cached disk array.
Data consistency problems may arise if multiple clients or processes have concurrent access to read-write files. Typically, write synchronization and file locking have been used to ensure data consistency. For example, the data write path for a file has been serialized by holding an exclusive lock on the file for the entire duration of creating a list of data buffers to be written to disk, allocating the actual on-disk storage, and writing to storage synchronously. Unfortunately, these methods involve considerable access delays due to contention for locks not only on the files but also on the file directories and on a log used when committing data to storage. In order to reduce these delays, a file server may permit asynchronous writes in accordance with version 3 of the Network File System (NFS) protocol. Also, in a multi-processor server, a respective one of the processors is pre-assigned to service requests for metadata of each file or file system. See, for example, Vahalia et al., U.S. Pat. No. 5,893,140 issued Apr. 6, 1999, entitled “File Server Having a File System Cache and Protocol for Truly Safe Asynchronous Writes,” incorporated herein by reference, and Xu et al., U.S. Pat. No. 6,324,581 issued Nov. 27, 2001, incorporated herein by reference.
More recently, byte range locking to a file has been proposed in version 4 of the NFS protocol. (See NFS Version 3 Protocol Specification, RFC 1813, Sun Microsystems, Inc., June 1995, and NFS Version 4 Protocol Specification, RFC 3530, Sun Microsystems, Inc., April 2003.) Asynchronous writes and range locking alone will not eliminate access delays due to contention during allocation and commitment of file metadata. A Unix-based file in particular contains considerable metadata in the inode for the file and in indirect blocks of the file. The inode, for example, contains the date of creation, date of access, file name, and location of the data blocks used by the file in bitmap format. The NFS protocol specifies how this metadata must be managed. In order to comply with the NFS protocol, each time a write operation occurs, access to the file is not allowed until the metadata is updated on disk, both for read and write operations. In a network environment, multiple clients may issue simultaneous writes to the same large file such as a database, resulting in considerable access delay during allocation and commitment of file metadata.
A method of permitting concurrent writes from multiple clients to the same file is disclosed in Mullick et al., published patent application No. US 2005/0066095 A1, published Mar. 24, 2005, entitled “Multi-threaded Write Interface and Methods for Increasing the Single File Read and Write Throughput of a File Server,” incorporated herein by reference. Each read-write operation includes three successive steps. The first step includes inode access for reads and writes, and also pre-allocation for writes. The second step includes an asynchronous read or write. The third step includes inode access for a metadata commit. Since the asynchronous write does not involve any metadata interaction, these three steps can be pipelined. The pre-allocation in the first step is achieved synchronously, and an allocation mutex prevents multiple pre-allocations from occurring simultaneously for the same file. Once the metadata pre-allocation step is complete, the asynchronous write of the data to disk in the second step can be handled independently of the metadata pre-allocation. With pipelining, multiple asynchronous writes can be performed concurrently. In the third step, the final commit of the allocations is also achieved synchronously. The allocation mutex prevents pre-allocation for the same file from occurring at the same time as a commit for the same file. However, multiple commits for the same file may occur simultaneously by gathering the commit requests together and committing them under the same allocation mutex.
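The gathering of commit requests under a single allocation mutex can be illustrated by the following minimal sketch. It is not taken from the cited application; the class name CommitGatherer, its fields, and the use of an in-memory set in place of on-disk metadata are all assumptions made for illustration. Each caller enqueues its commit request and then takes the allocation mutex; whichever caller holds the mutex commits every request pending at that moment, so several commits for the same file complete under one acquisition.

```python
import threading
from collections import deque


class CommitGatherer:
    """Hypothetical per-file commit gatherer (illustrative names only)."""

    def __init__(self):
        self.allocation_mutex = threading.Lock()
        self.pending = deque()   # queued (blocks, done_event) commit requests
        self.committed = set()   # stands in for committed on-disk metadata
        self.batches = []        # number of requests committed per mutex hold

    def request_commit(self, blocks):
        """Queue a commit request and return once it has been committed."""
        done = threading.Event()
        self.pending.append((set(blocks), done))
        with self.allocation_mutex:
            # Only the mutex holder drains the queue, so every request
            # pending at this point is committed together.
            batch = []
            while self.pending:
                batch.append(self.pending.popleft())
            if batch:
                for blks, _ in batch:
                    self.committed |= blks
                self.batches.append(len(batch))
                for _, event in batch:
                    event.set()
        # If another caller already committed this request, the event is
        # already set and this returns immediately.
        done.wait()


if __name__ == "__main__":
    gatherer = CommitGatherer()
    threads = [
        threading.Thread(target=gatherer.request_commit, args=([i, i + 100],))
        for i in range(8)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sorted(gatherer.committed))
```

How many requests end up in each batch depends on thread scheduling; the sketch guarantees only that every request is committed exactly once and that each batch is committed while the allocation mutex is held.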
Thus, execution of a write thread for writing to a file includes obtaining an allocation mutex for the file, preallocating the new metadata blocks that need to be allocated for writing to the file, and releasing the allocation mutex for the file; then issuing asynchronous write requests for writing to the file and waiting for callbacks indicating completion of the asynchronous write requests; and then obtaining the allocation mutex for the file again, committing the preallocated metadata blocks, and releasing the allocation mutex for the file.
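The write thread sequence described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the cited application: FileState, write_thread, and the in-memory dictionary standing in for disk storage are hypothetical, and waiting on futures from a thread pool stands in for the completion callbacks. The point of the sketch is that the allocation mutex is held only during preallocation and commit, and is not held while the data writes are in flight.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


class FileState:
    """Hypothetical per-file state (illustrative names only)."""

    def __init__(self):
        self.allocation_mutex = threading.Lock()
        self.preallocated = set()  # blocks reserved but not yet committed
        self.committed = set()     # blocks whose metadata is committed
        self.data = {}             # block number -> bytes; stands in for disk


def write_thread(f, io_pool, blocks):
    """Write `blocks` (block number -> bytes) to file state `f`."""
    # Step 1: preallocate new metadata blocks under the allocation mutex.
    with f.allocation_mutex:
        new_blocks = [
            b for b in blocks
            if b not in f.committed and b not in f.preallocated
        ]
        f.preallocated.update(new_blocks)

    # Step 2: issue asynchronous data writes with the mutex released,
    # then wait for all of them to complete.
    futures = [
        io_pool.submit(f.data.__setitem__, b, payload)
        for b, payload in blocks.items()
    ]
    wait(futures)
    for fut in futures:
        fut.result()  # re-raise any write error

    # Step 3: commit the preallocated metadata blocks under the mutex.
    with f.allocation_mutex:
        f.committed.update(new_blocks)
        f.preallocated.difference_update(new_blocks)
    return new_blocks


if __name__ == "__main__":
    state = FileState()
    with ThreadPoolExecutor(max_workers=4) as pool:
        writers = [
            threading.Thread(
                target=write_thread,
                args=(state, pool, {i: b"a", i + 10: b"b"}),
            )
            for i in range(5)
        ]
        for t in writers:
            t.start()
        for t in writers:
            t.join()
    print(sorted(state.committed))
```

Because the mutex is released between steps 1 and 3, several writers to the same file can have their data writes outstanding at the same time, which is the pipelining the text describes.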