In the data processing art, there has been a general trend to specialize and distribute data processing tasks among an increasing number of data processors. For example, more than a decade ago, it was common for a mainframe computer to be programmed with a file system manager and various application programs that invoked the file system manager in order to access files of the file systems. In turn, the mainframe computer sent logical block access commands to another mainframe processor of a cached disk array.
More recently, it has been common for application programs to be executed by workstations such as personal computers networked to file servers. Each file server is programmed with a file system manager. Each file server may include a volume manager for access to storage of disk drives in the file server. However, file servers have been networked or clustered in various ways to enable shared access to storage subsystems or arrays of disk drives by multiple workstations.
Data consistency problems may arise if two file servers share access to the same file system in storage. As described in Xu et al. U.S. Pat. No. 6,324,581, one way to solve this data consistency problem is to designate one of the file servers to be an exclusive owner of access rights to each file system. The exclusive owner of the access rights to a file system, however, may delegate data access or metadata management tasks to other file servers. For example, if a first file server receives a request from a network client for access to a file system owned by a second file server, then the first file server sends a metadata request to the second file server. The second file server responds by placing a lock on the file and returning metadata of the file. The first file server uses the metadata of the file to formulate a data access command, and sends this command directly to the disk array over a data path that bypasses the second file server in order to access the file data in the file system.
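The metadata request and bypass data path described above can be illustrated with a minimal sketch. The class names (DiskArray, OwnerServer, ForwardingServer) and the representation of file metadata as a list of block numbers are assumptions made for illustration only; they are not taken from the cited patent.

```python
from dataclasses import dataclass, field


@dataclass
class DiskArray:
    """Shared storage that every file server can reach over the bypass path."""
    blocks: dict = field(default_factory=dict)  # block number -> bytes


@dataclass
class OwnerServer:
    """Hypothetical model of the file server that owns the file system."""
    block_map: dict                             # file name -> list of block numbers
    locks: dict = field(default_factory=dict)   # file name -> name of lock holder

    def metadata_request(self, file_name, requester):
        # Place a lock on the file, then return its metadata (the block list).
        holder = self.locks.get(file_name)
        if holder is not None and holder != requester:
            raise RuntimeError(f"{file_name} is locked by {holder}")
        self.locks[file_name] = requester
        return list(self.block_map[file_name])

    def release(self, file_name, requester):
        if self.locks.get(file_name) == requester:
            del self.locks[file_name]


@dataclass
class ForwardingServer:
    """Hypothetical model of the server that received the client request."""
    name: str
    owner: OwnerServer
    array: DiskArray

    def read_file(self, file_name):
        # Only the metadata request goes through the owner.
        block_numbers = self.owner.metadata_request(file_name, self.name)
        try:
            # The file data is read straight from the disk array,
            # bypassing the owner.
            return b"".join(self.array.blocks[n] for n in block_numbers)
        finally:
            self.owner.release(file_name, self.name)
```

In this sketch only the small metadata exchange passes through the owner, while the bulk data moves between the forwarding server and the disk array.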
As further described in Jiang et al. U.S. Patent Application Publication 2005/0240628 published Oct. 27, 2005, metadata management in a file server or storage network is delegated from a primary data processor to a secondary data processor in order to reduce data traffic between the primary data processor and the secondary data processor. The primary data processor retains responsibility for managing locks upon objects in the file system that it owns, and also retains responsibility for allocation of free blocks and inodes of the file system. The leasing of free blocks and inodes to the secondary and the granting of locks to the secondary enables the secondary to perform other metadata management tasks such as appending blocks to a file, truncating a file, creating a file, and deleting a file.
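The leasing of free blocks can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the cited publication: the names Primary, Secondary, lease_blocks, and append_block are hypothetical, inodes and locks are omitted, and a message counter stands in for the data traffic between the two processors.

```python
class Primary:
    """Hypothetical primary that retains responsibility for free-block allocation."""

    def __init__(self, total_blocks):
        self.free = list(range(total_blocks))
        self.messages = 0  # number of requests received from secondaries

    def lease_blocks(self, count):
        # Hand out up to `count` free blocks in a single exchange.
        self.messages += 1
        count = min(count, len(self.free))
        leased, self.free = self.free[:count], self.free[count:]
        return leased


class Secondary:
    """Hypothetical secondary that appends blocks to files from its lease."""

    def __init__(self, primary, lease_size=4):
        self.primary = primary
        self.lease_size = lease_size
        self.leased = []   # blocks leased from the primary and not yet used
        self.files = {}    # file name -> list of block numbers

    def append_block(self, name):
        # Contact the primary only when the local lease is exhausted.
        if not self.leased:
            self.leased = self.primary.lease_blocks(self.lease_size)
            if not self.leased:
                raise RuntimeError("no free blocks available")
        block = self.leased.pop(0)
        self.files.setdefault(name, []).append(block)
        return block
```

With a lease size of four, six appends in this sketch require two exchanges with the primary rather than six.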
For data backup and recovery, it is often desirable to make a snapshot copy of a production dataset. In this context, a production dataset is a dataset that changes dynamically as one or more applications write to the dataset. A snapshot copy of the production dataset is a static, point-in-time copy of the production dataset.
A snapshot copy can be created concurrently with write access to the production dataset by preserving original data while applications write new data to the production dataset. A record is kept of data blocks that have been changed since the time of the snapshot. Typically this is done either at the logical volume level by keeping a bitmap of logical blocks that have changed in a logical volume, or at the file system level by keeping a map of file system blocks that have changed in a file system.
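A record of changed blocks at the logical volume level can be sketched as a bitmap with one bit per logical block. The class name ChangedBlockBitmap and its methods are hypothetical and are shown only to illustrate the bookkeeping described above.

```python
class ChangedBlockBitmap:
    """One bit per logical block; a set bit means changed since the snapshot."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)

    def _check(self, block):
        if not 0 <= block < self.num_blocks:
            raise IndexError(f"block {block} out of range")

    def mark(self, block):
        # Record that the block has been written since the snapshot.
        self._check(block)
        self.bits[block // 8] |= 1 << (block % 8)

    def is_changed(self, block):
        self._check(block)
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def changed_blocks(self):
        # List the changed blocks in ascending order.
        return [b for b in range(self.num_blocks) if self.is_changed(b)]
```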
Basically, there are two methods for preserving the original data while applications write new data to the production dataset. In a “copy on first write” method, for the first time that new data is written to a block of the production dataset since the time of the snapshot, the original data is copied to a new block location for the snapshot copy, and the new data is written to the original block location of the production dataset. In a “write anywhere” method, at least for the first time that new data is written to a block of the production dataset since the time of the snapshot, the new data is written to a new block location, and the block mapping for the production dataset is changed so that the new block location is mapped to the production dataset, and the original block location is mapped to the snapshot copy.
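The difference between the two methods can be illustrated with a minimal in-memory sketch. The class names are hypothetical, the snapshot is assumed to be taken when each object is constructed, and the write-anywhere sketch chooses to overwrite in place after the first write to a block, which is one of several possible policies.

```python
class CopyOnFirstWriteVolume:
    """Production blocks stay at their original locations."""

    def __init__(self, data):
        self.blocks = list(data)  # production data at original block locations
        self.saved = {}           # block index -> original data kept for the snapshot

    def write(self, index, data):
        # On the first write since the snapshot, copy the original data aside.
        if index not in self.saved:
            self.saved[index] = self.blocks[index]
        self.blocks[index] = data

    def read(self, index):
        return self.blocks[index]

    def read_snapshot(self, index):
        return self.saved.get(index, self.blocks[index])


class WriteAnywhereVolume:
    """New data goes to a new location and the production mapping is changed."""

    def __init__(self, data):
        self.store = list(data)                       # physical block store
        self.production_map = list(range(len(data)))  # logical -> physical
        self.snapshot_map = list(self.production_map)

    def write(self, index, data):
        if self.production_map[index] == self.snapshot_map[index]:
            # First write since the snapshot: no copy, only a remap.
            self.store.append(data)
            self.production_map[index] = len(self.store) - 1
        else:
            # Block is already private to production; overwrite in place.
            self.store[self.production_map[index]] = data

    def read(self, index):
        return self.store[self.production_map[index]]

    def read_snapshot(self, index):
        return self.store[self.snapshot_map[index]]
```

In the first class a write costs an extra copy but leaves the production layout untouched; in the second class a write costs no copy but moves the production block to a new physical location.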
The “copy on first write” method has the advantage of preserving the original block mapping for the production dataset, so that physical co-locality of data is maintained in the production dataset. This advantage is obtained, however, at the expense of slower write performance because of the time required for copying the original data to a new block location.
The “write anywhere” method has the advantage of faster write performance because there is no delay needed for any copying of the original data to a new block location prior to completion of a write operation. This advantage is most significant for large I/Os of multiple blocks, and least significant for partial block writes because a partial block write involves a read-modify-write of the block. In any case, however, there may be a degradation of read-write performance over time due to loss of physical co-locality of the data of the production dataset. The block pre-allocation method described above may reduce this loss of physical co-locality by appropriate allocation and mapping of co-located physical storage to the new block locations allocated to the new data written to the production dataset.
Typically, the “copy on first write” method has been used for making snapshot copies of logical volumes or LUNs at the “backend” disk storage array of a data storage system. The snapshot copy process occurs “in band” with the write access to the disk storage array. An example of a commercial product using this method is the EMC Corporation TimeFinder™ snapshot copy facility.
The “write anywhere” method has been used for making snapshot copies of file systems at the file system level in a file server. An example of a commercial product using this method is the EMC Corporation iSCSI snapshot copy facility for its Celerra™ network file server.
For example, to create a “write anywhere” snapshot of a file, the file's metadata is made “read-only.” Then the inode of the file is cloned to create a production file inode and a snapshot file inode. Initially, the indirect block tree of the file is linked to both of these inodes. When new data is first written to a block of the production file since the time of the snapshot, the new data is written to a newly allocated block, and the block pointer to the original data block is changed in the production file inode so that it points to the newly allocated block, and one bit in this block pointer indicates that this block has been written to since the time of the snapshot. For keeping a chronological series of snapshots of the file, this one bit is more generally used as an ownership bit indicating whether or not the data of the pointed-to data block changed prior to the time of the snapshot and after the time of the next oldest snapshot. Further details regarding this procedure of creating and maintaining write-anywhere snapshots of a file are found in Bixby et al., U.S. Patent Application Pub. No. 2005/0065986 published Mar. 24, 2005 entitled “Maintenance of a File Version Set Including Read-Only and Read-Write Snapshot Copies of a Production File,” incorporated herein by reference.
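The inode cloning procedure can be sketched for the simple case of a single snapshot. This is an illustrative sketch under stated assumptions and not the procedure of the Bixby et al. publication: the inode holds a flat list of block pointers instead of an indirect block tree, the bit is modeled only in the production file inode, and the names BlockPointer, Inode, FileStore, snapshot, write_block, and read_file are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlockPointer:
    block: int           # number of the pointed-to data block
    owned: bool = False  # set once the block is written since the snapshot


class Inode:
    def __init__(self, pointers):
        self.pointers = pointers


class FileStore:
    """Hypothetical block store shared by production and snapshot inodes."""

    def __init__(self):
        self.blocks = []

    def allocate(self, data):
        self.blocks.append(data)
        return len(self.blocks) - 1


def snapshot(production):
    # Clone the inode: both inodes initially point at the same data blocks.
    snap = Inode([BlockPointer(p.block) for p in production.pointers])
    # Nothing has been written to the production file since this snapshot.
    for p in production.pointers:
        p.owned = False
    return snap


def write_block(store, production, index, data):
    ptr = production.pointers[index]
    if ptr.owned:
        # Already written since the snapshot: the block is not shared.
        store.blocks[ptr.block] = data
    else:
        # First write since the snapshot: use a newly allocated block
        # and leave the original block to the snapshot inode.
        ptr.block = store.allocate(data)
        ptr.owned = True


def read_file(store, inode):
    return [store.blocks[p.block] for p in inode.pointers]
```

After one write to the production file in this sketch, the written block is private to the production inode while every unwritten block remains shared with the snapshot inode.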