Network data storage is typically provided by an array of disk drives integrated with a large semiconductor cache memory. A file server is used to interface the cached disk array to the network. The file server maps network files to logical block addresses of storage in the cached disk array and moves data between network clients and the storage in the cached disk array. The file server uses a network block services protocol in a configuration process in order to export to the network clients logical volumes of the network-attached storage, which become local pseudo-disk instances. See, for example, Jiang et al., Patent Application Publication US 2004/0059822 A1 published Mar. 25, 2004, entitled “Network Block Services for Client Access of Network-Attached Storage in an IP Network,” incorporated herein by reference. Network clients typically use a network file system access protocol to access one or more file systems maintained by the file server.
Typically, the logical block addresses of storage are subdivided into logical volumes. Each logical volume is mapped to the physical storage using a respective striping and redundancy scheme. The data mover computers typically use the Network File System (NFS) protocol to receive file access commands from clients using the UNIX (Trademark) operating system or the LINUX (Trademark) operating system, and the data mover computers use the Common Internet File System (CIFS) protocol to receive file access commands from clients using the Microsoft (MS) WINDOWS (Trademark) operating system. The NFS protocol is described in “NFS: Network File System Protocol Specification,” Network Working Group, Request for Comments: 1094, Sun Microsystems, Inc., Santa Clara, Calif., March 1989, 27 pages, and in S. Shepler et al., “Network File System (NFS) Version 4 Protocol,” Network Working Group, Request for Comments: 3530, The Internet Society, Reston, Va., April 2003, 262 pages. The CIFS protocol is described in Paul J. Leach and Dilip C. Naik, “A Common Internet File System (CIFS/1.0) Protocol,” Network Working Group, Internet Engineering Task Force, The Internet Society, Reston, Va., Dec. 19, 1997, 121 pages.
The data mover computers may also be programmed to provide clients with network block services in accordance with the Internet Small Computer Systems Interface (iSCSI) protocol, also known as SCSI over IP. The iSCSI protocol is described in J. Satran et al., “Internet Small Computer Systems Interface (iSCSI),” Network Working Group, Request for Comments: 3720, The Internet Society, Reston, Va., April 2004, 240 pages. The data mover computers use a network block services protocol in a configuration process in order to export to the clients logical volumes of network attached storage, which become local pseudo-disk instances. See, for example, Jiang et al., Patent Application Publication US 2004/0059822 A1 published Mar. 25, 2004, entitled “Network Block Services for Client Access of Network-Attached Storage in an IP Network,” incorporated herein by reference.
A storage object such as a virtual disk drive or a raw logical volume can be contained in a file compatible with the UNIX (Trademark) operating system so that the storage object can be exported using the NFS or CIFS protocol and shared among the clients. In this case, the storage object can be replicated and backed up using conventional file replication and backup facilities without disruption of client access to the storage object. See, for example, Liang et al., Patent Application Publication US 2005/0044162 A1 published Feb. 24, 2005, entitled “Multi-Protocol Sharable Virtual Storage Objects,” incorporated herein by reference. The container file can be a sparse file. As data is written to a sparse file, the size of the file can grow up to a pre-specified maximum number of blocks, and this maximum number of blocks can then be extended by moving the end-of-file (eof). See, for example, Bixby et al., Patent Application Publication US 2005/0065986 A1 published Mar. 24, 2005, entitled “Maintenance of a File Version Set Including Read-Only and Read-Write Snapshot Copies of a Production File,” incorporated herein by reference, and Mullick et al., Patent Application Publication US 2005/0066095 A1 published Mar. 24, 2005, entitled “Multi-Threaded Write Interface and Methods for Increasing the Single File Read and Write Throughput of a File Server,” incorporated herein by reference.
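The growth behavior of a sparse container file can be illustrated with a minimal sketch. The class name `SparseContainerFile`, its methods, and the 8192-byte block size are hypothetical and chosen only for illustration; they are not taken from the cited publications.

```python
class SparseContainerFile:
    """Toy sparse file: a block consumes storage only once it is written."""

    def __init__(self, max_blocks: int):
        self.max_blocks = max_blocks  # pre-specified limit, i.e. the eof position
        self.blocks = {}              # logical block -> data; absent keys are holes

    def write(self, logical_block: int, data: bytes) -> None:
        if not 0 <= logical_block < self.max_blocks:
            raise IndexError("write beyond end-of-file; extend the file first")
        self.blocks[logical_block] = data

    def read(self, logical_block: int, block_size: int = 8192) -> bytes:
        if not 0 <= logical_block < self.max_blocks:
            raise IndexError("read beyond end-of-file")
        # A hole reads back as zeros without any block being allocated.
        return self.blocks.get(logical_block, bytes(block_size))

    def extend(self, new_max_blocks: int) -> None:
        """Move the eof forward to raise the maximum number of blocks."""
        if new_max_blocks < self.max_blocks:
            raise ValueError("this sketch only grows the file")
        self.max_blocks = new_max_blocks

    def allocated_blocks(self) -> int:
        return len(self.blocks)
```

In this sketch a file created with a limit of 100 blocks and written at only two offsets holds two allocated blocks; a write at block 100 fails until `extend` moves the eof.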
When using the network block services protocol to access a SCSI LUN contained in a UNIX-based container file system, there is often a performance degradation in comparison to access of a network attached SCSI LUN that is not contained in a container file system. This performance degradation is caused by a mapping overhead incurred when management of the container file system does a lookup of the address of the data block associated with a specified offset in the SCSI LUN and this lookup requires the fetching of one or more indirect blocks in the disk block hierarchy of the file containing the SCSI LUN. This mapping overhead has been tolerated as a characteristic of a UNIX file system that permits each data block to be allocated from any convenient location on disk. This characteristic of a UNIX file system supports sparse files and possible sharing of specified data blocks between files for enabling “write somewhere else” snapshot copies, and de-duplication of specified data blocks. Nevertheless, it is desired to eliminate this mapping overhead for a file containing a network-attached SCSI LUN or any other file that sometimes might not need each data block to be allocated from any convenient location on disk.
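The mapping overhead can be made concrete by counting the indirect blocks that must be read before a data block address is known. The sketch below assumes a conventional UNIX-style layout with 12 block pointers in the inode followed by single, double, and triple indirect blocks, and 2048 pointers per indirect block; these figures and the function name are illustrative assumptions, not parameters stated in this document.

```python
# Hypothetical layout parameters, chosen only for illustration.
N_DIRECT = 12          # block pointers held in the inode itself
PTRS_PER_BLOCK = 2048  # e.g. 8 KB indirect blocks holding 4-byte pointers


def indirect_blocks_to_fetch(logical_block: int,
                             n_direct: int = N_DIRECT,
                             ptrs_per_block: int = PTRS_PER_BLOCK) -> int:
    """Return how many indirect blocks must be read to map logical_block."""
    if logical_block < 0:
        raise ValueError("logical block number must be non-negative")
    if logical_block < n_direct:
        return 0  # the pointer is already in the inode
    remaining = logical_block - n_direct
    span = ptrs_per_block  # data blocks reachable through a tree of this depth
    for depth in (1, 2, 3):
        if remaining < span:
            return depth
        remaining -= span
        span *= ptrs_per_block
    raise ValueError("offset exceeds the triple-indirect range")
```

Under these assumptions, only the first 12 blocks of a LUN are mapped without extra reads. Every other offset costs between one and three additional block fetches unless the indirect blocks are already cached.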
The data block mapping protocol of a file is selectable between a direct mapping protocol that does not use mapping information stored in any indirect block of the file, and an indirect mapping protocol that uses mapping information stored in at least one indirect block of the file. Thus, at any given time, a file is either in a direct mapping state or an indirect mapping state. In the direct mapping state, once the inode for a file is fetched from storage, a pointer to a specified data block in the logical extent of the file is computed by execution of a computer program without accessing mapping information from any indirect block of the file. In the indirect mapping state, computation of a pointer to a specified block in the logical extent of the file may require information read from an indirect block of the file. Further, direct-mapped means that all data blocks have a predetermined location and that all data blocks are assumed to be allocated. Direct mapping does not allow for holes in the file structure unless partial direct mapping is used, and in that case holes can appear only in the indirect-mapped parts of the file.
An indirect mapping protocol, such as the conventional indirect mapping protocol of a UNIX-based file system, permits any free block of the file system to be allocated to a file of the file system and mapped to any logical block of the logical extent of the file. This unrestricted mapping ability results from the fact that the metadata for each file includes a respective pointer to each data block of the file.
Thus, when the file is in the direct mapping state, data access performance is improved relative to data access performance of the indirect mapping state, because the direct mapping state ensures that once the inode of the file has been fetched from storage, there will be no need to fetch any indirect block table of the file from the storage in order to compute the physical location pointer to the data block for a specified logical block offset in the file.
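A minimal sketch of the direct mapping computation follows. It assumes, purely for illustration, that the predetermined location of each data block is a fixed offset from a starting block recorded in the inode; the `DirectInode` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectInode:
    """Hypothetical inode for a file in the direct mapping state."""
    start_block: int  # first physical block of the file (contiguity is assumed)
    num_blocks: int   # length of the logical extent, in blocks


def direct_map(inode: DirectInode, logical_block: int) -> int:
    """Compute the physical block from the inode alone, with no storage reads."""
    if not 0 <= logical_block < inode.num_blocks:
        raise IndexError("logical block outside the file's extent")
    return inode.start_block + logical_block
```

Because the result depends only on fields already held in the fetched inode, the lookup is pure arithmetic and its cost does not grow with the offset.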
When the file is in the indirect mapping state, flexibility of file system block allocation to the file is improved relative to the direct mapping state, because the use of at least one indirect block provides additional information for a more flexible allocation of file system blocks to the file. The additional flexibility of file system block allocation permits more efficient storage utilization by reducing storage fragmentation and permitting sparse files and efficient dynamic extension of files. In addition, the flexibility of allocating any free file system block to any logical block in the logical extent of a file permits supplementary storage services such as snapshot copy and de-duplication in which any data block in a file system can be freely shared among files of the file system.
A file in the direct mapping state does not have the flexibility of allocating any free file system block to any logical block in the logical extent of the file, and thus does not permit supplementary storage services such as snapshot copy, in which any data block in a file system can be freely allocated anywhere on the disk storage. To permit snapshot copy, it is necessary either to create the full map when the file is created in direct mapping mode, accommodating the data blocks that will be used for snapshot copies, or to convert the file from direct mapping mode to indirect mapping mode before creating a snapshot copy.
Read or write access to files and their snapshot copies in the manner described above is considerably slower than access to files in direct mapping mode. As a result, the snapshot creation process also suffers from high preparation latency. Additionally, writing data to a file in fully direct mapping mode causes unnecessary space consumption, because the space required for all possible future snapshot copies of the file must be provisioned when the file is initially created.
The storage technology described above, in combination with a continuing increase in disk drive storage density, file server processing power and network bandwidth at decreasing cost, has provided network clients with more than an adequate supply of network storage capacity at affordable prices. It would be an advancement in the data storage computer-related arts to increase performance by avoiding the I/O involved in looking up block numbers, to reduce the time it takes to read data from or write data to a file, to reduce the space required to write data to a file, to allow advanced operations such as snapshot writes with partial direct mappings, and to allow conversion from one mapping mode to another. This is becoming increasingly important as the amount of information being handled and stored grows geometrically over short time periods and such environments add more file systems and data at a rapid pace.
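The second option, converting a direct-mapped file to indirect mapping before taking a snapshot, can be sketched as follows. The `MappedFile` class, its in-memory pointer list standing in for indirect blocks, and the `redirect_write` method are hypothetical simplifications for illustration only.

```python
class MappedFile:
    """Toy file that starts direct-mapped and can be converted to indirect."""

    def __init__(self, start_block: int, num_blocks: int):
        self.start_block = start_block
        self.num_blocks = num_blocks
        self.block_map = None  # None means the file is in the direct mapping state

    @property
    def is_direct(self) -> bool:
        return self.block_map is None

    def lookup(self, logical_block: int) -> int:
        if not 0 <= logical_block < self.num_blocks:
            raise IndexError("logical block outside the file's extent")
        if self.is_direct:
            return self.start_block + logical_block
        return self.block_map[logical_block]

    def convert_to_indirect(self) -> None:
        """Materialize one explicit pointer per block (stand-in for indirect blocks)."""
        if self.is_direct:
            self.block_map = [self.start_block + i for i in range(self.num_blocks)]

    def snapshot(self) -> "MappedFile":
        """Create a snapshot that initially shares every block with this file."""
        self.convert_to_indirect()  # sharing requires per-block pointers
        snap = MappedFile(self.start_block, self.num_blocks)
        snap.block_map = list(self.block_map)
        return snap

    def redirect_write(self, logical_block: int, free_block: int) -> None:
        """Write somewhere else: remap the logical block to a newly allocated block."""
        if self.is_direct:
            raise RuntimeError("direct-mapped blocks have a fixed location")
        if not 0 <= logical_block < self.num_blocks:
            raise IndexError("logical block outside the file's extent")
        self.block_map[logical_block] = free_block
```

In this sketch the conversion preserves every existing mapping, so the production file is unchanged from the client's point of view. What it gives up is the guarantee that a lookup never needs mapping information beyond the inode.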