A storage system is a computer that provides storage service relating to the organization of information on writable persistent storage devices, such as memories, tapes or disks. The storage system is commonly deployed within a storage area network (SAN) or a network attached storage (NAS) environment. When used within a NAS environment, the storage system may be embodied as a file server including an operating system that implements a file system to logically organize the information as a hierarchical structure of directories and files on, e.g., the disks. Each “on-disk” file may be implemented as a set of data structures, e.g., disk blocks, configured to store information, such as the actual data for the file. A directory, on the other hand, may be implemented as a specially formatted file in which information about other files and directories is stored. As used herein, a file is defined to be any logical storage container that contains a fixed or variable amount of data storage space, and that may be allocated storage out of a larger pool of available data storage space. As such, the term file, as used herein and unless the context otherwise dictates, can also mean a container, object or any other storage entity that does not correspond directly to a set of fixed data storage devices. A file system is, generally, a computer system for managing such files, including the allocation of fixed storage space to store files on a temporary or permanent basis.
The file server, or filer, may be further configured to operate according to a client/server model of information delivery to thereby allow many client systems (clients) to access shared resources, such as files, stored on the filer. Sharing of files is a hallmark of a NAS system, which is enabled because of its semantic level of access to files and file systems. Storage of information on a NAS system is typically deployed over a computer network comprising a geographically distributed collection of interconnected communication links, such as Ethernet, that allow clients to remotely access the information (files) on the filer. The clients typically communicate with the filer by exchanging discrete frames or packets of data according to pre-defined protocols, such as the Transmission Control Protocol/Internet Protocol (TCP/IP).
In the client/server model, the client may comprise an application executing on a computer that “connects” to the filer over a computer network, such as a point-to-point link, shared local area network, wide area network or virtual private network implemented over a public network, such as the Internet. NAS systems generally utilize file-based access protocols; therefore, each client may request the services of the filer by issuing file system protocol messages (in the form of packets) to the file system over the network identifying one or more files to be accessed without regard to specific locations, e.g., blocks, in which the data are stored on disk. By supporting a plurality of file system protocols, such as the conventional Common Internet File System (CIFS), the Network File System (NFS) and the Direct Access File System (DAFS) protocols, the utility of the filer may be enhanced for networking clients.
A SAN is a high-speed network that enables establishment of direct connections between a storage system and its storage devices. The SAN may thus be viewed as an extension to a storage bus and, as such, an operating system of the storage system enables access to stored information using block-based access protocols over the “extended bus”. In this context, the extended bus is typically embodied as Fibre Channel (FC) or Ethernet media adapted to operate with block access protocols, such as Small Computer Systems Interface (SCSI) protocol encapsulation over FC or TCP/IP/Ethernet.
A SAN arrangement or deployment allows decoupling of storage from the storage system, such as an application server, and some level of information storage sharing at the application server level. There are, however, environments wherein a SAN is dedicated to a single server. In some SAN deployments, the information is organized in the form of databases, while in others a file-based organization is employed. Where the information is organized as files, the client requesting the information maintains file mappings and manages file semantics, while its requests (and server responses) address the information in terms of block addressing on disk using, e.g., a logical unit number (lun).
Some known file systems contain the capability to generate a snapshot of the file system. In the example of a WAFL-based file system, snapshots are described in TR3002 File System Design for a NFS File Server Appliance by David Hitz, et al., published by Network Appliance, Inc. and in U.S. Pat. No. 5,819,292 entitled METHOD FOR MAINTAINING CONSISTENT STATES OF A FILE SYSTEM AND FOR CREATING USER-ACCESSIBLE READ-ONLY COPIES OF A FILE SYSTEM, by David Hitz, et al., which are hereby incorporated by reference.
“Snapshot” is a trademark of Network Appliance, Inc. It is used for purposes of this patent to designate a persistent consistency point (CP) image. A persistent consistency point image (PCPI) is a point-in-time representation of the storage system, and more particularly, of the active file system, stored on a storage device (e.g., on disk) or in other persistent memory and having a name or other identifier that distinguishes it from other PCPIs taken at other points in time. A PCPI can also include other information (metadata) about the active file system at the particular point in time for which the image is taken. The terms “PCPI” and “snapshot” shall be used interchangeably throughout this patent without derogation of Network Appliance's trademark rights.
In the example of the Write Anywhere File Layout (WAFL™) file system, by Network Appliance, Inc., of Sunnyvale, Calif., a file is represented as an inode data structure adapted for storage on disks. FIG. 1 is a schematic block diagram illustrating an exemplary on-disk inode 100, which preferably includes a meta data section 110 and a data section 150. The information stored in the meta data section 110 of each inode 100 describes a file and, as such, includes the type (e.g., regular or directory) 112 of the file, the size 114 of the file, time stamps (e.g., access and/or modification) 116 for the file and ownership, i.e., user identifier (UID 118) and group identifier (GID 120), of the file. The meta data section 110 further includes an xinode field 130 containing a pointer 140 that references another on-disk inode structure containing, e.g., access control list (ACL) information associated with the file or directory. The contents of the data section 150 of each inode may be interpreted differently depending upon the type of file (inode) defined within the type field 112. For example, the data section 150 of a directory inode contains meta data controlled by the file system, whereas the data section of a regular inode contains user-defined data. In this latter case, the data section 150 includes a representation of the data associated with the file.
Specifically, the data section 150 of a regular on-disk inode may include user data or pointers, the latter referencing 4 kilobyte (KB) data blocks on disk used to store the user data. Each pointer is preferably a logical volume block number, which thereby facilitates efficiency among a file system and/or disk storage layer of an operating system when accessing the data on disks. Given the restricted size (e.g., 128 bytes) of the inode, user data having a size that is less than or equal to 64 bytes is represented in its entirety within the data section of an inode. However, if the user data is greater than 64 bytes but less than or equal to 64 KB, then the data section of the inode comprises up to 16 pointers, each of which references a 4 KB block of data on disk. Moreover, if the size of the data is greater than 64 KB but less than or equal to 64 megabytes (MB), then each pointer in the data section 150 of the inode references an indirect inode that contains 1024 pointers, each of which references a 4 KB data block on disk.
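The size thresholds described above follow directly from the pointer counts and the block size: 16 pointers times 4 KB yields 64 KB, and 16 pointers times 1024 pointers times 4 KB yields 64 MB. The following is a minimal sketch of that arithmetic only. The function names and layout labels are hypothetical and chosen for illustration; they are not taken from any actual file system implementation.

```python
# Illustrative constants taken from the figures quoted in the text.
BLOCK_SIZE = 4 * 1024            # 4 KB data block
INLINE_LIMIT = 64                # bytes of user data held directly in the inode
POINTERS_PER_INODE = 16          # pointers in the inode data section
POINTERS_PER_INDIRECT = 1024     # pointers in each indirect inode

# Largest file reachable through direct pointers: 16 * 4 KB = 64 KB.
DIRECT_LIMIT = POINTERS_PER_INODE * BLOCK_SIZE
# Largest file reachable through one level of indirection: 16 * 1024 * 4 KB = 64 MB.
SINGLE_INDIRECT_LIMIT = POINTERS_PER_INODE * POINTERS_PER_INDIRECT * BLOCK_SIZE


def inode_layout(size_bytes: int) -> str:
    """Return a (hypothetical) label for how user data of this size is reached."""
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    if size_bytes <= INLINE_LIMIT:
        return "inline"              # data stored in the inode itself
    if size_bytes <= DIRECT_LIMIT:
        return "direct"              # inode pointers reference data blocks
    if size_bytes <= SINGLE_INDIRECT_LIMIT:
        return "single-indirect"     # inode pointers reference indirect inodes
    return "beyond-single-indirect"  # larger sizes are not covered by the text


def data_blocks_needed(size_bytes: int) -> int:
    """Number of 4 KB data blocks required; inline data needs none."""
    if size_bytes <= INLINE_LIMIT:
        return 0
    return -(-size_bytes // BLOCK_SIZE)  # ceiling division
```

For instance, a 65-byte file no longer fits inline and occupies one full 4 KB block, while a file of exactly 64 KB still fits within the 16 direct pointers.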
A PCPI is a restorable version of a file system created at a predetermined point in time and stored on the same storage devices that store the file system. PCPIs are generally created on some regular user-defined schedule. The PCPI is stored on-disk along with the active file system, and is called into a buffer cache of the filer memory as requested by the storage operating system. An exemplary file system inode structure 200 is shown in FIG. 2. The inode for an inode file 205 contains information describing the inode file associated with a given file system. In this exemplary file system inode structure the inode for the inode file 205 contains a pointer to an inode file indirect block 210. The inode file indirect block 210 contains a set of pointers to inode blocks 215, each typically containing multiple inodes 217, which in turn contain pointers to indirect blocks 219. The indirect blocks 219 include pointers to file data blocks 220A, 220B and 220C. Each of the file data blocks 220(A-C) is capable of storing, in the illustrative embodiment, 4 kilobytes (KB) of data.
When the file system generates a PCPI of a given file system, a PCPI inode is generated as shown in FIG. 3. The PCPI inode 305 is, in essence, a duplicate copy of the inode for the inode file 205 of the file system 200. Thus, the exemplary file system structure 200 includes the inode file indirect blocks 210, inodes 217, indirect blocks 219 and file data blocks 220A-C as in FIG. 2. When a user modifies a file data block, the file system layer writes the new data block to disk and changes the active file system to point to the newly created block.
FIG. 4 shows an exemplary inode file system structure 400 after a file data block has been modified. In this illustrative example, file data block 220C was modified to file data block 220C′. When file data block 220C is modified to file data block 220C′, the contents of the modified file data block are written to a new location on disk as a function of the exemplary WAFL file system. Because of this new location, the indirect block 419 must be rewritten. Due to this changed indirect block 419, the inode 417 must be rewritten. Similarly, the inode file indirect block 410 and the inode for the inode file 405 must be rewritten. Thus, after a file data block has been modified, the PCPI inode 305 contains a pointer to the original inode file indirect block 210, which in turn contains pointers through the inode 217 and an indirect block 219 to the original file data blocks 220A, 220B and 220C. However, the newly written indirect block 419 includes pointers to unmodified file data blocks 220A and 220B. The indirect block 419 also contains a pointer to the modified file data block 220C′ representing the new arrangement of the active file system. A new inode for the inode file 405 is established representing the new structure 400. Note that metadata (not shown) stored in any snapshotted blocks (e.g., 305, 210, and 220C) protects these blocks from being recycled or overwritten until they are released from all PCPIs. Thus, while the active file system inode for the inode file 405 points, through the rewritten blocks, to the unmodified blocks 220A and 220B and to the new block 220C′, the old blocks 210, 217, 219 and 220C are retained until the PCPI is fully released.
After a PCPI has been created and file data blocks modified, the file system layer can reconstruct or “restore” the file system inode structure as it existed at the time of the PCPI by accessing the PCPI inode. By following the pointers contained in the PCPI inode 305 through the inode file indirect block 210, inode 217 and indirect block 219 to the unmodified file data blocks 220A-C, the file system layer can reconstruct the file system as it existed at the time of creation of the PCPI.
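The write-to-a-new-location and restore behavior described in the preceding paragraphs can be sketched as follows. This is a simplified model under stated assumptions, not the WAFL implementation: the tree is reduced to three levels (a root, one indirect block and the data blocks) rather than the full structure of FIGS. 2-4, blocks are never freed, and the names `BlockStore`, `FileSystem`, `take_pcpi`, `write` and `restore` are hypothetical.

```python
class BlockStore:
    """Append-only block store: a written block is never overwritten in place."""

    def __init__(self):
        self.blocks = {}
        self._next_id = 0

    def allocate(self, payload):
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = payload
        return block_id


class FileSystem:
    """Simplified tree: root -> indirect block -> data blocks (an assumption)."""

    def __init__(self, data):
        self.store = BlockStore()
        data_ids = tuple(self.store.allocate(d) for d in data)
        indirect_id = self.store.allocate(data_ids)
        self.active_root = self.store.allocate(indirect_id)
        self.pcpis = {}

    def take_pcpi(self, name):
        # Only the root is duplicated; every block beneath it is shared.
        root_payload = self.store.blocks[self.active_root]
        self.pcpis[name] = self.store.allocate(root_payload)

    def write(self, index, payload):
        # The new data goes to a new block, so the indirect block and the
        # root that lead to it must be rewritten to new locations as well.
        indirect_id = self.store.blocks[self.active_root]
        data_ids = list(self.store.blocks[indirect_id])
        data_ids[index] = self.store.allocate(payload)
        new_indirect_id = self.store.allocate(tuple(data_ids))
        self.active_root = self.store.allocate(new_indirect_id)

    def data_ids(self, root_id):
        return self.store.blocks[self.store.blocks[root_id]]

    def read(self, root_id):
        return [self.store.blocks[b] for b in self.data_ids(root_id)]

    def restore(self, name):
        # Follow the PCPI root: the active root is re-pointed at the old tree.
        root_payload = self.store.blocks[self.pcpis[name]]
        self.active_root = self.store.allocate(root_payload)


fs = FileSystem([b"A", b"B", b"C"])
fs.take_pcpi("hourly.0")
fs.write(2, b"C'")
active_view = fs.read(fs.active_root)         # sees the modified block
pcpi_view = fs.read(fs.pcpis["hourly.0"])     # still sees the original block
```

In this sketch the active file system and the PCPI continue to share the two unmodified data blocks, and only the modified block and the blocks on the path to it exist in two versions.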
Storage systems, including multi-protocol storage appliances, export virtual disks (vdisks) to clients utilizing block-based protocols, such as, for example, Fibre Channel and iSCSI. One example of a vdisk is a special file type in a volume that derives from a plain file, but that has associated export controls and operation restrictions that support emulation of a disk. Vdisks are described further in U.S. patent application Ser. No. 10/216,453, entitled STORAGE VIRTUALIZATION BY LAYERING VIRTUAL DISK OBJECTS ON A FILE SYSTEM, by Vijayan Rajan, et al., the contents of which are hereby incorporated by reference. Through these block-based protocols, the exported files/vdisks appear as physical disk devices to the clients of the storage system. A well-known feature of disk devices is that they do not return “no space” errors, hereinafter referred to as OUTOFSPACE errors, when a write operation is directed to a space that is known to exist. It should be noted, as one skilled in the art would recognize, that the exact error returned is protocol specific. As such, the term OUTOFSPACE error should be taken to mean generally a protocol-specific out of space error. In other words, disk devices will not return an OUTOFSPACE error when a previously written block on disk is being rewritten. This is because the completion of a successful write of the block establishes to the application that the data storage for the block exists. The application then assumes and depends on the continued existence of the storage going forward, and does not expect to receive an error when subsequently writing to this storage. If a disk device does return an OUTOFSPACE error, clients, which typically are not expecting or programmed to respond to such errors, will typically fail or suffer from an error condition. This client failure may lead to the loss of data integrity and/or data loss. This noted problem may be generalized to other types of files. For example, a database management system assumes that once it has written successfully to an area of a file, it may continue to re-write to that area of the file without receiving an OUTOFSPACE error.
However, when using a file system supporting PCPIs, it is possible to exhaust the available disk space due to re-writing data that is stored both in the active file system and in a PCPI. It should be noted that other file system architectures, including those with differing techniques for generating PCPIs, may suffer from overcommitting space by permitting blocks of data and/or metadata to be shared among PCPIs and the active file system. As such, the teachings of the present invention may be utilized in any file system supporting PCPIs. The PCPI mechanism and file system described should be taken as exemplary only. For example, assume a file of size X bytes exists in the file system supporting PCPIs. Immediately after a PCPI is taken of the file, the total space consumed by the file is X plus the added space required by the PCPI root inode. As blocks of the file are modified in the active file system after the PCPI is taken, the space consumed by the file and its associated PCPI may approach 2X bytes. That is, as the version of the file in the active file system diverges from the version stored in the PCPI, the amount of space occupied by the file approaches 2X. If the available free space on a disk is less than 2X, it is possible that a client attempting to re-write a portion of a file may receive an OUTOFSPACE error.
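The growth toward twice the file size can be illustrated with a small block-accounting model. The sketch below rests on several assumptions: space is counted in whole data blocks, inode and indirect block overhead is ignored, a single file and a single PCPI exist, and the transient space needed while a block is being rewritten is not modeled. The class `Volume` and its methods are hypothetical names used only for this illustration.

```python
class OutOfSpace(Exception):
    """Stands in for the protocol-specific OUTOFSPACE error."""


class Volume:
    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.used = 0
        self.file_blocks = 0
        self.pinned = set()  # file block indices still shared with the PCPI

    def create_file(self, n_blocks):
        if self.used + n_blocks > self.capacity:
            raise OutOfSpace()
        self.file_blocks = n_blocks
        self.used += n_blocks

    def take_pcpi(self):
        # Every current block of the file becomes shared with the PCPI.
        self.pinned = set(range(self.file_blocks))

    def overwrite(self, index):
        if index in self.pinned:
            # The old block is held by the PCPI, so the new data needs
            # one additional block.
            if self.used + 1 > self.capacity:
                raise OutOfSpace()
            self.used += 1
            self.pinned.discard(index)
        # Otherwise the old block belongs only to the active file system
        # and can be released, so net consumption does not change.


vol = Volume(capacity_blocks=15)   # less than 2X for a 10-block file
vol.create_file(10)                # X = 10 blocks
vol.take_pcpi()
for i in range(5):
    vol.overwrite(i)               # consumption grows from 10 to 15 blocks

try:
    vol.overwrite(5)               # re-write of an existing block
    rewrite_failed = False
except OutOfSpace:
    rewrite_failed = True
```

Here the client is re-writing a block that it previously wrote successfully, yet the operation fails because the PCPI still holds the old copy and no free block remains for the new one.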
Additionally, certain file systems, including the above-described WAFL file system, include the ability to generate sparse files. By “sparse file” it is meant a file that is created with a set size, but for which not all of the physical blocks associated with the file are written and/or allocated at the time of file creation. For example, using certain backup operations, a sparse file may be generated and slowly written to in the background (e.g., using a conventional “lazy write” operation), thereby reducing the need for a massive data transfer from one storage device to another. In such file systems, when the file is created it consists basically of holes that need to be filled. For example, in a WAFL-based file system, the root inode and associated intermediate inodes may be created; however, the file data blocks may not be allocated within the file system. As data is written to the sparse file, file data blocks are then allocated as needed.
In file systems that utilize sparse files, it is possible that the amount of free space in the file system may become less than the space that the sparse files would require once fully filled in. In such cases, a write operation directed to a sparse file may fail with an OUTOFSPACE error. As clients are typically not programmed to deal with these errors, data loss and/or a loss of data integrity may occur.
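The sparse-file failure case can be illustrated in the same block-counting style. In this minimal sketch, two sparse files are each created with a declared size, but data blocks are allocated only when first written, so the sum of the declared sizes may exceed the capacity of the volume. The names `SparseVolume`, `create_sparse` and `write` are hypothetical, metadata overhead is ignored, and PCPI effects are not modeled.

```python
class OutOfSpace(Exception):
    """Stands in for the protocol-specific OUTOFSPACE error."""


class SparseVolume:
    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.allocated = 0
        self.files = {}  # name -> (declared size in blocks, filled indices)

    def create_sparse(self, name, declared_blocks):
        # Creation allocates no data blocks: the file consists of holes.
        self.files[name] = (declared_blocks, set())

    def write(self, name, index):
        declared, filled = self.files[name]
        if not 0 <= index < declared:
            raise IndexError("write beyond declared file size")
        if index in filled:
            return  # block already allocated; no new space is needed
        if self.allocated >= self.capacity:
            raise OutOfSpace()
        filled.add(index)
        self.allocated += 1


vol = SparseVolume(capacity_blocks=8)
vol.create_sparse("a", declared_blocks=6)
vol.create_sparse("b", declared_blocks=6)   # 12 blocks declared, 8 available
for i in range(4):
    vol.write("a", i)
    vol.write("b", i)

try:
    vol.write("a", 4)     # within the declared size of file "a"
    hole_fill_failed = False
except OutOfSpace:
    hole_fill_failed = True
```

The failing write falls within the size the file was created with, which is why a client that treats the file as a fixed-size container does not anticipate the error.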
Thus, it is desirable to have a system and method for maintaining space reservations in a file system to ensure that OUTOFSPACE errors do not occur.