A storage system is a computer that provides storage service relating to the organization of information on writable persistent storage devices, such as memories, tapes or disks. The storage system may be deployed within a storage area network (SAN) or a network attached storage (NAS) environment. When used within a NAS environment, the storage system may be embodied as a file server including a storage operating system that implements a file system to logically organize the information as a hierarchical structure of directories and files on, e.g., the disks. Each “on-disk” file may be implemented as a set of data structures, e.g., disk blocks, configured to store information, such as the actual data for the file. A directory, on the other hand, may be implemented as a specially formatted file in which information about other files and directories is stored.
The file server, or filer, may be further configured to operate according to a client/server model of information delivery to thereby allow many client systems (clients) to access shared resources, such as files, stored on the filer. Sharing of files is a hallmark of a NAS system, which is enabled by the semantic level of access to files and file systems. Storage of information on a NAS system is typically deployed over a computer network comprising a geographically distributed collection of interconnected communication links, such as Ethernet, that allow clients to remotely access the information (files) on the filer. The clients typically communicate with the filer by exchanging discrete frames or packets of data according to pre-defined protocols, such as the Transmission Control Protocol/Internet Protocol (TCP/IP).
In the client/server model, the client may comprise an application executing on a computer that “connects” to the filer over a computer network, such as a point-to-point link, shared local area network, wide area network or virtual private network implemented over a public network, such as the Internet. NAS systems generally utilize file-based access protocols; therefore, each client may request the services of the filer by issuing file system protocol messages (in the form of packets) to the file system over the network. By supporting a plurality of file system protocols, such as the conventional Common Internet File System (CIFS), the Network File System (NFS) and the Direct Access File System (DAFS) protocols, the utility of the filer may be enhanced for networking clients.
A SAN is a high-speed network that enables establishment of direct connections between a storage system and its storage devices. A SAN arrangement or deployment allows decoupling of storage from the storage system, such as an application server, and placing of that storage on a network. However, the SAN storage system typically manages specifically assigned storage resources. Although storage can be grouped (or pooled) into zones (e.g., through conventional logical unit number or “lun” zoning, masking and management techniques), the storage devices are still pre-assigned by a user, e.g., a system administrator, to the storage system. The SAN may thus be viewed as an extension to a storage bus and, as such, an operating system of the storage system enables access to stored information using block-based access protocols over the “extended bus”. In this context, the extended bus is typically embodied as Fibre Channel (FC) or Ethernet media (i.e., network) adapted to operate with block access protocols, such as Small Computer Systems Interface (SCSI) protocol encapsulation over FC or TCP/IP/Ethernet.
Storage virtualization generally involves the pooling of storage resources from multiple storage devices, such as physical disks, typically across a network by one or more storage systems to create a “user-defined volume”. The term “volume” as conventionally used in a SAN environment implies a storage entity that is constructed (by a system administrator) by specifying physical disks and extents within those disks via operations that combine those extents/disks into a user-defined volume storage entity. An extent is a set of contiguously addressed blocks (or “slices”) of storage within the specified physical disks. Such construction can occur on either the storage device or application server. Storage virtualization is often used as part of a SAN deployment, wherein the user-defined volume appears as a single storage entity to the operating system, regardless of the types of storage devices pooled. Virtualization thus separates the representation of storage to the operating system from the actual physical storage connected over the network.
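The construction of a user-defined volume from extents may be illustrated by the following minimal sketch. The names Extent, UserVolume and locate are hypothetical and chosen for illustration only, and the sketch assumes that extents are simply concatenated in order, which is only one of several ways in which extents could be combined.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Extent:
    """A set of contiguously addressed blocks on one physical disk (hypothetical type)."""
    disk: str         # identifier of the physical disk
    start_block: int  # first block of the extent on that disk
    length: int       # number of blocks in the extent


class UserVolume:
    """Illustrative user-defined volume formed by concatenating extents in order."""

    def __init__(self, extents: List[Extent]) -> None:
        self.extents = list(extents)

    @property
    def size(self) -> int:
        """Total number of blocks presented to the operating system."""
        return sum(e.length for e in self.extents)

    def locate(self, volume_block: int) -> Tuple[str, int]:
        """Translate a volume block number into (disk, physical block)."""
        if volume_block < 0:
            raise IndexError("negative block number")
        remaining = volume_block
        for extent in self.extents:
            if remaining < extent.length:
                return extent.disk, extent.start_block + remaining
            remaining -= extent.length
        raise IndexError("block number beyond end of volume")


# Two extents on different disks appear as one 250-block storage entity.
volume = UserVolume([Extent("disk0", 100, 50), Extent("disk1", 0, 200)])
```

In this sketch the operating system addresses blocks 0 through 249 of the volume without regard to which physical disk holds each block, which reflects the separation of representation from physical storage described above.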
Storage virtualization has many interpretations, including decoupling of physical disk size limitations and underlying physical structure from a user-defined volume corresponding to a disk or lun. Virtualization may also refer to management of luns, including defining underlying reliability guarantees of the storage. Commonly, this aspect of virtualization is accomplished through explicit mirroring or Redundant Array of Independent (or Inexpensive) Disks (RAID) protection levels to a lun that is formed from the storage pool. That is, the system administrator explicitly defines the underlying reliability guarantees of the constructed user-defined volume. It can be appreciated that this administrative procedure is complex, time consuming and, therefore, costly.
Virtualization may further denote the ability to modify an existing configuration of a lun (e.g., to increase its size) along with the performance characteristics of the lun. However, conventional physical disks and strategies that explicitly construct larger units of storage for use by clients may suffer performance limitations. For example, bandwidth to a user-defined volume constructed through explicit aggregation of a number of disks and/or “slices” (extents) of those disks may be limited by physical constraints of the underlying properties of the constructed volume.
In some virtualization systems, a SAN or block-based data storage model is overlaid onto a file-based file system, thereby enabling clients who require the use of block-based addressing to utilize the services of a file server having an appropriate virtualization system. In an exemplary file system, each unit of information associated with a file, including, for example, its name, its owner, time stamps, etc., is implemented as a file attribute. Both files and directories have attributes, wherein each attribute may consist of a single data stream. Such an implementation facilitates the addition of new attributes to a file, including data content attributes. Therefore, files and directories may contain multiple data streams; however, each on-disk file must contain at least a default data stream through which the file data is accessed.
In the exemplary WAFL file system, individual files are described by inodes, including, for example, directory inodes, regular inodes and stream inodes. A stream inode represents a named data stream so that multiple data streams may be stored on disks associated with a storage appliance as representations embodying the stream inode type associated with a file. Each stream inode has its own size, file share locks, byte range locks and data blocks; however, other file attributes, such as time stamps, group and user ownership information, and access control lists are common for all named data streams and are stored in an on-disk “base inode”. The default data stream, along with its size, data blocks, file share locks and byte range locks, is also stored in the base inode. Additionally, the names and file handles of the data streams are stored in a “hidden” directory within the file system that is referenced by the base inode. The hidden directory is represented as a stream_dir inode type. The hidden directory is “invisible” in a directory hierarchy that is viewed by a user (e.g., a client) external to the file system and, thus, is inaccessible through an external file system protocol, such as the Common Internet File System protocol.
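The division of state between the base inode and the stream inodes may be pictured with the following sketch. It is not the on-disk format: the class and field names are hypothetical, and the hidden directory is simplified to an in-memory mapping from stream name directly to a stream inode rather than to a file handle.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StreamInode:
    """Per-stream state: each named stream has its own size, locks and data blocks."""
    size: int = 0
    share_locks: List[str] = field(default_factory=list)
    byte_range_locks: List[range] = field(default_factory=list)
    data_blocks: List[int] = field(default_factory=list)


@dataclass
class BaseInode:
    """Attributes common to all streams of a file, plus the default data stream."""
    uid: int
    gid: int
    mtime: float
    acl: List[str] = field(default_factory=list)
    default_stream: StreamInode = field(default_factory=StreamInode)
    # Simplified stand-in for the hidden stream directory: name -> stream inode.
    stream_dir: Dict[str, StreamInode] = field(default_factory=dict)

    def open_stream(self, name: str = "") -> StreamInode:
        """Return the default stream for an empty name, otherwise the named stream."""
        if name == "":
            return self.default_stream
        return self.stream_dir[name]


# One file with a default stream and one named stream (illustrative values).
base = BaseInode(uid=1000, gid=100, mtime=0.0)
base.default_stream.size = 4096
base.stream_dir["thumbnail"] = StreamInode(size=512, data_blocks=[77])
```

Ownership, time stamps and the access control list appear once, on the base inode, while size, locks and data blocks are tracked separately for each stream.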
In the example of the Write Anywhere File Layout (WAFL) file system, by Network Appliance, Inc., of Sunnyvale, Calif., a file is represented as an inode data structure adapted for storage on disks. Broadly stated, the on-disk format representation of the exemplary WAFL file system is block based using, e.g., 4 kilobyte (KB) blocks and using inodes to describe the files. An inode is a data structure used to store information, such as metadata, about the file. That is, the information contained in an inode may include, e.g., ownership of the file, access permission for the file, size of the file, or other attributes, described further below. The WAFL file system uses a file handle, i.e., an identifier that includes an inode number, to retrieve an inode from disk. The exemplary WAFL file system also uses files to store metadata describing the layout of its file system. These metadata files include, among others, an inode file. The on-disk format structure of the WAFL file system, including inodes and the inode file, is disclosed and described in U.S. Pat. No. 5,819,292, entitled METHOD FOR MAINTAINING CONSISTENT STATES OF A FILE SYSTEM AND FOR CREATING USER-ACCESSIBLE READ-ONLY COPIES OF A FILE SYSTEM, by David Hitz, et al., issued on Oct. 6, 1998 and incorporated by reference as though fully set forth herein.
FIG. 1 is a schematic block diagram illustrating an exemplary on-disk inode 100, which preferably includes a metadata section 110 and a data section 150. The information stored in the metadata section 110 of each inode 100 describes a file and, as such, includes the type (e.g., regular or directory) 112 of the file, the size 114 of a file, time stamps (e.g., access and/or modification) 116 for the file and ownership, i.e., user identifier (UID 118) and group identifier (GID 120), of the file. The metadata section 110 further includes an xinode field 130 containing a pointer 140 that references another on-disk inode structure containing, e.g., access control list (ACL) information associated with the file or directory. The contents of the data section 150 of each inode may be interpreted differently depending upon the type of file (inode) defined within the type field 112. For example, the data section 150 of a directory inode contains metadata controlled by the file system, whereas the data section of a regular inode contains user-defined data. In this latter case the data section 150 includes a representation of the data associated with the file.
Specifically, the data section 150 of a regular on-disk inode may include user data or pointers, the latter referencing 4 kilobyte (KB) data blocks on disk used to store the user data. Each pointer is preferably a logical volume block number to thereby facilitate efficiency among a file system and/or disk storage layer of an operating system when accessing the data on disks. Given the restricted size (e.g., 128 bytes) of the inode, user data having a size that is less than or equal to 64 bytes is represented in its entirety within the data section of an inode. However, if the user data is greater than 64 bytes but less than or equal to 64 kilobytes (KB), then the data section of the inode comprises up to 16 pointers, each of which references a 4 KB block of data on disk. Moreover, if the size of the data is greater than 64 KB but less than or equal to 64 megabytes (MB), then each pointer in the data section 150 of the inode references an indirect inode that contains 1024 pointers, each of which references a 4 kilobyte data block on disk.
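The size thresholds recited above follow directly from the pointer counts and block size, as the following sketch illustrates. The function name inode_layout and the returned labels are hypothetical; sizes above 64 MB are not addressed by the passage above and are reported only as falling outside the described range.

```python
BLOCK_SIZE = 4 * 1024          # 4 KB data block
INLINE_LIMIT = 64              # bytes of user data held directly in the inode
POINTERS_PER_INODE = 16        # pointers held in the inode data section
POINTERS_PER_INDIRECT = 1024   # pointers held in each indirect inode

# 16 pointers x 4 KB = 64 KB reachable through direct pointers.
DIRECT_LIMIT = POINTERS_PER_INODE * BLOCK_SIZE
# 16 pointers x 1024 pointers x 4 KB = 64 MB reachable through one indirect level.
SINGLE_INDIRECT_LIMIT = DIRECT_LIMIT * POINTERS_PER_INDIRECT


def inode_layout(size_bytes: int) -> str:
    """Classify how user data of the given size is represented in the inode."""
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    if size_bytes <= INLINE_LIMIT:
        return "inline"            # data stored entirely within the inode
    if size_bytes <= DIRECT_LIMIT:
        return "direct"            # up to 16 pointers to 4 KB data blocks
    if size_bytes <= SINGLE_INDIRECT_LIMIT:
        return "single-indirect"   # each pointer references an indirect inode
    return "beyond-described-range"  # not covered by the passage above
```

For example, a 100-byte file requires a direct pointer to one 4 KB block, while a 1 MB file requires the single level of indirection.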
Some known storage operating systems contain the capability to generate a snapshot of the file system. In the example of a WAFL-based file system, snapshots are described in TR3002 File System Design for an NFS File Server Appliance by David Hitz et al., published by Network Appliance, Inc., which is hereby incorporated by reference, and in above-incorporated U.S. Pat. No. 5,819,292 entitled METHOD FOR MAINTAINING CONSISTENT STATES OF A FILE SYSTEM AND FOR CREATING USER-ACCESSIBLE READ-ONLY COPIES OF A FILE SYSTEM, by David Hitz, et al.
“Snapshot” is a trademark of Network Appliance, Inc. It is used for purposes of this patent to designate a persistent consistency point (CP) image. A persistent consistency point image (PCPI) is a point-in-time representation of the storage system, and more particularly, of the active file system, stored on a storage device (e.g., on disk) or in other persistent memory and having a name or other identifier that distinguishes it from other PCPIs taken at other points in time. A PCPI can also include other information (metadata) about the active file system at the particular point in time for which the image is taken. The terms “PCPI” and “snapshot” shall be used interchangeably throughout this patent without derogation of Network Appliance's trademark rights.
A snapshot is a restorable version of a file system created at a predetermined point in time. Snapshots are generally created on some regular schedule. The snapshot is stored on-disk along with the active file system, and is called into a buffer cache of the filer memory as requested by the storage operating system. An exemplary file system inode structure 200 is shown in FIG. 2. The inode for an inode file 205 contains information describing the inode file associated with a given file system. In this exemplary file system inode structure the inode for the inode file 205 contains a pointer to an inode file indirect block 210. The inode file indirect block 210 contains a set of pointers to inodes 217, which in turn contain pointers to indirect blocks 219. The indirect blocks 219 include pointers to file data blocks 220A, 220B and 220C. Each of the file data blocks 220(A–C) is capable of storing, in the illustrative embodiment, 4 kilobytes (KB) of data.
When the storage operating system generates a snapshot of a given file system, a snapshot inode is generated as shown in FIG. 3. The snapshot inode 305 is, in essence, a duplicate copy of the inode for the inode file 205 of the file system 200. Thus, the exemplary file system structure 200 includes the inode file indirect blocks 210, inodes 217, indirect blocks 219 and file data blocks 220A–C as in FIG. 2. When a user modifies a file data block, the file system layer writes the new data block to disk and changes the active file system to point to the newly created block.
FIG. 4 shows an exemplary inode file system structure 400 after a file data block has been modified. In this illustrative example, file data block 220C was modified to file data block 220C′. When file data block 220C is modified to file data block 220C′, the contents of the modified file data block are written to a new location on disk as a function of the exemplary WAFL file system. Because of this new location, the indirect block 419 must be rewritten. Due to this changed indirect block 419, the inode 417 must be rewritten. Similarly, the inode file indirect block 410 and the inode for the inode file 405 must be rewritten. Thus, after a file data block has been modified the snapshot inode 305 contains a pointer to the original inode file indirect block 210 which in turn contains pointers through the inode 217 and an indirect block 219 to the original file data blocks 220A, 220B and 220C. However, the newly written indirect block 419 includes pointers to unmodified file data blocks 220A and 220B. The indirect block 419 also contains a pointer to the modified file data block 220C′ representing the new arrangement of the active file system. A new inode for the inode file 405 is established representing the new structure 400. Note that metadata (not shown) stored in any snapshotted blocks (e.g., 305, 210, and 220C) protects these blocks from being recycled or overwritten until they are released from all snapshots. Thus, while the active file system inode for the inode file 405 points to blocks 220A and 220B and to new block 220C′, the old blocks 210, 217, 219 and 220C are retained until the snapshot is fully released.
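The rewriting of each block along the path to a modified data block may be illustrated by the following simplified sketch. It models blocks as immutable in-memory objects rather than on-disk structures, the names Node, write_block and leaves are hypothetical, and rewritten blocks are marked with a trailing apostrophe instead of the new reference numerals used in FIG. 4.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """An immutable block holding either data or pointers to child blocks."""
    label: str
    children: Tuple["Node", ...] = ()
    data: bytes = b""


def write_block(node: Node, path: Tuple[int, ...], new_data: bytes) -> Node:
    """Return a new tree in which the data block at `path` is replaced.

    Every block on the path is rewritten as a new object; blocks off the
    path are shared, unchanged, with the original tree.
    """
    if not path:
        return Node(node.label + "'", data=new_data)
    index, rest = path[0], path[1:]
    children = list(node.children)
    children[index] = write_block(children[index], rest, new_data)
    return Node(node.label + "'", tuple(children))


def leaves(root: Node) -> Tuple[Node, ...]:
    """Follow root -> inode file indirect block -> inode -> indirect block."""
    return root.children[0].children[0].children[0].children


# Build the structure of FIG. 2 using its reference numerals as labels.
blocks = (Node("220A", data=b"A"), Node("220B", data=b"B"), Node("220C", data=b"C"))
active_root = Node("205", (Node("210", (Node("217", (Node("219", blocks),)),)),))

# The snapshot inode duplicates the root and so points to the same block 210.
snapshot_root = Node("305", active_root.children)

# Modify data block 220C in the active file system only.
active_root = write_block(active_root, (0, 0, 0, 2), b"C2")
```

After the write, leaves(snapshot_root) still yields the original three data blocks, whereas leaves(active_root) yields the two shared, unmodified blocks together with the newly written block.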
After a snapshot has been created and file data blocks modified, the storage operating system can reconstruct or “restore” the file system inode structure as it existed at the time of the snapshot by accessing the snapshot inode. By following the pointers contained in the snapshot inode 305 through the inode file indirect block 210, inode 217 and indirect block 219 to the unmodified file data blocks 220A–C, the storage operating system can reconstruct the file system as it existed at the time of creation of the snapshot.
In known restoration techniques from snapshots, the snapshotted files are copied from the snapshot to the active file system. These copies are generated by duplicating inodes and data blocks stored in the snapshot and writing these duplicated blocks and inodes to the active file system. Thus, the snapshot is effectively duplicated into the active file system. A noted disadvantage of such a restore technique is that each inode or data block of the snapshot needs to be copied. Such copying, in the case of a large file system, can require a substantial amount of time and processing power. For example, files may be sized on the order of tens of gigabytes. Similarly, using known file restore techniques from a snapshot, the volume containing the snapshotted file must be large enough to accommodate two full copies of the file, namely, the snapshot and the file in the active file system. In the example of the large file, a volume may not be of sufficient size to accommodate two full copies of the file.
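The space requirement of such copy-based restoration may be estimated with the following rough sketch. The function names and the other_used parameter are hypothetical, and the sketch counts only file data in 4 KB blocks, ignoring inodes and other metadata.

```python
BLOCK_SIZE = 4 * 1024   # 4 KB blocks, as in the exemplary file system
GIB = 1024 ** 3         # one gibibyte, used for the illustrative sizes below


def blocks_for(size_bytes: int) -> int:
    """Number of 4 KB blocks needed to hold size_bytes (rounded up)."""
    return -(-size_bytes // BLOCK_SIZE)


def copy_restore_fits(file_size: int, volume_size: int, other_used: int = 0) -> bool:
    """Return True if the volume can hold both the snapshot copy of the file
    and the duplicate written to the active file system."""
    needed = 2 * blocks_for(file_size) * BLOCK_SIZE
    return needed + other_used <= volume_size
```

Under these assumptions, copy_restore_fits(30 * GIB, 50 * GIB) evaluates to False, since two copies of a 30-gigabyte file exceed a 50-gigabyte volume even when the volume holds nothing else.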
One technique to avoid resource-consuming duplication of the entire file system is to use the storage operating system's capabilities to restore on demand. Restore on demand techniques are described generally in U.S. patent application Ser. No. 10/101,901 entitled SYSTEM AND METHOD FOR MANAGING A PLURALITY OF SNAPSHOTS by Hugo Patterson et al. A noted disadvantage of such a restore on demand technique is that an entire directory tree associated with the file must also be restored. For example, if the desired file to be restored is two directories down, for example, in /foo/bar/file, then the directory /foo and the subdirectory /bar must also be restored. This reduces the efficiency of the file restoration process. Additionally, such restore on demand techniques typically cannot support the restoration of files that include streams or other metadata that are not stored internal to the file but are stored in a separate data stream associated with the file. Such restore on demand techniques typically utilize the snapshot copying methodology, described above, to restore a particular file. Thus, the noted disadvantages of the snapshot duplication method, e.g., processing overhead and use of file system space, are inherent in these restore on demand techniques.
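The directories that accompany a single-file restore under such a technique may be enumerated as in the following sketch, which uses the Python standard library pathlib module. The function name directories_to_restore is hypothetical, and the sketch assumes an absolute path.

```python
from pathlib import PurePosixPath
from typing import List


def directories_to_restore(file_path: str) -> List[str]:
    """List every ancestor directory of file_path, from the top down,
    excluding the root directory itself. Assumes an absolute path."""
    path = PurePosixPath(file_path)
    # `parents` yields the nearest ancestor first, so reverse it.
    ancestors = list(path.parents)[::-1]
    return [str(p) for p in ancestors if str(p) != "/"]
```

For the path /foo/bar/file, the function returns the directories /foo and /foo/bar, both of which would have to be restored in addition to the file itself.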
However, there are instances when the restoration of only a single file from a snapshot is desired. For example, the entire file system may not suffer an error condition, but a single file may become corrupted. Additionally, a user may have modified files but later desires to restore the files to a previous state. In these instances, the restoration of the entire file system is clearly an inefficient approach.