A storage system is a computer that provides storage service relating to the organization of information on writable persistent storage devices, such as memories, tapes or disks. The storage system is commonly deployed within a storage area network (SAN) or a network attached storage (NAS) environment. When used within a NAS environment, the storage system may be embodied as a file server or “filer” including an operating system that implements a file system to logically organize the information as a hierarchical structure of directories and files on, e.g., the disks. Each “on-disk” file may be implemented as a set of data structures, e.g., disk blocks, configured to store information, such as the actual data for the file. The data blocks are typically organized within a volume block number (vbn) space maintained by the file system. A directory, on the other hand, may be implemented as a specially formatted file in which information about other files and directories is stored. As used herein, a file is defined to be any logical storage container that contains a fixed or variable amount of data storage space, and that may be allocated storage out of a larger pool of available data storage space. As such, the term file, as used herein and unless the context otherwise dictates, can also mean a container, object or any other storage entity that does not correspond directly to a set of fixed data storage devices. A file system is, generally, a computer system for managing such files, including the allocation of fixed storage space to store files on a temporary or permanent basis.
The storage system may be further configured to operate according to a client/server model of information delivery to thereby allow many client systems (clients) to access shared resources, such as files, stored on the filer. Sharing of files is a hallmark of a NAS system, which is enabled because of its semantic level of access to files and file systems. Storage of information on a NAS system is typically deployed over a computer network comprising a geographically distributed collection of interconnected communication links, such as Ethernet, that allow clients to remotely access the information (files) on the filer. The clients typically communicate with the filer by exchanging discrete frames or packets of data according to pre-defined protocols, such as the Transmission Control Protocol/Internet Protocol (TCP/IP).
In the client/server model, the client may comprise an application executing on a computer that “connects” to the filer over a computer network, such as a point-to-point link, shared local area network, wide area network or virtual private network implemented over a public network, such as the Internet. NAS systems generally utilize file-based access protocols; therefore, each client may request the services of the filer by issuing file system protocol messages (in the form of packets) to the file system over the network identifying one or more files to be accessed without regard to specific locations, e.g., blocks, in which the data are stored on disk. By supporting a plurality of file system protocols, such as the conventional Common Internet File System (CIFS), the Network File System (NFS) and the Direct Access File System (DAFS) protocols, the utility of the filer may be enhanced for networking clients.
A SAN is a high-speed network that enables establishment of direct connections between a storage system and its storage devices. The SAN may thus be viewed as an extension to a storage bus and, as such, an operating system of the storage system enables access to stored information using block-based access protocols over the “extended bus”. In this context, the extended bus is typically embodied as Fibre Channel (FC) or Ethernet media adapted to operate with block access protocols, such as Small Computer Systems Interface (SCSI) protocol encapsulation over FC or TCP/IP/Ethernet.
A SAN arrangement or deployment allows decoupling of storage from the storage system, such as an application server, and some level of information storage sharing at the application server level. There are, however, environments wherein a SAN is dedicated to a single server. In some SAN deployments, the information is organized in the form of databases, while in others a file-based organization is employed. Where the information is organized as files, the client requesting the information maintains file mappings and manages file semantics, while its requests (and server responses) address the information in terms of block addressing on disk using, e.g., a logical unit number (lun).
Some known file systems are capable of generating a snapshot of the file system. In the example of a WAFL-based file system, snapshots are described in TR3002 File System Design for an NFS File Server Appliance by David Hitz, et al., published by Network Appliance, Inc. and in U.S. Pat. No. 5,819,292 entitled METHOD FOR MAINTAINING CONSISTENT STATES OF A FILE SYSTEM AND FOR CREATING USER-ACCESSIBLE READ-ONLY COPIES OF A FILE SYSTEM, by David Hitz, et al., which are hereby incorporated by reference.
“Snapshot” is a trademark of Network Appliance, Inc. It is used for purposes of this patent to designate a persistent consistency point (CP) image. A persistent consistency point image (PCPI) is a point-in-time representation of the storage system, and more particularly, of the active file system, stored on a storage device (e.g., on disk) or in other persistent memory and having a name or other identifier that distinguishes it from other PCPIs taken at other points in time. A PCPI can also include other information (metadata) about the active file system at the particular point in time for which the image is taken. The terms “PCPI” and “snapshot” shall be used interchangeably throughout this patent without derogation of Network Appliance's trademark rights.
In the example of the Write Anywhere File Layout (WAFL™) file system, by Network Appliance, Inc., of Sunnyvale, Calif., a file is represented as an inode data structure adapted for storage on disks. FIG. 1 is a schematic block diagram illustrating an exemplary on-disk inode 100, which preferably includes a meta data section 110 and a data section 150. The information stored in the meta data section 110 of each inode 100 describes a file and, as such, includes the type (e.g., regular or directory) 112 of the file, the size 114 of the file, time stamps (e.g., access and/or modification) 116 for the file and ownership, i.e., user identifier (UID 118) and group identifier (GID 120), of the file. The meta data section 110 further includes an xinode field 130 containing a pointer 140 that references another on-disk inode structure containing, e.g., access control list (ACL) information associated with the file or directory. The inode 100 may also include a set of flags 135 for tracking various metadata associated with the file. A level field 145 identifies how many levels of blocks are in the buffer tree associated with the file. Level 0 data blocks comprise the actual data blocks, while level 1 blocks contain pointers to level 0 data blocks. Similarly, level 2 blocks contain pointers to level 1 blocks. The contents of the data section 150 of each inode may be interpreted differently depending upon the type of file (inode) defined within the type field 112. For example, the data section 150 of a directory inode contains meta data controlled by the file system, whereas the data section of a regular inode contains user-defined data. In this latter case, the data section 150 includes a representation of the data associated with the file.
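The fields of the exemplary inode 100 may be pictured with the following minimal sketch. It is illustrative only: the class names, field names and Python types are assumptions chosen for readability and do not reflect the actual on-disk encoding, which the description above does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class InodeMetadata:
    """Hypothetical in-memory view of meta data section 110."""
    file_type: str                 # type 112, e.g. "regular" or "directory"
    size: int                      # size 114, in bytes
    timestamps: Dict[str, float]   # time stamps 116 (access and/or modification)
    uid: int                       # user identifier, UID 118
    gid: int                       # group identifier, GID 120
    xinode: Optional[int] = None   # pointer 140 held in xinode field 130 (e.g. ACL inode)
    flags: int = 0                 # flags 135
    level: int = 0                 # level field 145: levels of blocks in the buffer tree


@dataclass
class Inode:
    """Hypothetical in-memory view of inode 100."""
    meta: InodeMetadata
    # Data section 150: either the user data itself or a list of block pointers.
    data: Union[bytes, List[int]]


def holds_inline_data(inode: Inode) -> bool:
    """True when data section 150 stores the data directly rather than pointers."""
    return isinstance(inode.data, bytes)


small_file = Inode(
    meta=InodeMetadata("regular", 5, {"modification": 0.0}, uid=0, gid=0),
    data=b"hello",
)
```

How the data section is interpreted depends on the type field and on the size of the file, so the sketch keeps the data section as a union of raw bytes and a pointer list.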
Specifically, the data section 150 of a regular on-disk inode may include user data or pointers, the latter referencing 4 kilobyte (KB) data blocks on disk used to store the user data. Each pointer is preferably a logical volume block number, which thereby facilitates efficiency among a file system and/or disk storage layer of an operating system when accessing the data on disks. Given the restricted size (e.g., 128 bytes) of the inode, user data having a size that is less than or equal to 64 bytes is represented in its entirety within the data section of an inode. However, if the user data is greater than 64 bytes but less than or equal to 64 KB, then the data section of the inode comprises up to 16 pointers, each of which references a 4 KB block of data on disk. Moreover, if the size of the data is greater than 64 KB but less than or equal to 64 megabytes (MB), then each pointer in the data section 150 of the inode references an indirect block that contains a plurality of pointers, each of which references a 4 KB data block on disk. An indirect block may include 510 or 1024 pointers in exemplary file systems. As the size of a file (or other data container) represented by inode 100 increases, additional levels of blocks may be required to store the data.
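The size thresholds described above follow from simple arithmetic: 16 pointers times 4 KB is 64 KB, and 16 pointers times 1024 pointers per indirect block times 4 KB is 64 MB. The following sketch makes that mapping explicit. The function name and layout labels are hypothetical, and the sketch assumes the 1024-pointer indirect block, since that is the variant consistent with the 64 MB limit.

```python
from typing import Tuple

BLOCK_SIZE = 4 * 1024          # 4 KB data block
INLINE_LIMIT = 64              # bytes that fit in the inode's data section
POINTERS_PER_INODE = 16        # pointers held in the data section
POINTERS_PER_INDIRECT = 1024   # assumption: the 1024-pointer indirect block

DIRECT_LIMIT = POINTERS_PER_INODE * BLOCK_SIZE              # 64 KB
SINGLE_INDIRECT_LIMIT = DIRECT_LIMIT * POINTERS_PER_INDIRECT  # 64 MB


def inode_layout(size: int) -> Tuple[str, int]:
    """Return (layout label, number of level 0 data blocks) for a file of `size` bytes."""
    if size < 0:
        raise ValueError("size must be non-negative")
    if size <= INLINE_LIMIT:
        return ("inline", 0)               # data lives inside the inode itself
    blocks = -(-size // BLOCK_SIZE)        # ceiling division
    if size <= DIRECT_LIMIT:
        return ("direct", blocks)          # inode pointers reference data blocks
    if size <= SINGLE_INDIRECT_LIMIT:
        return ("single-indirect", blocks)  # inode pointers reference indirect blocks
    return ("multi-level", blocks)         # additional levels of blocks are required
```

For instance, a 65-byte file no longer fits inline and occupies one 4 KB block, while a file one byte larger than 64 KB needs 17 data blocks reached through indirect blocks.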
A PCPI is a restorable version of a file system created at a predetermined point in time and stored on the same storage devices that hold the file system. PCPIs are generally created on some regular user-defined schedule. The PCPI is stored on-disk along with the active file system, and is retrieved into a buffer cache of the filer memory as requested by the storage operating system. An exemplary buffer tree data structure 200 is shown in FIG. 2. The inode for an inode file 205 contains information describing the inode file associated with a given file system. In this exemplary buffer tree the inode for the inode file 205 contains a pointer to an inode file indirect block 210. The inode file indirect block 210 contains a set of pointers to inode blocks 215, each typically containing multiple inodes 217, which in turn contain pointers to indirect blocks 219. The indirect blocks 219 include pointers to file data blocks 220A, 220B and 220C. Each of the file data blocks 220(A-C) is capable of storing, in the illustrative embodiment, 4 KB of data.
When the file system generates a PCPI of a given file system, a PCPI inode is generated as shown in FIG. 3. The PCPI (snapshot) inode 305 is, in essence, a duplicate copy of the inode for the inode file 205 of the file system 200. Thus, the exemplary file system structure 200 includes the inode file indirect blocks 210, inodes 217, indirect blocks 219 and file data blocks 220A-C as in FIG. 2. When a user modifies a file data block, the file system layer writes the new data block to disk and changes the active file system to point to the newly created block.
FIG. 4 shows an exemplary buffer tree data structure 400 after a file data block is modified. In this illustrative example, file data block 220C is modified to file data block 220C′. In response, the contents of the modified file data block are written to a new location on disk as a function of the exemplary WAFL file system. Because of this new location, the indirect block 419 is rewritten. Due to this changed indirect block 419, the inode 417 is rewritten. Similarly, the inode file indirect block 410 and the inode for the inode file 405 are rewritten. Thus, after a file data block has been modified, the PCPI inode 305 contains a pointer to the original inode file indirect block 210 which, in turn, contains pointers through the inode 217 and an indirect block 219 to the original file data blocks 220A, 220B and 220C. However, the newly written indirect block 419 includes pointers to unmodified file data blocks 220A and 220B. The indirect block 419 also contains a pointer to the modified file data block 220C′ representing the new arrangement of the active file system. A new inode for the inode file 405 is established representing the new structure 400. Note that metadata (not shown) stored in any snapshotted blocks (e.g., 305, 210, and 220C) protects these blocks from being recycled or overwritten until they are released from all PCPIs. Thus, while the active file system inode for the inode file 405 points, through the rewritten blocks 410, 417 and 419, to blocks 220A, 220B and 220C′, the old blocks 210, 217, 219 and 220C are retained until the PCPI is fully released.
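The rewriting of the path from the modified data block up to the inode for the inode file may be illustrated by the following simplified sketch. It is not the WAFL implementation: the `Store` class, the `cow_update` and `read_all` helpers, and the integer block identifiers are all hypothetical, and each pointer block is modeled as a plain tuple of child identifiers held in memory.

```python
from typing import Dict, List, Sequence, Tuple, Union

Block = Union[bytes, Tuple[int, ...]]  # data block, or pointer block of child ids


class Store:
    """Toy write-anywhere store: every write lands at a new location."""

    def __init__(self) -> None:
        self.blocks: Dict[int, Block] = {}
        self._next = 0

    def write(self, content: Block) -> int:
        bid = self._next
        self._next += 1
        self.blocks[bid] = content  # existing blocks are never overwritten
        return bid


def cow_update(store: Store, root: int, path: Sequence[int], new_data: bytes) -> int:
    """Rewrite every block on `path` from `root` down to one data block; return the new root."""
    if not path:
        return store.write(new_data)
    children = list(store.blocks[root])
    children[path[0]] = cow_update(store, children[path[0]], path[1:], new_data)
    return store.write(tuple(children))


def read_all(store: Store, root: int) -> List[bytes]:
    """Follow pointers from `root` and collect the level 0 data blocks."""
    content = store.blocks[root]
    if isinstance(content, bytes):
        return [content]
    return [blk for child in content for blk in read_all(store, child)]


store = Store()
data = [store.write(b"A"), store.write(b"B"), store.write(b"C")]   # 220A-C
indirect = store.write(tuple(data))                 # indirect block 219
inode = store.write((indirect,))                    # inode 217
ifile_indirect = store.write((inode,))              # inode file indirect block 210
active_root = store.write((ifile_indirect,))        # inode for the inode file 205
pcpi_root = store.write(store.blocks[active_root])  # PCPI inode 305, a duplicate of 205

# Modify 220C to 220C': the path 210 -> 217 -> 219 -> 220C is rewritten.
active_root = cow_update(store, active_root, [0, 0, 0, 2], b"C'")
```

After the update, `read_all(store, active_root)` yields the blocks A, B and C′, while `read_all(store, pcpi_root)` still yields A, B and C. The two trees share the unmodified data blocks, which is why only the rewritten path consumes additional space.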
After a PCPI has been created and file data blocks modified, the file system can reconstruct or “restore” the file system inode structure as it existed at the time of the PCPI by accessing the PCPI inode. By following the pointers contained in the PCPI inode 305 through the inode file indirect block 210, inode 217 and indirect block 219 to the unmodified file data blocks 220A-C, the file system can reconstruct the file system as it existed at the time of creation of the PCPI.
In a typical storage system configuration, an administrator schedules PCPIs to be generated at routine intervals, for example, once a day. By utilizing the restoration capabilities of the PCPI, the file system may be restored to a point in time represented by any saved PCPI. However, an administrator may desire to know the rate of change of data in the time intervals between PCPIs. In this context, rate of change may be illustratively defined as the number of level zero data blocks modified per unit time. Such rate of change information may be desirable when determining the frequency of PCPIs or the amount of storage space associated with a particular file system. For example, because PCPIs conserve space by sharing unmodified blocks with the active file system, the storage space required to maintain a given number of PCPIs and the active file system is substantially greater if a large percentage of the data contained within a PCPI is overwritten in the interval between generation of PCPIs than if only a small percentage is overwritten.
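The illustrative definition of rate of change, and its effect on storage consumption, may be sketched as follows. The function names are hypothetical, and the space estimate rests on simplifying assumptions that are not part of the description above: the active file system stays the same size, and each interval overwrites a distinct set of blocks, so every retained PCPI pins one interval's worth of old blocks.

```python
BLOCK_SIZE = 4 * 1024  # 4 KB level 0 data block


def rate_of_change(modified_blocks: int, interval_seconds: float) -> float:
    """Level 0 data blocks modified per second over one inter-PCPI interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return modified_blocks / interval_seconds


def estimated_space_bytes(active_blocks: int, retained_pcpis: int,
                          modified_per_interval: int) -> int:
    """Rough upper-bound estimate of the space held by the active file system plus PCPIs.

    Assumption: each retained PCPI keeps `modified_per_interval` old blocks
    that the active file system no longer references.
    """
    return (active_blocks + retained_pcpis * modified_per_interval) * BLOCK_SIZE


# Seven daily PCPIs of a 1000-block file system:
low_churn = estimated_space_bytes(1000, 7, 10)    # 1% of blocks overwritten per day
high_churn = estimated_space_bytes(1000, 7, 900)  # 90% of blocks overwritten per day
```

Under these assumptions the high-churn case requires several times the space of the low-churn case for the same number of retained PCPIs, which is why the rate of change bears on both PCPI frequency and capacity planning.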
However, there exists no efficient mechanism for quickly determining the rate of change of data between two data containers, e.g., two PCPIs or the active file system and a PCPI. Conventional “brute force” comparisons, which require block-by-block comparison of all level 0 data blocks, are computationally intensive and require that each data block of both data containers be retrieved from disk. This generates a substantial load on the disk subsystem, requiring repeated data retrieval operations. Data may be retrieved in multi-block chunks to reduce the number of individual disk operations; however, the computational cost of comparing all of the data in the two data containers remains high. For example, to compare the changes between two data containers, such as the active file system and a PCPI, the entire active file system must be retrieved from disk. If the active file system is hundreds (or thousands) of gigabytes in size, the time required to retrieve the data is prohibitively long.
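The cost of the conventional approach may be seen in the following minimal sketch of a brute-force comparison. The function name is hypothetical, and the two data containers are modeled as in-memory sequences of level 0 blocks, whereas an actual system would retrieve each block from disk.

```python
from typing import Sequence, Tuple


def brute_force_changed_blocks(a: Sequence[bytes],
                               b: Sequence[bytes]) -> Tuple[int, int]:
    """Compare two containers block by block.

    Returns (number of differing block positions, number of blocks read).
    """
    changed = 0
    reads = 0
    for i in range(max(len(a), len(b))):
        blk_a = a[i] if i < len(a) else None   # None: position absent in container a
        blk_b = b[i] if i < len(b) else None   # None: position absent in container b
        reads += (blk_a is not None) + (blk_b is not None)
        if blk_a != blk_b:
            changed += 1
    return changed, reads


pcpi = [b"A", b"B", b"C"]
active = [b"A", b"B", b"C'"]
result = brute_force_changed_blocks(pcpi, active)
```

Here a single modified block is found only after all six blocks of the two containers have been read and compared, so the work grows with the total size of the containers rather than with the amount of data that actually changed.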