A storage system typically comprises one or more storage devices into which information may be entered, and from which information may be obtained, as desired. The storage system includes a storage operating system that functionally organizes the system by, inter alia, invoking storage operations in support of a storage service implemented by the system. The storage system may be implemented in accordance with a variety of storage architectures including, but not limited to, a network-attached storage environment, a storage area network and a disk assembly directly attached to a client or host computer. The storage devices are typically disk drives organized as a disk array, wherein the term “disk” commonly describes a self-contained rotating magnetic media storage device. The term disk in this context is synonymous with hard disk drive (HDD) or direct access storage device (DASD).
Storage of information on the disk array is preferably implemented as one or more storage “volumes” of physical disks, defining an overall logical arrangement of disk space. The disks within a volume are typically organized as one or more groups, wherein each group may be operated as a Redundant Array of Independent (or Inexpensive) Disks (RAID). Most RAID implementations enhance the reliability/integrity of data storage through the redundant writing of data “stripes” across a given number of physical disks in the RAID group, and the appropriate storing of redundant information (parity) with respect to the striped data. The physical disks of each RAID group may include disks configured to store striped data (i.e., data disks) and disks configured to store parity for the data (i.e., parity disks). The parity may thereafter be retrieved to enable recovery of data lost when a disk fails. The term “RAID” and its various implementations are well-known and disclosed in A Case for Redundant Arrays of Inexpensive Disks (RAID), by D. A. Patterson, G. A. Gibson and R. H. Katz, Proceedings of the International Conference on Management of Data (SIGMOD), June 1988.
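By way of illustration only, the parity-based recovery described above may be sketched as follows. The sketch assumes a single parity block per stripe computed as the byte-wise XOR of the data blocks, which is one common scheme and is not necessarily the scheme used by any particular RAID implementation. The function names `xor_blocks`, `compute_parity` and `recover_block` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a sequence of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_blocks):
    """Parity for one stripe: XOR of all data blocks (assumed scheme)."""
    return xor_blocks(data_blocks)


def recover_block(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# One stripe spread across three hypothetical data disks.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = compute_parity(stripe)

# Simulate the loss of the second disk and rebuild its block.
recovered = recover_block([stripe[0], stripe[2]], parity)
```

Because XOR is its own inverse, combining the parity with the surviving data blocks yields the lost block; a single parity block of this kind tolerates the loss of only one disk per stripe.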
The storage operating system of the storage system may implement a high-level module, such as a file system, to logically organize the information stored on the disks as a hierarchical structure of directories, files and blocks. For example, each “on-disk” file may be implemented as a set of data structures, i.e., disk blocks, configured to store information, such as the actual data for the file. These data blocks are organized within a volume block number (vbn) space that is maintained by the file system. The file system organizes the data blocks within the vbn space as a “logical volume”; each logical volume may be, although it is not necessarily, associated with its own file system. The file system typically consists of a contiguous range of vbns from zero to n, for a file system of size n+1 blocks.
A known type of file system is a write-anywhere file system that does not overwrite data on disks. If a data block is retrieved (read) from disk into a memory of the storage system and “dirtied” (i.e., updated or modified) with new data, the data block is thereafter stored (written) to a new location on disk to optimize write performance. A write-anywhere file system may initially assume an optimal layout such that the data is substantially contiguously arranged on disks. The optimal disk layout results in efficient access operations, particularly for sequential read operations, directed to the disks. An example of a write-anywhere file system that is configured to operate on a storage system is the Write Anywhere File Layout (WAFL®) file system available from Network Appliance, Inc., Sunnyvale, Calif.
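The write-anywhere behavior described above may be illustrated, purely as a simplified sketch, by a store that never overwrites a block in place. The class `WriteAnywhereStore`, its sequential free-block allocator and its block map keyed by file identifier and file block number are all assumptions made for illustration; they do not represent the WAFL implementation.

```python
class WriteAnywhereStore:
    """Toy block store in which a dirtied block is always written to a new vbn."""

    def __init__(self):
        self.disk = {}        # vbn -> block contents
        self.block_map = {}   # (file_id, file block number) -> current vbn
        self.next_free = 0    # hypothetical allocator: next unused vbn

    def write(self, file_id, fbn, data):
        """Store data at a fresh vbn; return (previous vbn or None, new vbn)."""
        new_vbn = self.next_free
        self.next_free += 1
        self.disk[new_vbn] = data
        old_vbn = self.block_map.get((file_id, fbn))
        self.block_map[(file_id, fbn)] = new_vbn
        return old_vbn, new_vbn

    def read(self, file_id, fbn):
        """Return the current contents of the given file block."""
        return self.disk[self.block_map[(file_id, fbn)]]


store = WriteAnywhereStore()
first = store.write("file-a", 0, b"version 1")
second = store.write("file-a", 0, b"version 2")  # the "dirtied" block
```

In this sketch the earlier contents remain at their original vbn after the update, which is the property that allows a point-in-time image to keep referencing the old block.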
The storage operating system may further implement a storage module, such as a RAID system, that manages the storage and retrieval of the information to and from the disks in accordance with input/output (I/O) operations. The RAID system is also responsible for parity operations in the storage system. Note that the file system only “sees” the data disks within its vbn space; the parity disks are “hidden” from the file system and, thus, are only visible to the RAID system. The RAID system typically organizes the RAID groups into one large “physical” disk (i.e., a physical volume), such that the disk blocks are concatenated across all disks of all RAID groups. The logical volume maintained by the file system is then “disposed over” (spread over) the physical volume maintained by the RAID system.
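As a hedged illustration of the arrangement described above, the following sketch maps a vbn onto a data disk and a block offset within that disk. It assumes a plain end-to-end concatenation of the data disks of all RAID groups, with parity disks excluded from the vbn space; an actual RAID system may stripe or otherwise arrange blocks differently. The helper names `build_vbn_map` and `locate`, and the disk descriptions, are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def build_vbn_map(raid_groups):
    """Collect the data disks of all groups and their cumulative block counts.

    raid_groups is a list of groups; each group is a list of
    (disk_name, role, num_blocks) tuples where role is "data" or "parity".
    """
    data_disks = [(name, blocks)
                  for group in raid_groups
                  for (name, role, blocks) in group
                  if role == "data"]          # parity disks stay hidden
    ends = list(accumulate(blocks for _, blocks in data_disks))
    return data_disks, ends


def locate(vbn, data_disks, ends):
    """Translate a vbn into (disk_name, block offset on that disk)."""
    total = ends[-1] if ends else 0
    if not 0 <= vbn < total:
        raise IndexError("vbn outside the physical volume")
    index = bisect_right(ends, vbn)
    start = ends[index - 1] if index else 0
    return data_disks[index][0], vbn - start


groups = [
    [("d0", "data", 100), ("d1", "data", 100), ("p0", "parity", 100)],
    [("d2", "data", 50), ("p1", "parity", 50)],
]
data_disks, ends = build_vbn_map(groups)
```

With these example groups the vbn space spans 250 blocks, so `locate(100, data_disks, ends)` resolves to the first block of disk `d1`, and no vbn ever resolves to `p0` or `p1`.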
The storage system may be configured to operate according to a client/server model of information delivery to thereby allow many clients to access the directories, files and blocks stored on the system. In this model, the client may comprise an application, such as a database application, executing on a computer that “connects” to the storage system over a computer network, such as a point-to-point link, shared local area network, wide area network or virtual private network implemented over a public network, such as the Internet. Each client may request the services of the file system by issuing file system protocol messages (in the form of packets) to the storage system over the network. By supporting a plurality of file system protocols, such as the conventional Common Internet File System (CIFS) and the Network File System (NFS) protocols, the utility of the storage system is enhanced.
In order to improve reliability and facilitate disaster recovery in the event of a failure of a storage system, its associated disks or some portion of the storage infrastructure, it is common to “mirror” or replicate a data set comprising some or all of the underlying data and/or the file system that organizes the data. A data set comprises an area of defined storage which may have a mirroring relationship associated therewith. Examples of data sets include, e.g., a file system, a volume or a persistent consistency point image (PCPI), described further below.
In one example, a mirror is established and stored at a destination, making it more likely that recovery is possible in the event of a true disaster that may physically damage the source storage location or its infrastructure (e.g., a flood, power outage, act of war, etc.). The mirror is updated at regular intervals, typically set by an administrator, in an effort to maintain the most recent changes to the file system on the destination. The storage systems attempt to ensure that the mirror is consistent, that is, that the mirror contains identical data to that of the source.
One common form of update involves the use of a “snapshot” process in which the active file system at the source storage site, consisting of inodes and blocks, is captured and the changes between two snapshots are transmitted over a network (such as the well-known Internet) to the remote destination storage site. Such mirroring techniques are described in the above-incorporated U.S. Patent Applications. By “active file system” it is meant the file system to which current input/output operations are being directed.
Note that the term “snapshot” is a trademark of Network Appliance, Inc. It is used for purposes of this patent to designate a persistent consistency point image (PCPI). A persistent consistency point image is a point-in-time representation of the storage system, and more particularly, of the active file system, stored on a storage device or in other persistent memory and having a name or other unique identifier that distinguishes it from other PCPIs taken at other points in time. A PCPI can also include other information (metadata) about the active file system at the particular point in time for which the image is taken. The terms PCPI and snapshot may be used interchangeably throughout this patent without derogation of Network Appliance's trademark rights. The PCPI process is described in further detail in U.S. patent application Ser. No. 09/932,578, now issued as U.S. Pat. No. 7,454,445 on Nov. 18, 2008, entitled INSTANT SNAPSHOT by Blake Lewis et al., TR3002 File System Design for an NFS File Server Appliance by David Hitz et al., published by Network Appliance, Inc., and in U.S. Pat. No. 5,819,292 entitled METHOD FOR MAINTAINING CONSISTENT STATES OF A FILE SYSTEM AND FOR CREATING USER-ACCESSIBLE READ-ONLY COPIES OF A FILE SYSTEM by David Hitz et al., which are hereby incorporated by reference.
An exemplary PCPI-based mirroring technique typically provides for remote asynchronous replication or mirroring of changes made to a source file system PCPI in a destination replica file system. The mirroring technique typically scans (via a scanner) the blocks that make up two versions of a PCPI of the source file system, to identify latent divergence, i.e., changed blocks in the respective PCPI files based upon differences in vbns further identified in a scan of a logical file block index of each PCPI. Trees (e.g., buffer trees) of blocks associated with the files are traversed, bypassing unchanged pointers between versions, to identify the changes in the hierarchy of the trees. These changes are transmitted to the destination replica or “mirror.” This technique allows regular files, directories, inodes and any other hierarchical structure of trees to be efficiently scanned to determine differences (latent divergence) between versions thereof. A set number of PCPIs may be retained both on the source and the destination depending upon various time-based and other criteria.
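The tree traversal described above, in which unchanged pointers are bypassed, may be sketched as follows. The sketch assumes that both PCPI versions share a single block store, that a block is either an indirect block holding child vbns or a data block, and that an unchanged subtree is recognizable because the two versions point at the same vbn. The function `changed_data_blocks` and the block layout are hypothetical, and the sketch reports only new or modified data blocks; it does not handle deletions.

```python
def changed_data_blocks(store, old_root, new_root):
    """Return, in traversal order, the vbns of data blocks that differ.

    store maps vbn -> ("indirect", [child vbns]) or ("data", bytes).
    """
    changed = []

    def walk(old_vbn, new_vbn):
        if old_vbn == new_vbn:
            return                      # identical pointer: whole subtree is shared
        kind, payload = store[new_vbn]
        if kind == "data":
            changed.append(new_vbn)     # new or rewritten data block
            return
        old_children = []
        if old_vbn is not None and store[old_vbn][0] == "indirect":
            old_children = store[old_vbn][1]
        for position, child in enumerate(payload):
            old_child = old_children[position] if position < len(old_children) else None
            walk(old_child, child)

    walk(old_root, new_root)
    return changed


# Two versions of one small buffer tree; only the second data block was rewritten.
store = {
    1: ("data", b"a"), 2: ("data", b"b"), 3: ("data", b"c"),
    10: ("indirect", [1, 2]), 11: ("indirect", [3]),
    20: ("indirect", [10, 11]),          # root of the older PCPI
    4: ("data", b"B"),                   # rewritten copy of block 2
    12: ("indirect", [1, 4]),
    21: ("indirect", [12, 11]),          # root of the newer PCPI
}
delta = changed_data_blocks(store, old_root=20, new_root=21)
```

In this example the subtree rooted at vbn 11 is never visited because both roots reference it, so only the rewritten block would be selected for transmission to the destination.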
Conventional mirroring and archival backup systems typically include processes to ensure that the data set is correctly mirrored, to thereby reduce the divergence of the mirror from the original source. However, errors may occur in the archival backup or mirror due to, e.g., network errors, software errors and/or physical media errors of the storage devices. As a result of such errors, the mirror/backup is not identical to the source, which may cause data loss should an error condition occur on the source system. Additionally, the file systems on either the source or destination storage systems may experience an error condition. Such a file system error may be corrected by conventional file system error correction techniques; however, such correction may exacerbate mirror divergence. To ensure that a correct mirror is on the destination, a new mirroring relationship may need to be established and an initial baseline backup operation may need to be performed of the data set. This is computationally, I/O resource and network intensive to perform and also does not guarantee that the administrator has a point in time mirror of a previous point in time. That is, the new mirror may be up to date, but does not reflect the contents of the mirrored source at a previous point in time, thereby reducing the effectiveness of the mirror.