Field of the Invention
This invention relates to storage of data, and more particularly to mirroring or replication of stored data.
Background Information
A storage system is a processing system adapted to store and retrieve data on behalf of one or more client processing systems (“clients”) in response to external input/output (I/O) requests received from the clients. A storage system can provide clients with file-level access to data stored in a set of mass storage devices, such as magnetic or optical storage devices or tapes. A storage system can also provide clients with block-level access to stored data, or with both file-level and block-level access.
Data storage space is organized into one or more storage “volumes”, each comprising physical storage devices and defining an overall logical arrangement of storage space. The devices within a volume are typically organized as one or more groups of Redundant Array of Independent (or Inexpensive) Disks (RAID).
To improve reliability and facilitate disaster recovery in the event of a failure of a storage system, its associated devices, or some portion of the storage infrastructure, it is common to replicate some or all of the underlying data and/or the file system that organizes the data. In one example, a replicated copy is created and stored at a remote site, making it more likely that recovery is possible in the event of a disaster that physically damages the main storage location or its infrastructure (e.g., a flood, a power outage, etc.). As used herein, the terms “source site” and “source storage system” are used interchangeably. Similarly, the terms “remote site”, “destination site”, and “destination storage system” are used interchangeably.
Various mechanisms are used to create a copy of a dataset (such as a file system, a volume, a directory, or a file) at a remote site. One prior art solution stores a file system and data of the source storage system onto media and restores the stored data at the destination site (this approach is known as “seeding”). As new data are written to the source, the data stored at the destination storage system lag behind in time. Thus, the destination storage system needs to be periodically updated to reflect incremental updates to the source system (e.g., updates occurring over periodic time intervals).
Several approaches have been advanced to maintain a current copy of the data at the destination site. According to one approach, known as a full-backup method, a file system and associated data at the source storage system are entirely recopied to a remote (destination) site over a network after a certain time interval. This approach, however, may be impractical where the size of the file system is measured in tens or hundreds of gigabytes (or even terabytes). The full-backup approach may severely tax the bandwidth of the network as well as the processing capabilities of both the destination and source storage systems.
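As a rough illustration of why a full recopy taxes the network, the following sketch estimates the transfer time for a given amount of data over a link. The function name, the dataset sizes, and the link speed are hypothetical figures chosen for illustration only; protocol overhead and contention are ignored.
```python
def transfer_seconds(num_bytes: int, link_bits_per_second: int) -> float:
    """Idealized time to move num_bytes over a link (no overhead, no contention)."""
    return num_bytes * 8 / link_bits_per_second


# Hypothetical figures: a 500 GB file system, 5 GB of changes since the
# last update, and a 1 Gbit/s link between the source and destination sites.
GIGABYTE = 10**9
LINK = 10**9

full_recopy = transfer_seconds(500 * GIGABYTE, LINK)
changes_only = transfer_seconds(5 * GIGABYTE, LINK)

print(f"full recopy:  {full_recopy:.0f} s")
print(f"changes only: {changes_only:.0f} s")
```
Under these assumed figures, recopying the whole file system occupies the link one hundred times longer than sending only the changed data.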
According to another approach, a file system and associated data at the destination site are incrementally updated to reflect the most recent changes. Dumping data to tape or other media, however, does not support an unlimited number of incremental updates. At some point, all data on the source storage system need to be recopied to the media. In addition, to ensure that the file system and associated data at the destination site are consistent with the source, access to the file system has to be temporarily halted while the data are copied. Furthermore, the copy of the data has to remain a “read-only” copy so that it cannot be modified by a user.
To address these challenges, new solutions have been advanced that are capable of performing an unlimited number of incremental updates. One common form of update involves the use of a “snapshot” process in which the active file system (e.g., a file system to which data can be both written and read) at the storage site is captured and the “snapshot” is transmitted as a whole over a network to the remote storage site. A snapshot is a persistent point-in-time (PPT) image of the active file system that enables quick recovery of data after data has been corrupted, lost, or altered. Snapshots can be created by copying the data at each predetermined point in time to form a consistent image, or virtually, by using pointers to form the image of the data. One product that establishes and maintains a mirror relationship between a source system and a destination system, and provides an unlimited number of updates to the destination storage system using snapshots, is SnapMirror®, a product provided by Network Appliance, Inc., Sunnyvale, Calif. The copy is updated at regular intervals, typically set by an administrator, by sending the changes between the most current snapshot and a previously copied snapshot, in an effort to capture the most recent changes to the file system.
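The snapshot-based incremental update described above can be sketched as follows. This is a minimal illustration and not the method of any particular product: a snapshot is modeled, as an assumption, as a mapping from block number to block contents, and the names `snapshot_delta` and `apply_delta` are hypothetical.
```python
from typing import Dict, Optional

# Hypothetical model: a snapshot maps a block number to that block's contents.
Snapshot = Dict[int, bytes]
Delta = Dict[int, Optional[bytes]]


def snapshot_delta(base: Snapshot, current: Snapshot) -> Delta:
    """Return the blocks that differ between a previously copied snapshot
    (base) and the most current snapshot. None marks a block freed since base."""
    delta: Delta = {}
    for block, data in current.items():
        if base.get(block) != data:
            delta[block] = data  # new or modified block
    for block in base:
        if block not in current:
            delta[block] = None  # block no longer in use
    return delta


def apply_delta(mirror: Snapshot, delta: Delta) -> Snapshot:
    """Bring the destination copy up to date using only the changed blocks."""
    updated = dict(mirror)
    for block, data in delta.items():
        if data is None:
            updated.pop(block, None)
        else:
            updated[block] = data
    return updated


# The destination already holds the base snapshot; only the delta is sent.
base = {0: b"a", 1: b"b", 2: b"c"}
current = {0: b"a", 1: b"B", 3: b"d"}
delta = snapshot_delta(base, current)
mirror = apply_delta(base, delta)
```
Because each update is computed against the previously copied snapshot, the process can be repeated without ever recopying the whole dataset.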
There are two known types of mirrors: a physical mirror and a logical mirror. As used herein, the term “mirror” refers to a copy or a replica of a dataset. In a physical mirror, a copy created at the destination storage system has a physical layout that matches the physical layout of the dataset stored at the source storage system (i.e., a destination dataset is stored at data blocks that have the same physical addresses as the data blocks that store the source dataset). In a logical mirror, in contrast, a destination dataset can be stored at the destination storage system at different physical locations than the source dataset.
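The distinction between the two mirror types can be illustrated with a small sketch. The data model here is an assumption made for illustration: storage is a mapping from physical block address to contents, a file is an ordered list of block addresses, and the function names are hypothetical.
```python
from typing import Dict, List, Tuple

Blocks = Dict[int, bytes]       # physical block address -> contents
FileMap = Dict[str, List[int]]  # file name -> ordered physical block addresses


def physical_mirror(src_blocks: Blocks) -> Blocks:
    """Copy every block to the same physical address at the destination."""
    return dict(src_blocks)


def logical_mirror(src_files: FileMap, src_blocks: Blocks,
                   first_free: int = 0) -> Tuple[FileMap, Blocks]:
    """Copy file contents, letting the destination choose its own addresses."""
    dst_files: FileMap = {}
    dst_blocks: Blocks = {}
    next_addr = first_free
    for name, addrs in src_files.items():
        new_addrs = []
        for addr in addrs:
            dst_blocks[next_addr] = src_blocks[addr]
            new_addrs.append(next_addr)
            next_addr += 1
        dst_files[name] = new_addrs
    return dst_files, dst_blocks


def read_file(files: FileMap, blocks: Blocks, name: str) -> bytes:
    """Reassemble a file's contents from its blocks."""
    return b"".join(blocks[a] for a in files[name])


src_blocks = {7: b"he", 42: b"llo"}
src_files = {"f": [7, 42]}

phys_blocks = physical_mirror(src_blocks)
dst_files, dst_blocks = logical_mirror(src_files, src_blocks, first_free=100)
```
In this sketch the physical mirror reuses the source file map unchanged because block addresses are preserved, whereas the logical mirror carries its own file map and reproduces the same file contents at different addresses.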
Currently, if a mirror of the data at the source storage system has already been created at a destination site using one technique for mirroring data, and updates to the mirror at the destination storage system are to be performed using another technique for mirroring data, the existing mirror at the destination storage system has to be discarded and a new mirror needs to be created. This procedure presents several disadvantages. First, it consumes extra storage space, since a new copy of the data needs to be created at the destination storage system while the existing copy is still present. In addition, deleting an existing copy is disruptive to a user. Furthermore, to create a new copy at the destination storage system, a large amount of data needs to be transmitted over the network, thereby burdening the network.
Accordingly, what is needed is a mechanism that supports conversion of one mirror relationship to another mirror relationship without burdening the network bandwidth and without disrupting the user.