1. Field of Invention
Embodiments of the present invention relate to techniques for performing replication of a data set from a first location to a second location using snapshots to track and store information relating to changes made to the data set.
2. Discussion of Related Art
Copying data from a first location (e.g., including one or more data volumes) to a second is a common task in data storage systems. This may be done for a myriad of reasons, including replication and backup/versioning. In a replication operation, a data set may be copied from the first location to the second to ensure that the second is a mirror of the first and that each stores a copy of the data set, such that if a failure renders the data set inaccessible from the first location, the second is available for access. In a backup/versioning operation, a “copy on write” technique can be employed such that changes to the data set made after a given point in time result in copies of the original data that was stored in the first data volume at that point in time being copied to a save volume (a data volume acting as, for example, a backup location) before being overwritten in the first volume. In this way, the data set can be “rolled back” to the point in time.
One illustrative technique for forming a point in time copy of a data set is referred to as a snapshot and is described in detail in U.S. Pat. No. 6,792,518 to Armangau et al., which is incorporated herein by reference in its entirety.
A snapshot does not replicate a full copy of the data set (referred to as a production data set). Rather, the snapshot only stores differences between a current version of the production data set and the version of the data set at the point in time when the snapshot was taken. In the implementation described in the '518 patent, the snapshot maintains several data structures, including a block map. When a snapshot is created at time T=0, these data structures, including the block map, may be empty, and they are populated when the data set is written to after the creation of the snapshot. For example, when contents of a first data block in the production data set are about to be changed as a result of a data write operation conducted after time T=0 (e.g., time T=0.5), the original contents of the data block are copied to a save volume such that a copy of a state of the data block at the time the snapshot was created (i.e., the contents of the data block at time T=0) is maintained. An entry is then placed into the block map linking the data block in the save volume to its corresponding position in the point in time data set that the snapshot represents. This can be repeated over time, for each change made to the production data set after the snapshot was created, such that the block map contains an entry for each changed data block.
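The copy-on-write behavior described above can be illustrated with a minimal sketch. It is not the implementation of the '518 patent: the class and attribute names (Volume, Snapshot, block_map, save_volume) are hypothetical, blocks are modeled as list elements, and a single snapshot is assumed.

```python
class Snapshot:
    """Hypothetical snapshot holding only a block map (illustrative names)."""

    def __init__(self):
        # Maps a block index in the production data set to the index in the
        # save volume where that block's original contents were preserved.
        self.block_map = {}


class Volume:
    """Hypothetical production data set with one optional snapshot."""

    def __init__(self, blocks):
        self.production = list(blocks)
        self.save_volume = []
        self.snapshot = None

    def create_snapshot(self):
        # At creation (T=0) the block map is empty; no data is copied yet.
        self.snapshot = Snapshot()
        return self.snapshot

    def write(self, index, data):
        snap = self.snapshot
        # Preserve the original contents only on the first overwrite after
        # the snapshot was created; later overwrites need no further copy.
        if snap is not None and index not in snap.block_map:
            self.save_volume.append(self.production[index])
            snap.block_map[index] = len(self.save_volume) - 1
        self.production[index] = data


vol = Volume(["a", "b", "c"])
snap = vol.create_snapshot()   # T=0
vol.write(1, "B")              # first overwrite: "b" is copied to the save volume
vol.write(1, "BB")             # second overwrite: nothing more is copied
```

In this sketch the save volume ends up holding only the T=0 contents of the one changed block, and the block map holds a single entry for it.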
The block map of the snapshot can then be used at a later time (e.g., time T=10) to determine the state of the production data set at the time the snapshot was created (time T=0), even if it has changed since T=0. To do so, a read operation to the snapshot for a selected data block will access the block map to determine if the block map contains an entry for that selected data block. If so, it can be determined that the selected data block changed after the snapshot was created and that the data stored in the production data set is not the data that was stored in the selected data block at time T=0. The information stored in the entry in the block map will then be accessed to determine the location of the corresponding data in the save volume, and the data read from that location is the data that was stored in the selected data block in the first data volume at time T=0. If, however, there is no entry in the block map for the selected data block, then it can be determined that the data did not change after the creation of the snapshot, and that the data stored in the production data set is the data that was stored at time T=0. Accordingly, the data can be read from the production data set.
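The read path just described can be sketched as follows. This is an illustrative sketch under assumptions, not the method of the '518 patent: the function name read_from_snapshot and the representation of the block map as a dictionary and of the volumes as lists are hypothetical.

```python
def read_from_snapshot(index, block_map, save_volume, production):
    """Return the contents a block had when the snapshot was created.

    All parameter names are hypothetical. block_map maps a production block
    index to the position of its preserved original in save_volume.
    """
    if index in block_map:
        # The block changed after the snapshot: read the preserved original.
        return save_volume[block_map[index]]
    # No entry: the block is unchanged, so the production copy is still valid.
    return production[index]


# State at a later time: block 1 was overwritten after the snapshot at T=0.
production = ["a", "B", "c"]
save_volume = ["b"]
block_map = {1: 0}

t0_view = [
    read_from_snapshot(i, block_map, save_volume, production)
    for i in range(len(production))
]
# t0_view is ["a", "b", "c"], the data set as it stood at T=0.
```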
Multiple snapshots can also be created at different times, and can work together in a serial fashion so that only the most recently created snapshot directly tracks changes to the production data set. For example, if a data block was overwritten after time T=1, when a second snapshot was created (and therefore also after time T=0), the snapshot at time T=0 may not reflect that the selected data block was changed, but the snapshot created at time T=1 will. The snapshot created at time T=1 may have its own block map containing addresses of data blocks on the save volume storing the contents of data blocks overwritten after time T=1. In response to a read operation to the snapshot at time T=0, carried out at a time subsequent to T=1, it may be determined from the snapshot at time T=1 that the selected data block in the production volume was overwritten subsequent to T=0, so that the data block that existed at T=0 can be retrieved from the save volume using the block map for the snapshot at time T=1.
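One way to picture snapshots working in a serial fashion is sketched below. It assumes, as an illustration only, that a read to an older snapshot consults that snapshot's block map first and then the block maps of each later snapshot in order before falling back to the production data set. The class SnapshotChain and its methods are hypothetical names and are not taken from the '518 patent.

```python
class SnapshotChain:
    """Hypothetical chain of snapshots sharing one save volume."""

    def __init__(self, blocks):
        self.production = list(blocks)
        self.save_volume = []
        self.block_maps = []  # one block map per snapshot, oldest first

    def create_snapshot(self):
        self.block_maps.append({})
        return len(self.block_maps) - 1  # snapshot id

    def write(self, index, data):
        # Only the most recently created snapshot tracks the change.
        if self.block_maps:
            newest = self.block_maps[-1]
            if index not in newest:
                self.save_volume.append(self.production[index])
                newest[index] = len(self.save_volume) - 1
        self.production[index] = data

    def read(self, snapshot_id, index):
        # Walk from the requested snapshot toward the newest one; the first
        # entry found holds the contents the block had at the requested time.
        for block_map in self.block_maps[snapshot_id:]:
            if index in block_map:
                return self.save_volume[block_map[index]]
        return self.production[index]


chain = SnapshotChain(["a", "b"])
s0 = chain.create_snapshot()   # T=0
chain.write(0, "a1")           # tracked by the T=0 snapshot
s1 = chain.create_snapshot()   # T=1
chain.write(1, "b1")           # tracked only by the T=1 snapshot
```

Here a read of block 1 through the T=0 snapshot finds no entry in that snapshot's own block map, so the original contents are located through the block map of the T=1 snapshot.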
As should be appreciated from the foregoing, snapshots can be used to determine previous states of a data set at past times without needing to make a full copy of the data set at those past times. Instead, only the “deltas” or differences are stored in snapshots.