Data replication is a technique used to maintain copies of data at separate locations. For example, data can be replicated at several different sites within a corporation's campus and/or across several of the corporation's campuses. If the data is replicated at different sites, and if the failure of the systems storing the data at one site is unlikely to cause the failure of the corresponding systems at another site, replication can increase data reliability. Thus, if a disaster occurs at one site, an application that uses the data can be restarted using a replicated copy of the data at another site.
Replication can be performed on data volumes by designating one volume as the primary volume. One or more secondary volumes are then synchronized with the primary volume, and each secondary volume can be located at a different secondary site. Initially, a secondary volume can be synchronized to the primary volume by copying all of the data on the primary volume to the secondary volume. This copy can be made by transferring all of the data over the network, by creating a backup of the primary volume and restoring the secondary volume from the backup, or by attaching one or more mirrors of the primary volume to the secondary volume. Replication then continues by propagating any changes to data in the primary volume to the secondary volumes. For example, synchronous data replication can be performed by preventing the completion of an application-initiated write to the primary volume until the write has been applied to the primary volume and to all of the secondary volumes.
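A minimal Python sketch of initial synchronization followed by synchronous replication, assuming volumes can be modeled as in-memory maps from block numbers to block contents. `Volume`, `SyncReplicator`, `initial_sync`, and `write` are hypothetical names chosen for illustration, not an existing replication API; a real implementation would ship writes over a network and would have to handle a secondary that fails or becomes unreachable.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Volume:
    """Hypothetical in-memory volume: block number -> block contents."""
    name: str
    blocks: Dict[int, bytes] = field(default_factory=dict)

    def write(self, block: int, data: bytes) -> None:
        self.blocks[block] = data


class SyncReplicator:
    """Hypothetical synchronous replicator: a write is acknowledged only
    after it has been applied to the primary and to every secondary."""

    def __init__(self, primary: Volume, secondaries: List[Volume]) -> None:
        self.primary = primary
        self.secondaries = secondaries

    def initial_sync(self) -> None:
        # Full copy of every primary block to each secondary, modeling
        # the "transfer all of the data" option for initial synchronization.
        for sec in self.secondaries:
            sec.blocks = dict(self.primary.blocks)

    def write(self, block: int, data: bytes) -> bool:
        self.primary.write(block, data)
        for sec in self.secondaries:
            sec.write(block, data)
        # Only after every copy is updated is the write reported complete.
        return True


primary = Volume("primary", {0: b"boot", 1: b"data"})
secondaries = [Volume("siteB"), Volume("siteC")]
rep = SyncReplicator(primary, secondaries)
rep.initial_sync()
rep.write(1, b"new")
assert all(s.blocks == primary.blocks for s in secondaries)
```

The cost of this design is latency: every application write waits for the slowest secondary before it completes.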
During replication, it is often critically important to maintain consistency between the primary volume and the secondary volume. Consistency ensures that, even if the secondary volume is not identical to the primary volume (e.g., updates to the secondary volume may lag behind updates to the primary volume), the secondary volume always represents a state of the primary volume that actually existed at some previous point in time. For example, if an application performs a sequence of writes A, B, and C to the primary volume, consistency can be maintained by performing these writes to the secondary volume in the same sequence. At no point should the secondary volume reflect a state that never actually occurred on the primary volume, such as the state that would result if write C were performed before write B.
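To make the consistency property concrete, the following sketch assumes an asynchronous variant in which the primary appends each write to a FIFO log and the secondary drains that log in order. `AsyncOrderedReplicator` and its methods are hypothetical names; the `primary_history` list exists only so the example can check that every state the secondary passes through is one the primary actually had.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Write = Tuple[int, str]  # (block number, new contents)


class AsyncOrderedReplicator:
    """Hypothetical sketch: writes hit the primary immediately and are
    queued in order; the secondary may lag but applies them in sequence."""

    def __init__(self) -> None:
        self.primary: Dict[int, str] = {}
        self.secondary: Dict[int, str] = {}
        self.log: Deque[Write] = deque()
        # Every state the primary has passed through (used only for checking).
        self.primary_history: List[Dict[int, str]] = [{}]

    def write(self, block: int, data: str) -> None:
        self.primary[block] = data
        self.log.append((block, data))
        self.primary_history.append(dict(self.primary))

    def ship_one(self) -> bool:
        """Apply the oldest pending write to the secondary, if any."""
        if not self.log:
            return False
        block, data = self.log.popleft()
        self.secondary[block] = data
        return True

    def secondary_is_consistent(self) -> bool:
        return self.secondary in self.primary_history


rep = AsyncOrderedReplicator()
for block, data in [(0, "A"), (1, "B"), (2, "C")]:  # writes A, B, C
    rep.write(block, data)
while rep.ship_one():
    assert rep.secondary_is_consistent()  # holds after every step
assert rep.secondary == rep.primary

# Applying C before B would produce a state the primary never had.
out_of_order = {0: "A", 2: "C"}
assert out_of_order not in rep.primary_history
```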
Another technique, in addition to replication, that may be used to increase the reliability and/or accessibility of data involves creating point-in-time copies of a data volume. These point-in-time copies protect the data on the data volume against logical or physical damage. Examples of point-in-time copies include snapshots (such as copy-on-write snapshots and mirror-breakoff snapshots) and backups. Each of these point-in-time copies allows the volume to be restored to its state at an earlier point in time. For example, if the volume is corrupted at 8 PM, the volume can be restored from a point-in-time copy of the volume that was created at 7 PM on the same day.
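A copy-on-write snapshot can be sketched as follows, assuming a single snapshot per volume and in-memory blocks. `CowVolume` and its methods are hypothetical names for illustration. The snapshot stores only the original contents of blocks that are overwritten after it is taken, which is enough both to read the snapshot image and to roll the volume back to it.

```python
from typing import Dict, Optional


class CowVolume:
    """Hypothetical volume with one copy-on-write snapshot."""

    def __init__(self) -> None:
        self.blocks: Dict[int, str] = {}
        # Original contents of blocks changed since the snapshot;
        # a value of None means the block did not exist at snapshot time.
        self.saved: Optional[Dict[int, Optional[str]]] = None

    def take_snapshot(self) -> None:
        self.saved = {}  # nothing copied until a block is overwritten

    def write(self, block: int, data: str) -> None:
        if self.saved is not None and block not in self.saved:
            # First overwrite since the snapshot: preserve the old contents.
            self.saved[block] = self.blocks.get(block)
        self.blocks[block] = data

    def read_snapshot(self, block: int) -> Optional[str]:
        if self.saved is not None and block in self.saved:
            return self.saved[block]
        return self.blocks.get(block)

    def restore(self) -> None:
        """Roll the volume back to its contents at snapshot time."""
        if self.saved is None:
            raise RuntimeError("no snapshot to restore from")
        for block, old in self.saved.items():
            if old is None:
                self.blocks.pop(block, None)
            else:
                self.blocks[block] = old
        self.saved = {}  # volume now equals the snapshot image


vol = CowVolume()
vol.write(0, "good")      # contents as of 7 PM
vol.take_snapshot()
vol.write(0, "corrupt")   # damage at 8 PM
vol.write(1, "junk")
vol.restore()
assert vol.blocks == {0: "good"}
```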
If a primary volume that is being replicated is restored from a previously created point-in-time copy of itself, the secondary volume can become inconsistent with respect to the primary volume. For example, the software that controls replication may not detect that the primary volume is being restored from a point-in-time copy, and thus changes made to the primary volume by the restore may not be replicated to the secondary volume. Such inconsistencies can remain until the secondary volume is again fully resynchronized with the primary volume. While the secondary volume is inconsistent with the primary volume, the data stored on the secondary volume cannot be used to restart an application in a known stable state. Thus, if the primary volume fails during this time or if there is a disaster at the primary site, there may be no way to use the secondary volume to restart the application, since the secondary volume is not guaranteed to be consistent with any known state of the primary volume. As this example shows, it is desirable to be able to maintain consistency between the primary volume and any secondary volumes, even if the primary volume is restored while replication is ongoing.
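The failure described above can be reproduced with a small sketch built on assumed in-memory volumes and a hypothetical `replicate` helper. The restore is applied directly to the primary, outside the replicated write path, and ordinary replicated writes then resume. The sketch only illustrates the problem; it does not show how consistency should be preserved.

```python
from typing import Dict, List

Volume = Dict[int, str]  # hypothetical model: block number -> contents


def replicate(primary: Volume, secondary: Volume, history: List[Volume],
              block: int, data: str) -> None:
    """Hypothetical replicated write: applied to both volumes, and the
    resulting primary state is recorded for later checking."""
    primary[block] = data
    secondary[block] = data
    history.append(dict(primary))


primary: Volume = {}
secondary: Volume = {}
history: List[Volume] = [{}]

replicate(primary, secondary, history, 0, "A1")
snapshot = dict(primary)            # point-in-time copy taken here
replicate(primary, secondary, history, 0, "A2")
replicate(primary, secondary, history, 1, "B2")

# Restore performed beneath the replication layer: the secondary
# never sees these changes.
primary.clear()
primary.update(snapshot)
history.append(dict(primary))

# Normal replicated writes resume after the restore.
replicate(primary, secondary, history, 2, "C3")

print(primary)               # {0: 'A1', 2: 'C3'}
print(secondary)             # {0: 'A2', 1: 'B2', 2: 'C3'}
print(secondary in history)  # False: a state the primary never had
```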