The present invention relates generally to techniques for storage replication. More particularly, the present invention relates to a method and apparatus for re-synchronizing a remote mirroring pair and maintaining data consistency between the volumes of the remote mirroring pair.
Conventionally, there have been two types of approaches to storage-based volume replication, namely local replication and remote (copy) replication. Both technologies mirror files, file systems, or volumes without using host CPU power. When a host conducts a host input/output (I/O) such as a write I/O of data to a primary volume (PV) of a storage system, the storage system automatically copies the data to a replication (secondary) volume (SV). This mechanism ensures that PV and SV are identical.
Local replication duplicates the primary volume within a first storage system, so that when the host writes data to the PV, the first storage system also stores the data to a local secondary volume (LSV). Local replication is typically used for taking backups.
Remote replication duplicates volumes across two or more storage systems, so that when the host writes data to the PV, the first storage system transfers the data through paths, such as ESCON, Fibre Channel, T3, and/or IP networks, to at least one second storage system for storage in a remote secondary volume (RSV) included therein. Remote replication is typically used to enable the recovery of data after disasters, such as earthquakes, floods, fires, and the like. Even if the first storage system or the whole data center at the primary site is damaged by a disaster, the data at the secondary site is unaffected and business can be resumed quickly.
There are at least two modes of transferring data to implement remote mirroring between local and remote storage systems, namely synchronous mode and asynchronous mode. In the synchronous mode, all write I/O's to the PV of the first storage system are mirrored at the RSV of the second storage system. In the asynchronous mode, in response to a write I/O, the first storage system completes the write I/O and then asynchronously transfers the write data to the second storage system for storage on the RSV. Specifically, the write data to be copied to the RSV of the second storage system is temporarily stored in a queuing area, such as cache memory, disk storage, or Non-Volatile Random Access Memory (NVRAM). The write data is later retrieved from the queuing area and stored in the RSV of the second storage system.
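The difference between the two modes can be illustrated with a minimal sketch. The class and method names (`Volume`, `RemoteMirror`, `host_write`, `drain`) are hypothetical and do not come from any storage system. The sketch also assumes that, in the synchronous mode, the data is mirrored to the RSV before the write I/O is reported complete.

```python
from collections import deque


class Volume:
    """A toy volume that maps a block number to its data."""

    def __init__(self):
        self.blocks = {}

    def write(self, block, data):
        self.blocks[block] = data


class RemoteMirror:
    """Hypothetical PV/RSV pair supporting both transfer modes."""

    def __init__(self, synchronous):
        self.pv = Volume()
        self.rsv = Volume()
        self.synchronous = synchronous
        self.queue = deque()  # queuing area used only in asynchronous mode

    def host_write(self, block, data):
        self.pv.write(block, data)
        if self.synchronous:
            # Synchronous mode (assumed): mirror to the RSV before completing.
            self.rsv.write(block, data)
        else:
            # Asynchronous mode: complete now, transfer the data later.
            self.queue.append((block, data))
        return "complete"

    def drain(self):
        """Transfer queued writes to the RSV in arrival order."""
        while self.queue:
            block, data = self.queue.popleft()
            self.rsv.write(block, data)
```

In this sketch an asynchronous pair leaves the RSV behind the PV until `drain()` is called, whereas a synchronous pair keeps the two volumes identical after every write.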
Recently, volume replication technology has become very popular. Volume replication gives users many benefits in managing the data stored on their volumes. However, volume replication according to the conventional technique involves complicated operations in a system combining local and remote replication when it is necessary to restore data onto the primary volume and the storage systems are in the synchronous mode. For example, when a system is configured to have local and remote secondary volumes (LSV and RSV) for one primary volume (PV) as shown in FIGS. 2A–E, it is necessary to suspend the remote replication pair (PV and RSV) before restoring data from the local secondary (replica) volume LSV onto the PV, and to resynchronize the pair afterwards.
FIG. 2A illustrates the normal state where volume replication is implemented according to the conventional technique. As per FIG. 2A, PV and RSV are in the synchronous mode and PV and LSV are in the suspended state, so that data written by the host to PV in a write I/O is eventually copied to LSV. Further, as illustrated in FIG. 2A, both PV and LSV contain a bit map, which is an image of the state of the data stored in the respective volume immediately before suspension is implemented. These bit maps manage the differences between the volumes.
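One way a bit map can manage the differences between suspended volumes is sketched below. This is an illustrative assumption rather than the layout used by any particular storage system: the sketch keeps one flag per block and sets it when the block is written after suspension. The names `TrackedVolume` and `changed_blocks` are hypothetical.

```python
class TrackedVolume:
    """Toy volume that records which blocks changed since suspension."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        # Assumed layout: one flag per block, True = written since suspension.
        self.bitmap = [False] * num_blocks

    def write(self, block, data):
        self.blocks[block] = data
        self.bitmap[block] = True

    def changed_blocks(self):
        """Return the block numbers flagged in the bit map."""
        return [i for i, dirty in enumerate(self.bitmap) if dirty]
```

With such a bit map, only the flagged blocks need to be copied when the pair is later resynchronized, rather than the whole volume.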
Once an event has occurred where the PV must be restored from the LSV, the synchronization between the replication pair PV and RSV must be suspended, as illustrated in FIG. 2B. Thereafter, the bit maps of the respective volumes PV and LSV are merged, PV and LSV are changed from the suspended state to the synchronous mode, and data from the LSV is stored to the PV, as illustrated in FIG. 2C. In addition, it may be necessary to halt the write I/O's from the host.
The bit maps for each of the volumes PV and LSV are then stored in the respective volumes, and the replication pair PV and LSV is changed from the synchronous mode back to the suspended state, as illustrated in FIG. 2D. The bit maps are stored in the respective volumes before the pair is placed in the suspended state. Finally, the replication pair PV and RSV is moved from the suspended state to the synchronous mode, as illustrated in FIG. 2E.
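The sequence of FIGS. 2A through 2E can be summarized in a short sketch that tracks the state of each replication pair. All names here (`PairState`, `merge_bitmaps`, `restore_pv_from_lsv`) are hypothetical. Merging the bit maps with a logical OR is an assumption, since the description above says only that they are merged, and the RSV itself is not modeled.

```python
from enum import Enum


class PairState(Enum):
    SYNCHRONOUS = "synchronous"
    SUSPENDED = "suspended"


def merge_bitmaps(pv_bitmap, lsv_bitmap):
    """Assumed merge rule: a block differs if either volume changed it."""
    return [a or b for a, b in zip(pv_bitmap, lsv_bitmap)]


def restore_pv_from_lsv(pv, lsv, pv_bitmap, lsv_bitmap):
    """Restore pv in place and return the (remote, local) state per figure."""
    remote, local = PairState.SYNCHRONOUS, PairState.SUSPENDED  # FIG. 2A
    log = [(remote, local)]

    remote = PairState.SUSPENDED  # FIG. 2B: suspend the PV-RSV pair
    log.append((remote, local))

    merged = merge_bitmaps(pv_bitmap, lsv_bitmap)
    local = PairState.SYNCHRONOUS  # FIG. 2C: copy differing blocks LSV -> PV
    for block, differs in enumerate(merged):
        if differs:
            pv[block] = lsv[block]
    log.append((remote, local))

    local = PairState.SUSPENDED  # FIG. 2D: suspend the PV-LSV pair again
    log.append((remote, local))

    remote = PairState.SYNCHRONOUS  # FIG. 2E: resynchronize the PV-RSV pair
    log.append((remote, local))
    return log
```

The returned log makes the ordering constraint visible: the remote pair stays suspended for the whole time the local pair is being resynchronized.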
One of the disadvantages of the above described conventional technique is that there may be inconsistencies between the data stored on the volumes of the remote replication pair PV and RSV. This creates a “fuzzy” status, because updates are made only according to the differences in the bit maps managed by the local replication pair PV and LSV.
Therefore, there is a need to provide a technique to manage and operate the recovery process in the above described cascading replication configuration, including local and remote replication, to improve data consistency. Further, there is a need to provide a technique that provides time-consistent volume replication even during the recovery process.