Despite the high reliability of modern-day storage devices, it remains critically important for enterprises to put in place systems that automatically back up important data. Data backup systems allow for quick data recovery in the event that a primary storage device fails, and such systems are therefore an important part of any disaster recovery plan. Furthermore, in many industries, such as the financial services and healthcare industries, some enterprises are compelled by strict records-retention regulations to archive important data, including emails, documents, patient records, audit information, and other types of data. In some cases, regulations require that the archived data be stored on WORM (write-once, read-many) media and be readily available upon request.
To ensure the availability of the data required by records-retention regulations, many enterprises use asynchronous data mirroring. Generally, an asynchronous data mirroring system is a data storage system having two storage devices, typically a primary storage device and a backup storage device (the latter sometimes referred to as a target storage device). The asynchronous data mirroring system is configured so that the backup storage device is periodically synchronized with the primary storage device. That is, new data from the primary storage device is periodically copied to the backup storage device. Consequently, between synchronization operations, the data on the two storage devices may differ.
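The periodic synchronization described above can be modeled in a few lines. The following Python sketch is illustrative only: the `AsyncMirror` class and its methods are hypothetical names, each volume is modeled as a dictionary from block ids to contents, and deletions are not modeled.

```python
class AsyncMirror:
    """Hypothetical model of an asynchronous data mirror.

    Writes land on the primary volume immediately and reach the backup
    (target) volume only when sync() is called. Deletions are not modeled.
    """

    def __init__(self):
        self.primary = {}      # block id -> contents on the primary volume
        self.backup = {}       # block id -> contents on the backup volume
        self._pending = set()  # block ids written since the last sync

    def write(self, block, data):
        # Application writes go only to the primary storage device.
        self.primary[block] = data
        self._pending.add(block)

    def sync(self):
        # Mirror synchronization: copy only blocks written since the last sync.
        for block in self._pending:
            self.backup[block] = self.primary[block]
        self._pending.clear()

    def in_sync(self):
        return self.primary == self.backup


mirror = AsyncMirror()
mirror.write(0, b"alpha")
mirror.sync()
mirror.write(1, b"beta")  # written between synchronization operations
print(mirror.in_sync())   # False: the backup lags the primary
mirror.sync()
print(mirror.in_sync())   # True
```

If the primary failed before the second `sync()` call, block 1 would exist only on the primary.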
One problem with asynchronous data mirroring is re-synchronizing a primary storage device and a secondary storage device after the mirror has been broken (i.e., after the devices have become unsynchronized). For example, when a primary storage device fails, the secondary storage device is often reconfigured to take the place of the former primary storage device until the primary storage device can be repaired. However, once the primary storage device has been repaired, it is often difficult, if not impossible, to re-synchronize the data on the two devices.
FIGS. 1 and 2 illustrate an example of this type of problem. FIG. 1 illustrates a timing diagram 10 for an asynchronous data mirroring system. In FIG. 1, horizontal line 12 at the bottom of the figure represents time, and horizontal line 14 represents the state of a primary storage device over time. Moving from left to right along line 14 represents the passage of time, during which application data may be written to a volume on the primary storage device. Similarly, horizontal line 16 represents the state of a backup storage device over time. In this example, the backup storage device is configured to asynchronously mirror the data of the volume on the primary storage device.
The storage system that is the subject of the timing diagram illustrated in FIG. 1 is configured so that, periodically, a “snapshot” of the volume on the primary storage device is captured and written to a snapshot backup volume on the primary storage device. A snapshot is an incremental backup image in which only the data (e.g., disk blocks) that has changed since the previous backup image is captured. Each snapshot operation 18 is represented in FIG. 1 by a circular dot along horizontal lines 14 and 16. Similarly, on a periodic basis, data from the volume on the primary storage device is asynchronously copied, or “mirrored”, to a volume on the backup storage device. This mirror synchronization operation 20 is represented in FIG. 1 by the vertical arrows beginning at reference line 14 and pointing to reference line 16.
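An incremental snapshot of the kind described above can be sketched by comparing the current volume against the state recorded at the previous snapshot. This is a minimal sketch under assumed simplifications: `take_snapshot` and `restore` are hypothetical helpers, a Python list stands in for the snapshot backup volume, and deleted blocks are not tracked.

```python
def take_snapshot(volume, previous_state):
    """Hypothetical incremental snapshot.

    volume and previous_state map block ids to contents. Returns
    (delta, new_state), where delta holds only blocks that are new or
    changed since previous_state. Deleted blocks are not tracked.
    """
    delta = {
        block: data
        for block, data in volume.items()
        if block not in previous_state or previous_state[block] != data
    }
    return delta, dict(volume)


def restore(deltas):
    # Rebuild a volume image by replaying snapshots, oldest first.
    state = {}
    for delta in deltas:
        state.update(delta)
    return state


snapshot_volume = []  # stands in for the snapshot backup volume
state = {}
volume = {0: b"a", 1: b"b"}
delta, state = take_snapshot(volume, state)
snapshot_volume.append(delta)  # first snapshot captures both blocks
volume[1] = b"B"
delta, state = take_snapshot(volume, state)
snapshot_volume.append(delta)  # later snapshot captures only block 1
print(snapshot_volume[1])      # {1: b'B'}
print(restore(snapshot_volume) == volume)  # True
```

Because each snapshot stores only changed blocks, recovering a given point in time requires replaying the chain of snapshots up to that point.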
Referring now to timeline 12, between time A and time B, twelve snapshots 18 of the volume on the primary storage device are written to the snapshot backup volume on the primary storage device, and three mirror synchronization operations 20 are performed. Because the last of these mirror synchronization operations occurs at time B, the mirror volume on the backup storage device is fully synchronized with the volume on the primary storage device at that point. At time C, a snapshot of the primary storage device is captured and written to the snapshot backup volume on the primary storage device. Shortly thereafter, at time D, the primary storage device fails. Therefore, at time D, the primary storage device is out of sync with the backup storage device, because the primary storage device contains data that was written after the last successful mirror synchronization operation at time B.
If, as is often the case, the backup storage device is reconfigured to take the place of the primary storage device, then, beginning at time D, application data will be written to the backup storage device. In addition, at time E, a snapshot of the backup storage device may be written to a snapshot backup volume on the backup storage device. Consequently, at time F, when the primary storage device is repaired, the primary storage device and the backup storage device will be out of sync, as each will contain data that the other lacks.
FIG. 2 illustrates a directory tree 22 for the volumes on the primary storage device 24 and the backup storage device 26 after the asynchronous data mirror has been broken. As illustrated in FIG. 2, the volume on the primary storage device 24 is out of sync with the volume on the backup storage device 26. The volume on the primary storage device 24 includes the files with filenames FOO_CRITICAL and FOO_BAR, while the volume on the backup storage device 26 does not. These two files were created after the last successful mirror synchronization operation. Similarly, the volume on the backup storage device 26 includes the file with the filename FILE_CRITICAL, while the primary storage device 24 does not. This file, FILE_CRITICAL, was created after the mirror was broken.
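Identifying the divergence shown in FIG. 2 amounts to a set difference between the two directory listings. In this hypothetical sketch, `divergence` is an illustrative helper and `COMMON_FILE` is a placeholder for files present on both volumes; the other filenames are taken from FIG. 2.

```python
def divergence(primary_files, backup_files):
    """Return (only_on_primary, only_on_backup) as sorted lists of names."""
    primary_files, backup_files = set(primary_files), set(backup_files)
    return (
        sorted(primary_files - backup_files),
        sorted(backup_files - primary_files),
    )


# COMMON_FILE is a hypothetical stand-in for files present on both volumes.
primary_24 = {"COMMON_FILE", "FOO_CRITICAL", "FOO_BAR"}
backup_26 = {"COMMON_FILE", "FILE_CRITICAL"}

only_primary, only_backup = divergence(primary_24, backup_26)
print(only_primary)  # ['FOO_BAR', 'FOO_CRITICAL']
print(only_backup)   # ['FILE_CRITICAL']
```

Comparing names alone does not reveal files that exist on both volumes but whose contents differ.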
There are multiple approaches to re-synchronizing broken mirrors. One solution for re-synchronizing the two devices is to have one device overwrite the other. That way, both devices contain the same set of data, and subsequent changes can again be mirrored. However, this approach results in data loss, because the unique data on one device is destroyed in the process. For example, if the backup storage device 26 is selected to overwrite the primary storage device 24, then the files on the primary storage device 24 with the filenames FOO_CRITICAL and FOO_BAR will be deleted. Similarly, if the primary storage device 24 is selected to overwrite the backup storage device 26, then the file on the backup storage device 26 with the filename FILE_CRITICAL will be deleted. Data loss is undesirable in general, and in some cases, such as when data must be retained under records-retention regulations, it is simply not an option.
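The data loss caused by the overwrite approach can be made concrete with a small sketch. `overwrite_resync` is a hypothetical helper and the file contents are placeholders; only the filenames come from FIG. 2. The function reports every entry on the overwritten device that does not survive, including files whose contents differ from the source's copy.

```python
_MISSING = object()  # sentinel meaning "file not present"


def overwrite_resync(source, target):
    """Hypothetical overwrite-style resync: make target an exact copy of source.

    Returns the target's entries that are destroyed: files absent from
    source and files whose contents differ from source's copy.
    """
    lost = {
        name: data
        for name, data in target.items()
        if source.get(name, _MISSING) != data
    }
    target.clear()
    target.update(source)
    return lost


# Filenames from FIG. 2; the contents are placeholders.
primary_24 = {"FOO_CRITICAL": b"p1", "FOO_BAR": b"p2"}
backup_26 = {"FILE_CRITICAL": b"b1"}

# Backup overwrites primary: the primary's unique files are lost.
print(sorted(overwrite_resync(dict(backup_26), dict(primary_24))))
# ['FOO_BAR', 'FOO_CRITICAL']

# Primary overwrites backup: the backup's unique file is lost.
print(sorted(overwrite_resync(dict(primary_24), dict(backup_26))))
# ['FILE_CRITICAL']
```

Whichever direction is chosen, the returned set is non-empty whenever both devices hold unique data.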
Another solution is to require users to manually resolve the differences between the two systems. Unfortunately, this is frequently impractical or impossible. For example, if the same file has been modified on both systems, the user may not be able to create an authoritative combination of the two versions. There may simply be two conflicting, unique versions of a file that both need to be saved to avoid data loss.
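One common way to detect the conflicting modifications described above is a three-way comparison of each file against its state at the last successful mirror synchronization. This sketch illustrates that general technique and is not drawn from the text: it assumes such a base state is available, and `classify_changes`, the file `REPORT`, and all file contents are hypothetical. It only detects conflicts; it cannot decide how to combine two conflicting versions.

```python
_MISSING = object()  # sentinel meaning "file not present"


def classify_changes(base, primary, backup):
    """Hypothetical three-way comparison of two diverged volumes.

    base holds each file's state at the last successful mirror
    synchronization (assumed to be available). Each argument maps
    file names to contents.
    """
    result = {}
    for name in set(base) | set(primary) | set(backup):
        b = base.get(name, _MISSING)
        p = primary.get(name, _MISSING)
        s = backup.get(name, _MISSING)
        if p == b and s == b:
            result[name] = "unchanged"
        elif p == s:
            result[name] = "changed_identically"  # same edit (or deletion) on both
        elif s == b:
            result[name] = "changed_on_primary"   # created, edited, or deleted on primary only
        elif p == b:
            result[name] = "changed_on_backup"    # created, edited, or deleted on backup only
        else:
            result[name] = "conflict"             # changed differently on both devices
    return result


base = {"REPORT": b"v1"}
primary = {"REPORT": b"v2-primary", "FOO_CRITICAL": b"x"}
backup = {"REPORT": b"v2-backup", "FILE_CRITICAL": b"y"}
for name, status in sorted(classify_changes(base, primary, backup).items()):
    print(name, status)
# FILE_CRITICAL changed_on_backup
# FOO_CRITICAL changed_on_primary
# REPORT conflict
```

Files classified as `conflict` correspond to the case described above: the comparison can flag them, but nothing in it determines which version is authoritative, so both versions would need to be preserved to avoid data loss.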