The present invention relates generally to data processing storage systems which include a primary (or local) storage facility and two or more secondary (or remote) storage facilities that mirror at least certain of the data retained by the primary storage facility. More particularly, the invention relates to a method, and apparatus implementing that method, to synchronize the data at surviving storage facilities in the event of an interruption in copying data from one storage location to another storage location.
Extensive use of data processing by commercial, governmental and other entities has resulted in tremendous amounts of data being stored, much of it of extreme importance to the day-to-day operation of such entities. For example, enormous numbers of financial transactions are now performed entirely electronically. Businesses such as airline companies risk chaos should data regarding future ticketed reservations be lost. Because of the need for reliable data, local data is usually backed up, often to a remote location, with one or more copies of the data retained for use should the original data be corrupted or lost. The more important the data, the more elaborate the methods of backup. For example, one approach to protecting sensitive or valuable data is to store backup copies of that data at sites that are geographically remote from the local storage facility. Each remote storage facility maintains a mirror image of the data held by the local storage facility, and revises its stored data to “mirror” each change made to the data image of the local storage facility. One example of a remote storage system for mirroring data at a local storage system is described in U.S. Pat. No. 5,933,653, entitled “Method and Apparatus for Mirroring Data in a Remote Data Storage System.”
Updated data sent to the remote storage facilities are often queued and sent as a group over a network transmission medium such as the Internet, to reduce the overhead of remote copying operations. Thus, the data image mirrored at the remote site and that at the local site will not necessarily be the same at any given moment. If more than one remote storage facility is used to mirror the local data, there will be situations in which the data images of the remote storage facilities differ from one another, at least until updated. These intervals of differing data images can be a problem if the local facility fails. Failure of the local storage facility can leave some remote storage facilities with data images that closely, if not exactly, mirror that of the local storage facility before failure, while others have older “stale” data images that were never completely updated by the last update operation. Thus, failure of the local storage facility may require the remote storage facilities to resynchronize the data among them to ensure that all have the same, latest data image before the system is restarted.
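The divergence described above can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the mechanism of any cited patent: the names `RemoteLink`, `LocalFacility`, `batch_size` and `simulate` are hypothetical, each remote facility is assumed to receive updates only once a full group has been queued, and updates still queued when the local facility fails are assumed lost.

```python
from collections import deque


class RemoteLink:
    """Hypothetical link from the local facility to one remote facility."""

    def __init__(self, batch_size):
        self.batch_size = batch_size  # assumed number of updates sent per group
        self.pending = deque()        # updates queued locally, not yet sent
        self.image = {}               # data image held by the remote facility

    def enqueue(self, block, value):
        self.pending.append((block, value))
        # Send only when a full group has accumulated, to reduce overhead.
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # Apply every queued update, in order, to the remote image.
        while self.pending:
            block, value = self.pending.popleft()
            self.image[block] = value


class LocalFacility:
    """Hypothetical local facility mirrored by several remote links."""

    def __init__(self, links):
        self.image = {}
        self.links = links

    def write(self, block, value):
        self.image[block] = value
        for link in self.links:
            link.enqueue(block, value)


def simulate():
    remote_a = RemoteLink(batch_size=2)
    remote_b = RemoteLink(batch_size=3)
    local = LocalFacility([remote_a, remote_b])
    for block, value in [(0, "a"), (1, "b"), (2, "c"), (0, "d")]:
        local.write(block, value)
    # The local facility "fails" here; updates still pending are never sent.
    return local.image, remote_a.image, remote_b.image


if __name__ == "__main__":
    local_image, image_a, image_b = simulate()
    print("local   :", local_image)
    print("remote A:", image_a)
    print("remote B:", image_b)
```

In this run, remote A happens to hold the same image as the local facility at the moment of failure, while remote B still holds the older value for block 0 because its last group was never sent. The remote facilities would therefore have to be resynchronized with one another before the system could be restarted.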
Another problem that must be addressed is recovery of the system when a “suspension” occurs during a remote copy operation. An interruption caused by an unexpected incident, for example, a cache overflow, a storage system failure during copying, a network interruption or other intervention in the remote copy operation, requires that a resynchronization be performed. One approach for resynchronizing remote copy operations is described in U.S. Pat. No. 6,092,066, entitled “Method and Apparatus for Independent Operation of a Remote Data Facility.” The technique described in that patent, however, allows resynchronization only in limited circumstances. With certain more complex system suspensions, such as a combination of two failures, e.g., a link failure, a cache overflow, and/or a drive failure, there is no ready way to resynchronize the system without reinitializing it. In these situations, because this technology does not ensure that the configuration will have at least two copies available, a full copy of the volume is usually required for resynchronization.
When the primary site fails, for example because of a disaster, and the primary storage data becomes unavailable, computer systems start their jobs using data in the secondary storage systems. Before the computer systems start using the data in one of the secondary storage systems, the other secondary storage systems need to be synchronized so that all the storage systems have the same data. If the storage systems are not synchronized, the data in the storage systems becomes inconsistent. When several storage systems are used, each storage system has no way of knowing the copy progress or status of the other storage systems (i.e., what data has been copied). It is virtually impossible to ascertain the differences among storage systems manually. As a result, it may be necessary to copy all data from the storage system used for production to all other storage systems in order to synchronize them, which leads to an unnecessarily large data transfer and a long completion time.