The present invention relates generally to data processing storage systems comprising a local storage facility and two or more remote storage facilities that mirror at least some of the data retained by the local storage facility. More particularly, the invention relates to a method, and apparatus implementing that method, for synchronizing the data at the surviving storage facilities in the event that one of them fails.
The use of data processing over the years by commercial, military, governmental and other endeavors has resulted in tremendous amounts of data being stored, much of it virtually priceless because of its importance. Businesses, for example, risk collapse should their data be lost. For this reason alone, local data is backed up to one or more copies, which are retained for use should the original data be corrupted or lost. The more important the data, the more elaborate the backup methods. For example, one approach to protecting sensitive or valuable data is to store backup copies of that data at one or more sites that are geographically remote from the local storage facility. Each remote storage facility maintains a mirror image of the data held by the local storage facility, and changes (e.g., writes, deletions, etc.) to the local data image are transferred to, and also effected at, each of the remote storage facilities so that the mirroring of the local data image is maintained. An example of a remote storage system for mirroring data at a local storage system is shown by U.S. Pat. No. 5,933,653.
Updates sent to the remote storage facilities are often queued and sent as a group to keep the overhead of remote copying operations to a minimum. Also, the transmission medium is often an Internet connection or a similar link. For these reasons, the data images mirroring the local data will at times differ from the local data image. If more than one remote storage facility is used to mirror the local data, there will often be times when the data images of the remote storage facilities differ from one another, at least until they are updated by the local storage facility. These intervals of differing data images can be a problem if the local facility fails, leaving only the remote storage facilities. Failure of the local storage facility can leave some remote storage facilities with data images that closely, if not exactly, mirror that of the local storage facility before the failure, while others hold older, "stale" data images that were never completely updated by the last update operation. Thus, failure of the local storage facility may require the remote storage facilities to re-synchronize the data among themselves so that all have the same, latest data image before the system is restarted. There are several approaches to data synchronization.
If removable media (e.g., tape, CD-R, DVD, etc.) are available at the local and remote storage facilities, such media can be used for synchronization. For example, a system administrator copies data to tape from a selected remote storage facility (the image-donating facility) that is believed to hold the most up-to-date image of the local facility's data. Then, to keep that data image from changing before it is used to synchronize the other remote storage facilities, input/output (I/O) operations at the image-donating facility are halted until the tape can be circulated to update the other remote storage facilities. At each remote storage facility, an administrator copies the data from the removable media to storage at that site. The system administrator then reconfigures the entire system so that the selected remote storage facility becomes the new local storage facility, and its I/O operations are allowed to commence. This approach is efficient when the amount of data involved is small, but not for larger systems. Larger systems produce data that grows rapidly, and copying it can make the entire synchronization process take an inordinate amount of time.
Lacking removable media, another approach is to use network connections between the various storage facilities to communicate the data. This approach requires that one storage facility be selected to replace the former local (but now failed) storage facility. I/O operations at the selected storage facility are halted, for the same reasons stated above, and a re-synchronization copy process is initiated between the selected storage facility and the other remote storage facilities. When the re-synchronization process is complete, I/O operations are restarted at the selected storage facility, and the system proceeds as before, albeit with one less storage facility (the failed former local storage facility).
A major problem with this latter approach is the time needed for the re-synchronization process, particularly for large amounts of data. For example, transferring 100 terabytes (TB) of stored data over a 100 MB/s network connection takes approximately 11.57 days: $(100\times10^{12}\ \text{bytes})/(100\times10^{6}\ \text{bytes/s}) = 10^{6}\ \text{s} \approx 277.8\ \text{hours} \approx 11.57\ \text{days}$. This is the time needed to re-synchronize just one storage facility; if more than one storage facility must be re-synchronized, the problem is exacerbated. Moreover, I/O operations at the storage facilities involved are halted throughout the re-synchronization process.
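By way of illustration only, the arithmetic above can be reproduced with the short Python sketch below. The function name `resync_time_seconds` and its `targets` parameter are hypothetical; the sketch assumes decimal units, a link that sustains its nominal rate with no protocol overhead, and, when more than one facility is re-synchronized, copies made one after another over the same link.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86_400


def resync_time_seconds(data_bytes: float, link_bytes_per_s: float, targets: int = 1) -> float:
    """Estimate full-copy re-synchronization time.

    Assumes decimal units, a link that sustains its nominal rate with no
    protocol overhead, and (for targets > 1) copies made one after another
    over the same link.
    """
    return targets * data_bytes / link_bytes_per_s


if __name__ == "__main__":
    t = resync_time_seconds(100e12, 100e6)  # 100 TB over 100 MB/s
    # prints: 1000000 s = 277.8 h = 11.57 days
    print(f"{t:.0f} s = {t / SECONDS_PER_HOUR:.1f} h = {t / SECONDS_PER_DAY:.2f} days")
```

Under the sequential-copy assumption, re-synchronizing two facilities doubles the estimate, which is one way of reading the statement that the problem is exacerbated for multiple facilities.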
The present invention provides a method, and an architecture for implementing that method, for synchronizing two or more remote data storage facilities so that they hold and maintain the same data images in the event of a failure of the local storage facility.
Broadly, the invention pertains to a data processing system comprising a local data storage facility communicatively coupled to (i.e., in communication with) two or more remote storage facilities. Each of the storage facilities, whether local or remote, includes storage media for data storage. Data maintained on the storage media at the local data storage facility is mirrored on storage media at the remote storage facilities. Changes to the data image of the local storage facility are periodically sent to the remote storage facilities to update their data images, using a remote copy process that sends data messages carrying the data updates. Each of the storage facilities keeps information indicative of the history of which updates have been received by the remote storage facilities and which updates have been received and implemented (by writes to the storage media of that remote storage facility). In the event of failure of a storage facility, the surviving storage facilities circulate this historical update information to determine whether their data images differ, i.e., whether some surviving storage facilities have received updates that others have not. If so, the surviving storage facilities synchronize their data images so that all have a substantially identical data image.
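The comparison of update histories among the surviving facilities can be pictured with the following minimal Python sketch. It assumes, for illustration only, that each update carries a unique integer ID and that each survivor's history reduces to the set of IDs it has received; the names `find_gaps` and `needs_synchronization` are hypothetical and are not part of the described system.

```python
from typing import Dict, Set


def find_gaps(histories: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """Return, for each surviving facility, the update IDs that at least one
    other survivor has received but that facility has not."""
    seen_anywhere: Set[int] = set().union(*histories.values())
    return {name: seen_anywhere - received for name, received in histories.items()}


def needs_synchronization(histories: Dict[str, Set[int]]) -> bool:
    """True if the surviving data images differ, i.e. some facility has a gap."""
    return any(find_gaps(histories).values())


if __name__ == "__main__":
    survivors = {
        "remote_a": {1, 2, 3, 4},
        "remote_b": {1, 2, 3},
        "remote_c": {1, 2},
    }
    print(find_gaps(survivors))
    print(needs_synchronization(survivors))
```

In this example `remote_b` lacks update 4 and `remote_c` lacks updates 3 and 4, so the survivors' data images differ and synchronization is needed.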
According to one embodiment of the invention, synchronization is achieved by a "roll-forward" operation in which the remote storage facility having the latest updates, as indicated by the historical update information, sends the needed updates to the other remote storage facilities to bring all data images up to date. In another, "roll-back" form of synchronization, updates are discarded to bring all data images back to the same level.
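The difference between the two operations can be sketched in Python as follows. This is a simplified model, not the claimed method: it assumes that updates carry sequence numbers and are applied in strict order, so that each facility's state is summarized by the sequence number of the last update it applied, and that the updates a roll back discards can still be undone (for example, because they have not yet been written to storage media). The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def plan_roll_forward(latest: Dict[str, int]) -> Tuple[str, Dict[str, List[int]]]:
    """Pick the facility with the newest update as donor and list, per
    lagging facility, the update sequence numbers it must receive."""
    donor = max(latest, key=lambda name: latest[name])
    target = latest[donor]
    needed = {
        name: list(range(seq + 1, target + 1))
        for name, seq in latest.items()
        if seq < target
    }
    return donor, needed


def plan_roll_back(latest: Dict[str, int]) -> Dict[str, List[int]]:
    """List, per facility ahead of the common level, the updates to discard."""
    target = min(latest.values())
    return {
        name: list(range(target + 1, seq + 1))
        for name, seq in latest.items()
        if seq > target
    }


if __name__ == "__main__":
    latest = {"remote_a": 7, "remote_b": 5, "remote_c": 6}
    print(plan_roll_forward(latest))
    print(plan_roll_back(latest))
```

Roll forward converges on the newest image (here, `remote_a` sends updates 6 and 7 to `remote_b` and update 7 to `remote_c`), while roll back converges on the oldest common image (here, `remote_a` discards 6 and 7 and `remote_c` discards 6).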
An advantage of the invention is that, in data processing systems with mirrored storage facilities, the mirrored images of the local storage facility will correspond to one another in the event of a failure of the local storage facility they mirror.
In another embodiment of the invention, queue structures are maintained by each of the storage facilities: a roll back queue identifying messages not yet written to storage media, and a write history queue identifying messages that have been written. If the local storage facility fails, the remote storage facilities circulate among themselves information describing the contents of their roll back and write history queues. This allows them to determine which storage facility holds data not held by the others, and to act on that information by sending the data.
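A minimal Python sketch of how circulated queue summaries might be used is given below. The `QueueSummary` structure and the `plan_transfers` function are hypothetical illustrations; the sketch assumes that messages carry increasing sequence numbers applied in order, so that a facility has all messages up to its newest one, and that a facility can supply only messages still retained in its roll back or write history queue.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class QueueSummary:
    """Hypothetical summary a facility circulates after the local facility fails."""
    roll_back: List[int] = field(default_factory=list)      # received, not yet written to media
    write_history: List[int] = field(default_factory=list)  # written, still retained in the queue

    def newest(self) -> int:
        return max(self.roll_back + self.write_history, default=0)

    def retained(self) -> Set[int]:
        return set(self.roll_back) | set(self.write_history)


def plan_transfers(
    summaries: Dict[str, QueueSummary],
) -> Dict[str, Tuple[Optional[str], List[int]]]:
    """For each facility behind the newest survivor, list the messages it needs
    and a facility whose queues still retain all of them (None if no single
    survivor can supply them from its queues)."""
    newest = max(s.newest() for s in summaries.values())
    plan: Dict[str, Tuple[Optional[str], List[int]]] = {}
    for name, summary in summaries.items():
        needed = list(range(summary.newest() + 1, newest + 1))
        if not needed:
            continue
        donor = next(
            (other for other, o in summaries.items()
             if other != name and set(needed) <= o.retained()),
            None,
        )
        plan[name] = (donor, needed)
    return plan


if __name__ == "__main__":
    summaries = {
        "remote_a": QueueSummary(roll_back=[9, 10], write_history=[6, 7, 8]),
        "remote_b": QueueSummary(roll_back=[8], write_history=[5, 6, 7]),
        "remote_c": QueueSummary(write_history=[2, 3, 4]),
    }
    for name, (donor, needed) in plan_transfers(summaries).items():
        print(name, "needs", needed, "from", donor)
```

In this example `remote_b` can be brought up to date from `remote_a`, but no single survivor still retains every message `remote_c` needs (message 5 has left `remote_a`'s queues), which illustrates why the queues of the different facilities must overlap.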
A further embodiment of the invention involves an alignment procedure in which the local storage facility keeps track of which messages have been received by each remote storage facility and, from that information, determines which messages are in the roll back and write history queues of each. To maintain an overlap of messages in the roll back and write history queues for later roll back, roll forward, or purge operations, the local storage facility withholds transmission of data.
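One possible withholding rule is sketched below in Python. The `AlignmentTracker` class, its `window` parameter, and the rule itself are assumptions made for illustration and are not taken from the specification: the sketch assumes each remote facility retains its `window` most recent messages across its roll back and write history queues, and that messages carry increasing sequence numbers delivered in order. Under those assumptions, a remote that holds message `seq` still retains messages `seq - window + 1` through `seq`, so it can supply everything the slowest remote lacks only while `seq - slowest <= window`.

```python
from typing import Dict, Iterable


class AlignmentTracker:
    """Hypothetical local-side bookkeeping for the alignment procedure.

    Assumes each remote retains its `window` most recent messages across its
    roll back and write history queues, and that messages carry increasing
    sequence numbers delivered in order.
    """

    def __init__(self, remotes: Iterable[str], window: int) -> None:
        self.window = window
        self.acked: Dict[str, int] = {name: 0 for name in remotes}

    def record_ack(self, remote: str, seq: int) -> None:
        """Note that `remote` has received every message up to `seq`."""
        self.acked[remote] = max(self.acked[remote], seq)

    def may_send(self, seq: int) -> bool:
        """Allow message `seq` only if a remote holding it would still retain
        every message the slowest remote has yet to receive."""
        slowest = min(self.acked.values())
        return seq - slowest <= self.window


if __name__ == "__main__":
    tracker = AlignmentTracker(["remote_a", "remote_b"], window=4)
    tracker.record_ack("remote_a", 6)
    tracker.record_ack("remote_b", 3)
    print(tracker.may_send(7), tracker.may_send(8))
    tracker.record_ack("remote_b", 5)
    print(tracker.may_send(8))
```

In this example message 8 is withheld until `remote_b` catches up to message 5. A smaller window throttles the faster remotes more often; a larger one requires deeper queues at every remote.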
These and other features and advantages of the present invention may be obtained from a reading of the following detailed description, which should be taken in conjunction with the accompanying drawings.