Remote copying of data is an integral part of disaster recovery for protecting critical data from loss and providing continuous data availability. In a disaster recovery support system, data write updates to a primary or central data store are reproduced at a secondary, remote site. The remote site is typically located at a distance from the primary data store if protection from natural disasters is a concern, but may be adjacent to the primary site if equipment failure is the main concern. In the event of a failure at the primary data store, the remote site can take over all operations, including data write updates, with confidence that no data has been lost. Later, after repair, the primary data store can be restored to the condition of the remote site and can resume all operations, including data write operations.
During remote copying, data is typically sent from the primary data store to the remote site in blocks of equal size. In this way, data write updates at the primary data store are reproduced at the remote site so as to permit reconstruction of the data, including reconstruction of the exact sequence of data write updates that took place at the primary data store. This reproducibility can be especially important, for example, in a banking system or other transaction log system. Accordingly, data write updates at the primary data store are collected and periodically sent to the remote site in a remote copy operation.
The various types of remote copy can require enormous amounts of bandwidth over the data lines between the primary data store controller and the remote site controller. For example, if a primary data store controller can support 20,000 input/output (I/O) operations per second, and if 50% of these operations are write operations, then the controller can handle 10,000 write operations per second. If each write update involves 4 KB of data, then a bandwidth of 40 MB per second is required between the primary controller and the remote site controller. This is a significant amount of bandwidth to provide, given currently available pricing for data lines. Although asynchronous remote copy can speed up write updates, it does not decrease the amount of bandwidth required.
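The bandwidth estimate above can be reproduced with a short calculation. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes that 4 KB is taken as 4,000 bytes and 1 MB as 1,000,000 bytes, which is the reading under which the figure comes out to exactly 40 MB per second.

```python
def required_bandwidth_bytes_per_sec(io_ops_per_sec, write_fraction, bytes_per_write):
    """Return the link bandwidth needed to forward every write update.

    Hypothetical helper: multiplies the write rate by the size of each update.
    """
    writes_per_sec = io_ops_per_sec * write_fraction
    return writes_per_sec * bytes_per_write


# Figures from the example: 20,000 I/O operations per second, 50% writes,
# 4 KB per write update (assumed here to mean 4,000 bytes).
bandwidth = required_bandwidth_bytes_per_sec(20_000, 0.5, 4_000)
print(f"{bandwidth / 1_000_000:.0f} MB per second")
```

If 4 KB is instead read as 4,096 bytes, the same calculation gives roughly 41 MB per second, so the conclusion is unchanged.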
One proposed system that addresses the issue of bandwidth usage is presented in U.S. Pat. No. 6,327,671 entitled DELTA COMPRESSED ASYNCHRONOUS REMOTE COPY and assigned to the assignee of the present application (“the '671 patent”), which patent is incorporated herein by reference in its entirety. As illustrated in FIG. 1, the system disclosed therein provides a remote copy operation that copies data write updates from a primary data store to a secondary data store by identifying which bytes have changed and sending only the changed bytes from the primary data store to the secondary site. A data operation such as an exclusive-OR (XOR) logic operation can be used to identify the changed bytes, because the XOR of an updated block with its prior contents is zero wherever the bytes are unchanged. Many data storage systems include XOR facilities as part of their normal configuration, including systems that implement the well-known RAID-type data storage. In the '671 patent, the XOR operation is applied to the write-updated block of data to be copied. Data compression can then be applied to the resulting XOR data block to eliminate the unchanged bytes, so that only the changed bytes are sent to the remote site. This reduces the amount of data sent between the primary data store and the remote site, and thus reduces the bandwidth required between the sites. In this way, the remote copy system is said to provide remote copying without requiring a great deal of expensive bandwidth.
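The general idea of XOR-based delta compression can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the '671 patent: the function names are hypothetical, the standard zlib compressor is used as a stand-in for whatever compression scheme the system employs, and the secondary site is assumed to hold the prior contents of the block.

```python
import zlib


def make_delta(old_block: bytes, new_block: bytes) -> bytes:
    """Primary side (hypothetical): XOR the old and new blocks, then compress.

    Unchanged bytes XOR to zero, so the XOR block is mostly zeros and
    compresses to far fewer bytes than the full block.
    """
    if len(old_block) != len(new_block):
        raise ValueError("blocks must be the same size")
    xor_block = bytes(a ^ b for a, b in zip(old_block, new_block))
    return zlib.compress(xor_block)  # zlib is an assumed stand-in compressor


def apply_delta(old_block: bytes, delta: bytes) -> bytes:
    """Secondary side (hypothetical): decompress, then XOR with the old block."""
    xor_block = zlib.decompress(delta)
    return bytes(a ^ b for a, b in zip(old_block, xor_block))


# A 4,096-byte block in which only four bytes change.
old = bytes(4096)
new = bytearray(old)
new[100:104] = b"ABCD"
new = bytes(new)

delta = make_delta(old, new)
assert apply_delta(old, delta) == new  # the secondary reconstructs the update
print(len(new), "bytes in the block,", len(delta), "bytes sent")
```

Because XOR is its own inverse, the secondary site recovers the updated block by applying the decompressed XOR block to its existing copy; the saving depends on how few bytes change in each update.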