1. Field of the Invention
The present invention relates to asynchronous data mirroring systems. More particularly, one aspect of the invention concerns a method of temporarily caching data for storage in a primary storage subsystem for asynchronous mirroring at a secondary storage subsystem without sacrificing timestamp information.
2. Description of Related Art
For many businesses, governments, and other computer users that update and store data at a primary site, it is essential to maintain a backup copy of the data at a secondary site that is physically remote from the primary site. This permits recovering data from the secondary site in the event of an equipment failure or other disaster, for example a fire or explosion, that damages or destroys data at the primary site. Copying data to a remote secondary site as a backup for disaster recovery is referred to as data shadowing, data mirroring, data duplexing, or remote copying. In order to accurately restore a database after a disaster, it is important to maintain the data, which can include database log data and database table data, at the secondary site in an order that is sequentially consistent with the order of the data at the primary site. This is referred to as maintaining data consistency. It is also generally desirable to minimize any performance degradation at the primary site resulting from employing data shadowing.
The two main categories of remote data shadowing are referred to as “synchronous” and “asynchronous” data shadowing. With synchronous data shadowing, any given data update is stored at both the primary and secondary sites before the next update is permitted to be written to storage at the primary site. Thus, the data at the secondary site is updated synchronously with data at the primary site. A data update can include a new or updated data object such as a record, record set, file, linked list, or any other data structure.
The International Business Machines (IBM) Peer-to-Peer Remote Copy (PPRC) facility is an example of a synchronous remote data shadowing system. With PPRC, data at the remote secondary site is updated in the same sequence as data at the primary site, and consequently data at the secondary site is inherently synchronized with data at the primary site. When an application running on a host at the primary site writes a data update to a volume on a Direct Access Storage Device (DASD) at the primary site, a storage controller at the primary site stores the update in the DASD at the primary site. The storage controller at the primary site also forwards the update to a secondary storage controller at the secondary site for storage on a DASD volume at the secondary site. Next, the secondary storage controller notifies the primary storage controller that the secondary storage controller has received the update, and then the primary storage controller notifies the primary host that the update has been completed. Consequently, after a data update, processing of the next transaction or input/output (I/O) is delayed, because the primary storage controller does not notify the primary host that the update is complete until the primary storage controller receives confirmation from the secondary storage controller that the secondary storage controller has received the update. This delay grows as the distance between the primary and secondary sites increases, and can effectively limit the maximum distance between the primary and secondary sites to about 40 kilometers.
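The synchronous write sequence described above can be sketched as follows. This is a minimal illustrative model, not IBM's PPRC implementation: the function names, the Python lists standing in for the primary and secondary DASD volumes, and the assumed signal speed in optical fiber (roughly 200,000 km/s, i.e., about 200 km per millisecond) are all assumptions. The latency estimate counts only propagation delay and ignores controller processing and protocol overhead.

```python
# Minimal model of a synchronous (PPRC-style) write. All names are hypothetical;
# Python lists stand in for the primary and secondary DASD volumes.

FIBER_KM_PER_MS = 200.0  # assumption: roughly 200,000 km/s signal speed in optical fiber


def round_trip_propagation_ms(distance_km: float) -> float:
    """Propagation delay added to each write: update out to the secondary, ack back."""
    return 2.0 * distance_km / FIBER_KM_PER_MS


def synchronous_write(update: str, primary: list, secondary: list, log: list) -> None:
    """Apply one update in the order described in the text; the host is notified last."""
    primary.append(update)                   # primary controller stores the update locally
    log.append(f"primary stored {update}")
    secondary.append(update)                 # update forwarded to the secondary controller
    log.append(f"secondary received {update}")
    log.append(f"secondary acked {update}")  # secondary notifies the primary controller
    log.append(f"host notified {update}")    # only now can the host issue its next I/O


primary, secondary, log = [], [], []
for u in ["u1", "u2"]:
    synchronous_write(u, primary, secondary, log)
print(log[3], "|", log[4])            # host notified u1 | primary stored u2
print(round_trip_propagation_ms(40))  # 0.4 (milliseconds)
```

Because every write waits for this round trip before the host may proceed, the added delay grows linearly with distance, which is the effect the text describes as limiting the practical separation of the sites.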
In contrast to synchronous data shadowing, with asynchronous data shadowing more than one data update can be written to storage at the primary site before any data updates are sent to the secondary storage site. Thus, with asynchronous data shadowing, the data updates at the secondary site generally occur asynchronously in relation to the updates at the primary site.
The IBM Extended Remote Copy (XRC) facility is an example of an asynchronous remote data shadowing system. With the XRC facility, when an application running on a host at the primary site sends a request to a storage controller at the primary site to store a data update on a volume on a DASD at the primary site, the storage controller stores the update in the DASD, and also stores the data update in a sidefile in the storage controller at the primary site. A storage controller can have one or more sidefiles, and each sidefile corresponds with a “controller session”. Each data update is stored with a timestamp that identifies the time that the host application made the request to store the data update. The timestamps permit the data updates in the sidefile, and the data updates in other sidefiles, to be placed in sequence consistent order when they are stored at the secondary site. XRC employs a “system data mover” server, which runs a program that gathers a group of data updates from sidefiles in one or more storage controllers at the primary site. Using the timestamps, the data mover places the updates into sequence consistent groups referred to as “consistency groups”. The group of storage controller sessions corresponding to the sidefiles from which XRC gathers data to form consistency groups is referred to as an XRC “session.”
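The sidefile mechanism and the data mover's gathering step can be illustrated with the following minimal sketch. It assumes that timestamps within a single controller session are nondecreasing and that the data mover drains each sidefile completely on every pass. `Update`, `ControllerSession`, and `gather` are hypothetical names, and the dictionary standing in for a DASD volume ignores track and record layout.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(order=True)
class Update:
    timestamp: float                    # time the host application requested the write
    volume: str = field(compare=False)  # hypothetical volume identifier
    data: bytes = field(compare=False)  # ordering uses the timestamp only


class ControllerSession:
    """Hypothetical stand-in for one controller session and its sidefile."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.dasd: Dict[str, bytes] = {}  # simplified primary DASD: volume -> latest data
        self.sidefile: List[Update] = []  # timestamped copies awaiting the data mover

    def write(self, update: Update) -> None:
        self.dasd[update.volume] = update.data  # store the update on primary DASD
        self.sidefile.append(update)            # and keep a timestamped copy in the sidefile


def gather(sessions: List[ControllerSession]) -> List[Update]:
    """Data-mover sketch: drain every sidefile and merge the updates by timestamp."""
    drained = []
    for session in sessions:
        drained.append(session.sidefile)
        session.sidefile = []
    # Each drained list is assumed to be in timestamp order, so a k-way merge suffices.
    return list(heapq.merge(*drained))


a, b = ControllerSession("A"), ControllerSession("B")
a.write(Update(1.0, "vol1", b"x"))
b.write(Update(0.5, "vol2", b"y"))
a.write(Update(2.0, "vol1", b"z"))
print([u.timestamp for u in gather([a, b])])  # [0.5, 1.0, 2.0]
```

The merge interleaves updates from different sessions purely by timestamp, which is what allows updates written through separate storage controllers to be replayed in a single sequence consistent order.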
Consistency groups are sets of data updates that are grouped together because they occurred during the same time interval. To facilitate the formation of consistency groups, the latest timestamp for each controller session is saved in the storage controller. When forming consistency groups, XRC ignores the timestamps from controller sessions that have not had any data updates for a prescribed amount of time, for example one second; such sessions are referred to as “idle” controller sessions. The data mover repeatedly forms, and then transmits, consistency groups of data updates to a storage controller at the secondary site. The storage controller at the secondary site receives the data updates in each consistency group and stores them in sequence consistent order in volume(s) on DASD(s) at the secondary site. As a result of using consistency groups, the DASD(s) at the secondary site are updated in the same sequence as the DASD(s) at the primary site. Forming consistency groups, along with other relevant information, is described in U.S. Pat. No. 5,734,818, issued Mar. 31, 1998, titled “Forming Consistency Groups Using Self-describing Data Objects For Remote Data Duplexing”, and in U.S. Pat. No. 6,301,643, issued Oct. 9, 2001, titled “Multi-environment Data Consistency”, both of which are incorporated herein by reference in their entirety.
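One way to read the consistency group rule above is sketched below: among controller sessions that are not idle, take the smallest of their latest timestamps as a cutoff, then remove and merge every pending update at or before that cutoff. The minimum is used because each active session is assumed to have already placed every update up to its own latest timestamp in its sidefile, so no update older than the cutoff can still be missing. This is an illustrative sketch, not the XRC algorithm itself; the function name, the `(timestamp, payload)` representation, and the choice to use the current time as the cutoff when every session is idle are assumptions. The one-second idle threshold is the example value given in the text.

```python
import heapq
from typing import Dict, List, Tuple

Update = Tuple[float, str]  # hypothetical representation: (timestamp, payload)
IDLE_SECONDS = 1.0          # example idle threshold from the text


def form_consistency_group(
    sidefiles: Dict[str, List[Update]],
    latest: Dict[str, float],
    now: float,
) -> Tuple[float, List[Update]]:
    """Remove and return, in timestamp order, every pending update safe to send."""
    # Ignore the saved timestamps of sessions idle for at least IDLE_SECONDS.
    active = [ts for ts in latest.values() if now - ts < IDLE_SECONDS]
    # Assumption: if every session is idle, everything pending up to now is safe.
    cutoff = min(active) if active else now
    ready: List[List[Update]] = []
    for name, pending in sidefiles.items():
        # Each session's pending list is assumed to be sorted by timestamp.
        split = 0
        while split < len(pending) and pending[split][0] <= cutoff:
            split += 1
        ready.append(pending[:split])
        sidefiles[name] = pending[split:]  # later updates wait for a future group
    return cutoff, list(heapq.merge(*ready))


sidefiles = {
    "A": [(10.0, "a1"), (10.4, "a2")],
    "B": [(10.1, "b1"), (10.6, "b2")],
    "C": [(8.0, "c1")],  # last update 3 s before "now": idle
}
latest = {"A": 10.4, "B": 10.6, "C": 8.0}
cutoff, group = form_consistency_group(sidefiles, latest, now=11.0)
print(cutoff, [p for _, p in group])  # 10.4 ['c1', 'a1', 'b1', 'a2']
```

In the usage example, session C's old timestamp (8.0) is ignored when computing the cutoff, so it does not hold back the group, yet its pending update is still included because it falls before the cutoff. Update `b2` stays in its sidefile for a later group.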
With XRC, the primary storage controller notifies the host that a data update has been completed soon after the primary storage controller receives the update from the host, without waiting for any communication from the secondary site. Accordingly, the primary host is free to process the next transaction or I/O very soon after sending a data update to the primary storage controller.
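The asynchronous acknowledgment described above can be contrasted with the synchronous case using the following `asyncio` sketch. It is a simplified model under stated assumptions: `primary_write`, `data_mover`, and `main` are hypothetical names, an `asyncio.Queue` stands in for the sidefile, updates are shipped one at a time rather than in consistency groups, and `asyncio.sleep` stands in for the link delay to the secondary site.

```python
import asyncio
from typing import List, Tuple


def primary_write(update: str, dasd: List[str], sidefile: asyncio.Queue, acks: List[str]) -> None:
    dasd.append(update)          # store the update on primary DASD
    sidefile.put_nowait(update)  # keep a copy for later transmission
    acks.append(update)          # notify the host at once; no wait for the secondary


async def data_mover(sidefile: asyncio.Queue, secondary: List[str], link_delay_s: float) -> None:
    while True:
        update = await sidefile.get()
        await asyncio.sleep(link_delay_s)  # stand-in for transmission to the secondary site
        secondary.append(update)
        sidefile.task_done()


async def main() -> Tuple[bool, List[str]]:
    dasd: List[str] = []
    acks: List[str] = []
    secondary: List[str] = []
    sidefile: asyncio.Queue = asyncio.Queue()
    mover = asyncio.create_task(data_mover(sidefile, secondary, 0.01))
    for u in ["u1", "u2", "u3"]:
        primary_write(u, dasd, sidefile, acks)
    # Every write has been acknowledged to the host, but none has reached the secondary yet.
    acked_before_mirrored = acks == ["u1", "u2", "u3"] and secondary == []
    await sidefile.join()  # wait until the mover has shipped every update
    mover.cancel()
    try:
        await mover
    except asyncio.CancelledError:
        pass
    return acked_before_mirrored, secondary


print(asyncio.run(main()))  # (True, ['u1', 'u2', 'u3'])
```

The host sees all three writes completed before any of them is mirrored; the secondary catches up later, in the same order, which is the trade-off that distinguishes asynchronous from synchronous shadowing.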
Consequently, an asynchronous remote data shadowing system, such as XRC, can provide better performance than a synchronous remote data shadowing system, such as PPRC. Although systems such as XRC provide significant utility and also enjoy widespread commercial success today, IBM engineers are nonetheless seeking to improve the performance and efficiency of such remote data shadowing systems. In this regard, advances in speed and efficiency of data shadowing are continually sought.