Disaster recovery systems typically address two types of failures: a sudden catastrophic failure at a single point in time, or data loss over a period of time. In both types of failure scenario, updates to volumes may be lost. To assist in recovery of data updates, a copy of data may be provided at a remote location. Such dual or shadow copies are typically made as the application system is writing new data to a primary storage device. International Business Machines Corporation (IBM), the assignee of the subject patent application, provides the following systems for maintaining remote copies of data at a secondary site: Extended Remote Copy (XRC) and Peer-to-Peer Remote Copy (PPRC). These systems provide a method for the continuous mirroring of data to a remote site, to which operations can fail over during a failure at the primary site from which the data is being continuously mirrored. Such data mirroring systems can also provide an additional remote copy for non-recovery purposes such as local access at a remote site. These IBM XRC and PPRC systems are described in the IBM publication “Remote Copy: Administrator's Guide and Reference,” IBM document number SC35-0169-02 (IBM Copyright 1994, 1996), which publication is incorporated herein by reference in its entirety.
In such backup systems, data is maintained in volume pairs. A volume pair comprises a volume in a primary storage device and a corresponding volume in a secondary storage device that includes an identical copy of the data maintained in the primary volume. Typically, the primary volume of the pair will be maintained in a primary direct access storage device (DASD) and the secondary volume of the pair is maintained in a secondary DASD shadowing data from the primary DASD. A primary storage controller may be provided to control access to the primary DASD and a secondary storage controller may be provided to control access to the secondary DASD. In the IBM XRC environment, the application system writing data to the primary volumes includes a sysplex timer which provides a time-of-day (TOD) value as a time stamp to data writes. The host system time stamps data sets when writing such data sets to volumes in the primary DASD. The integrity of data updates is related to ensuring that updates are done at the secondary volumes in the volume pair in the same order as they were done on the primary volume. In XRC and other prior art systems, the cross-system common time stamp provided by the system on behalf of the application program determines and maintains the logical sequence of data updates across any number of data volumes on any number of storage systems. In many application programs, such as database systems, certain writes cannot occur unless a previous write occurred; otherwise the data integrity would be jeopardized. Such a data write whose integrity is dependent on the occurrence of a previous data write is known as a dependent write. For instance, if a customer opens an account, deposits $400.00, and then withdraws $300.00, the withdrawal update to the system is dependent on the occurrence of the other writes, i.e., the opening of the account and the deposit.
When such dependent transactions are copied from the primary volumes to the secondary volumes, the transaction order must be preserved to maintain the integrity of the dependent write operation.
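By way of illustration only, the ordering requirement for dependent writes can be sketched in Python using the banking example. The names `Update`, `apply_one`, and `apply_in_order`, the integer time stamps, and the volume label are hypothetical and are not taken from XRC or PPRC; the sketch merely assumes that each update carries a host-supplied time stamp and that the secondary applies updates in time stamp order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    timestamp: int   # hypothetical time-of-day (TOD) value stamped by the host
    volume: str      # hypothetical label of the primary volume written to
    op: str          # "open", "deposit", or "withdraw"
    amount: int = 0


def apply_one(balance, update):
    """Apply a single update; balance is None until the account is opened."""
    if update.op == "open":
        return 0
    if balance is None:
        # A dependent write arrived before the write it depends on.
        raise ValueError(f"{update.op} applied before account was opened")
    if update.op == "deposit":
        return balance + update.amount
    if update.op == "withdraw":
        if update.amount > balance:
            raise ValueError("withdrawal applied before its deposit")
        return balance - update.amount
    raise ValueError(f"unknown operation: {update.op}")


def apply_in_order(updates):
    """Apply updates to the secondary copy in time stamp order."""
    balance = None
    for update in sorted(updates, key=lambda u: u.timestamp):
        balance = apply_one(balance, update)
    return balance


# Updates as they might arrive at the secondary site, out of order.
arrived = [
    Update(30, "VOL1", "withdraw", 300),
    Update(10, "VOL1", "open"),
    Update(20, "VOL1", "deposit", 400),
]
print(apply_in_order(arrived))
```

In this sketch, applying the same updates in arrival order rather than time stamp order would raise an error on the withdrawal, which is the integrity exposure that order preservation is intended to avoid.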
Volumes in the primary and secondary DASDs are consistent when all writes have been transferred in their logical order, i.e., each write is transferred before any write that is dependent thereon. In the banking example, this means that the deposit is written to the secondary volume before the withdrawal. A consistency group is a collection of updates to the primary volumes such that dependent writes are secured in a consistent manner. For instance, in the banking example, this means that the withdrawal transaction is in the same consistency group as the deposit or in a later group; the withdrawal cannot be in an earlier consistency group. Consistency groups maintain data consistency across volumes and storage devices. For instance, if a failure occurs, the deposit will be written to the secondary volume before the withdrawal. Thus, when data is recovered from the secondary volumes, the recovered data will be consistent.
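The grouping rule described above may be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular product: the function name `form_consistency_groups`, the integer time stamps, and the list of group boundary times are all hypothetical. Each group receives every update whose time stamp falls after the previous boundary and at or before its own boundary, so a dependent write can only land in the same group as the write it depends on, or in a later one.

```python
from bisect import bisect_left


def form_consistency_groups(updates, boundaries):
    """Partition (timestamp, description) updates into consistency groups.

    boundaries is an ascending list of hypothetical group cut times.
    Group i holds updates with boundaries[i-1] < timestamp <= boundaries[i].
    Updates later than the last boundary are returned as pending.
    """
    groups = [[] for _ in boundaries]
    pending = []
    for timestamp, description in sorted(updates):
        # Index of the first boundary that is >= this time stamp.
        index = bisect_left(boundaries, timestamp)
        if index < len(boundaries):
            groups[index].append((timestamp, description))
        else:
            pending.append((timestamp, description))
    return groups, pending


updates = [(30, "withdraw 300"), (10, "open account"), (20, "deposit 400")]
groups, pending = form_consistency_groups(updates, boundaries=[20, 40])
for number, group in enumerate(groups, start=1):
    print(number, group)
```

With the boundaries assumed here, the account opening and the deposit fall in the first group and the withdrawal falls in the second, consistent with the banking example.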
A consistency time is a time the system derives from the application system's time stamp on the data set. A consistency group has a consistency time, and all data writes in the consistency group have a time stamp equal to or earlier than that consistency time. In the IBM XRC environment, the consistency time is the latest time to which the system guarantees that updates to the secondary volumes are consistent. As long as the application program is writing data to the primary volume, the consistency time increases. However, if update activity ceases, then the consistency time does not change as there are no data sets with time stamps to provide a time reference for further consistency groups. If all the records in the consistency group are written to secondary volumes, then the reported consistency time reflects the latest time stamp of all records in the consistency group. Methods for maintaining the sequential consistency of data writes and forming consistency groups to maintain sequential consistency in the transfer of data between a primary DASD and secondary DASD are described in U.S. Pat. Nos. 5,615,329 and 5,504,861, which are assigned to IBM, the assignee of the subject patent application, and which are incorporated herein by reference in their entirety.
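The behavior of the consistency time described above may be sketched as follows. The class `ConsistencyTracker` and its method `group_written` are hypothetical names introduced only for illustration, and the time stamps are assumed to be simple integers. The sketch assumes that the consistency time is reported only after every record of a group has been written to the secondary volumes.

```python
class ConsistencyTracker:
    """Tracks a reported consistency time (illustrative sketch only)."""

    def __init__(self):
        # None until at least one consistency group has been fully written.
        self.consistency_time = None

    def group_written(self, timestamps):
        """Record that all records of a group reached the secondary volumes.

        timestamps holds the time stamps of the records in that group.
        An empty group (no update activity) leaves the time unchanged.
        """
        if timestamps:
            latest = max(timestamps)
            if self.consistency_time is None or latest > self.consistency_time:
                self.consistency_time = latest
        return self.consistency_time


tracker = ConsistencyTracker()
print(tracker.group_written([10, 20]))  # time advances to the latest record
print(tracker.group_written([]))        # no updates, so the time is unchanged
print(tracker.group_written([30]))      # activity resumes and the time advances
```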
One technique to maintain consistency across copies is to time stamp data across primary volumes using a common clock source, referred to as a sysplex timer. Updates will be transferred in groups defined as all updates having a time stamp less than a certain time. When clock synchronization cannot be easily implemented to form consistency groups across systems, then another technique for forming consistency groups is to determine a cut-off point. Any updates to primary volumes managed by the primary controller cache dated as of the cut-off point are transferred to the secondary controller for storage in the secondary volumes. While the data in the consistency group is being transferred, the primary storage controller would return busy to any host request. After the data in the consistency group is transferred and the primary and secondary storage controllers are synchronized, i.e., any updates prior to the cut-off point are transferred, then the primary controller would cease returning busy to the applications. This ensures that the primary and secondary volumes are consistent as of the freeze cut-off point.
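The cut-off technique described above may be sketched as a simple simulation. The class `PrimaryController`, its methods, the `"ok"` and `"busy"` return values, and the `during_transfer` hook are hypothetical and exist only to illustrate the sequence; an actual storage controller would signal busy through its channel protocol. The sketch assumes a single primary controller whose cached updates are transferred to a list standing in for the secondary volumes.

```python
class PrimaryController:
    """Illustrative model of the freeze and cut-off sequence."""

    def __init__(self):
        self.cache = []       # updates accepted but not yet transferred
        self.frozen = False   # True while a consistency group is in transfer

    def host_write(self, data):
        """Accept a host write, or return busy while a transfer is under way."""
        if self.frozen:
            return "busy"
        self.cache.append(data)
        return "ok"

    def form_consistency_group(self, secondary, during_transfer=None):
        """Transfer all updates cached as of the cut-off point.

        during_transfer is an optional hypothetical callback used only to
        demonstrate what a host sees while the transfer is in progress.
        """
        self.frozen = True            # the cut-off point
        try:
            group = list(self.cache)
            if during_transfer is not None:
                during_transfer()
            secondary.extend(group)   # store the group in the secondary
            self.cache.clear()
        finally:
            self.frozen = False       # cease returning busy
        return group


primary = PrimaryController()
secondary = []
primary.host_write("update A")
primary.host_write("update B")

seen = []
primary.form_consistency_group(
    secondary, during_transfer=lambda: seen.append(primary.host_write("update C"))
)
print(seen, secondary, primary.host_write("update C"))
```

In this sketch the write attempted during the transfer is refused with busy and must be retried by the host afterward, so the secondary holds exactly the updates that existed as of the cut-off point.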
As is discussed above and as is fully discussed in the commonly assigned and simultaneously filed U.S. application Ser. No. 10/676,852 entitled “METHOD, SYSTEM, AND PROGRAM FOR FORMING A CONSISTENCY GROUP”, a storage system failure can result from a sudden or catastrophic failure at a single point in time. Such a failure can be particularly disruptive if a storage controller or storage volume at a primary or local site fails since host I/O operations typically write to the storage system at the local site.
Currently known asynchronous data copying solutions have several scenarios where the resumption of normal operations after a failure at the primary or local site requires the customer to perform a full copy of all volumes maintained at a secondary or recovery site. Full volume copies can take many hours depending on the amount of data stored in the respective volumes. Furthermore, full volume copies can leave the customer exposed to subsequent failures until normal operations can be resumed.
In addition, a data storage system configured across multiple storage sites and having multiple storage volumes and controllers may rely on consistency group formation and consistency group processing to maintain data consistency across volumes and storage devices. A need exists in the art for a mechanism to facilitate the maintenance and manipulation of consistency groups across multiple storage controllers when failure strikes a local controller associated with a local or primary site. Proper use of consistency groups can assure that recovery from a local failure will proceed with minimal data loss and without the need for a time-consuming full volume copy.
The present invention is directed toward overcoming one or more of the problems discussed above.