Disaster recovery systems typically address two types of failures: a sudden catastrophic failure at a single point in time, or data loss over a period of time. To assist in recovery of data updates, a copy of data may be provided at a remote location. Such dual or shadow copies are typically made as the application system is writing new data to a primary storage device. International Business Machines Corporation (IBM), the assignee of the subject patent application, provides the following systems for maintaining remote copies of data at a secondary site: Extended Remote Copy (XRC) and Peer-to-Peer Remote Copy (PPRC). These systems provide a method for continuously mirroring data to a remote site, to which operations can fail over during a failure at the primary site from which the data is being mirrored. Such data mirroring systems can also provide an additional remote copy for non-recovery purposes, such as local access at a remote site. The IBM XRC and PPRC systems are described in the IBM publication “DFSMS/MVS Version 1 Remote Copy Administrator's Guide and Reference” (Document Number SC35-0169-03, © Copyright IBM Corp. 1994, 1997), which publication is incorporated herein by reference in its entirety.
In such backup systems, data is maintained in volume pairs. A volume pair comprises a volume in a primary storage device and a corresponding volume in a secondary storage device that includes an identical copy of the data maintained in the primary volume. Typically, the primary volume of the pair is maintained in a primary direct access storage device (DASD) and the secondary volume of the pair is maintained in a secondary DASD shadowing the data on the primary DASD. A primary storage controller may be provided to control access to the primary DASD and a secondary storage controller may be provided to control access to the secondary DASD. In the IBM XRC environment, the application system writing data to the primary volumes includes a sysplex timer which provides a time-of-day (TOD) value as a time stamp to data writes. The host system time stamps data sets when writing such data sets to volumes in the primary DASD. The integrity of data updates depends on ensuring that updates are done at the secondary volumes in the volume pair in the same order as they were done on the primary volume. In XRC and other prior art systems, the cross-system common time stamp provided by the system on behalf of the application program determines and maintains the logical sequence of data updates across any number of data volumes on any number of storage systems. In many application programs, such as database systems, certain writes cannot occur unless a previous write occurred; otherwise the data integrity would be jeopardized. Such a data write whose integrity is dependent on the occurrence of a previous data write is known as a dependent write. For instance, if a customer opens an account, deposits $400, and then withdraws $300, the withdrawal update to the system is dependent on the occurrence of the other writes, the opening of the account and the deposit.
When such dependent transactions are copied from the primary volumes to secondary volumes, the transaction order must be preserved to maintain the integrity of the dependent write operation.
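The ordering requirement for dependent writes can be illustrated with a minimal sketch. It assumes each write carries a single comparable time stamp, and the names `Write` and `apply_in_order` are hypothetical, introduced only for illustration. Writes that reach the secondary site out of order are applied in time stamp order, so the withdrawal is never applied before the deposit on which it depends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    """One data update; all field names are illustrative assumptions."""
    timestamp: int   # stand-in for the common time-of-day stamp
    volume: str      # volume the update targets
    key: str         # item updated on that volume
    value: object    # new contents of the item


def apply_in_order(writes):
    """Apply writes to a model of the secondary volumes in time stamp order."""
    secondary = {}
    for w in sorted(writes, key=lambda w: w.timestamp):
        secondary.setdefault(w.volume, {})[w.key] = w.value
    return secondary


# Banking example: the writes arrive at the secondary site out of order.
arrived = [
    Write(3, "VOL2", "balance", 100),     # withdraw $300
    Write(1, "VOL1", "account", "open"),  # open the account
    Write(2, "VOL2", "balance", 400),     # deposit $400
]
secondary_state = apply_in_order(arrived)
```

Applying the same writes in arrival order would leave a balance of 400 on the secondary volume, which is the kind of inconsistency that ordering by time stamp is meant to avoid.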
Volumes in the primary and secondary DASDs are consistent when all writes have been transferred in their logical order, i.e., all dependent writes transferred first before the writes dependent thereon. In the banking example, this means that the deposit is written to the secondary volume before the withdrawal. A consistency group is a collection of updates to the primary volumes such that dependent writes are secured in a consistent manner. For instance, in the banking example, this means that the withdrawal transaction is in the same consistency group as the deposit or in a later group; the withdrawal cannot be in an earlier consistency group. Consistency groups maintain data consistency across volumes and storage devices. For instance, if a failure occurs, the deposit will be written to the secondary volume before the withdrawal. Thus, when data is recovered from the secondary volumes, the recovered data will be consistent.
A consistency time is a time the system derives from the time stamp that the application system applies to the data set. A consistency group has a consistency time, and all data writes in the consistency group have a time stamp equal to or earlier than that consistency time. In the IBM XRC environment, the consistency time is the latest time to which the system guarantees that updates to the secondary volumes are consistent. As long as the application program is writing data to the primary volume, the consistency time increases. However, if update activity ceases, then the consistency time does not change, as there are no data sets with time stamps to provide a time reference for further consistency groups. If all the records in the consistency group are written to secondary volumes, then the reported consistency time reflects the latest time stamp of all records in the consistency group. Methods for maintaining the sequential consistency of data writes and forming consistency groups to maintain sequential consistency in the transfer of data between a primary DASD and secondary DASD are described in U.S. Pat. Nos. 5,615,329 and 5,504,861, which are assigned to IBM, the assignee of the subject patent application, and which are incorporated herein by reference in their entirety.
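The relationship between a consistency group and its consistency time can be sketched as follows. This is a simplified illustration, not the method of the referenced patents: it assumes updates are available as (time stamp, record) pairs, and the function names `form_consistency_group` and `reported_consistency_time` are hypothetical.

```python
def form_consistency_group(updates, consistency_time):
    """Select updates stamped at or before consistency_time, in time order."""
    selected = [u for u in updates if u[0] <= consistency_time]
    return sorted(selected, key=lambda u: u[0])


def reported_consistency_time(group):
    """Latest time stamp in a fully written group, or None for an empty group."""
    return max((stamp for stamp, _ in group), default=None)


updates = [
    (30, "withdraw 300"),
    (10, "open account"),
    (20, "deposit 400"),
    (45, "later update"),
]
group = form_consistency_group(updates, consistency_time=30)
```

In this sketch the update stamped 45 falls outside the group and would belong to a later consistency group, and an empty group yields no new consistency time, mirroring the case where update activity ceases.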
Details of creating and operating data structures in the formation of consistency groups are described in the copending and commonly assigned patent application entitled “METHOD, SYSTEM, AND PROGRAM FOR FORMING A CONSISTENCY GROUP”, having Ser. No. 10/676,852, filed Sep. 29, 2003, which patent application is incorporated herein by reference in its entirety. One data structure, an out of synch bitmap, may be used to indicate tracks to be transferred. A storage controller may receive a consistency group formation command to copy consistent data on specified volumes managed by the storage controller to a remote site. In response, the storage controller may queue any further writes while generating a change recording bitmap to keep track of queued writes as well as any subsequent writes after formation of the consistency group is initiated. This mode of operation may be referred to as a “Consistency Group in Progress Mode.” In this mode, tracks indicated in the out-of-synch bitmap may be copied to the remote site to create a consistency group.
After the out of synch bitmap is drained such that all the asynchronous remote copy operations indicated in the out of synch bitmap have been completed, a consistency group may have been formed. If so, the mode of operation may switch to a second mode in which subsequent writes may instead be recorded in the out of synch bitmap. Tracks indicated in the out of synch bitmap may continue to be copied to the remote site. This mode of operation may be referred to as a “Normal Transfer Mode,” for example. In this mode, the change recording bitmap may be merged with the out of synch bitmap, and the change recording bitmap may be discarded. Further, a virtual copy of the volumes, consistent as of the time at which the storage controller received the point-in-time copy command, may be performed at the remote site.
In general, all of the bits in the out of synch bitmap are cleared in the Consistency Group in Progress mode before a consistency group is successfully formed. One approach to managing the data transfer in consistency group formation is to impose a fixed time limit on the creation of the consistency group. If the out of synch bitmap is not completely drained before the expiration of the period of time, then the consistency group formation attempt is deemed failed. If so, the mode of operation may be switched from the Consistency Group in Progress Mode to the Normal Transfer Mode. In preparation for the mode switch, the bits of the change recording bitmap may be merged with the out of synch bitmap and any new host writes are recorded in the out of synch bitmap by setting appropriate bits of the out of synch bitmap. As a consequence, tracks from all volumes may be transferred to remote sites as the out of synch bitmap continues to be drained. Thus, a backlog of writes for the next consistency group formation attempt can be reduced or eliminated in some applications.
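The interplay of the two bitmaps and the two modes described above can be sketched in a few lines. This is a simplified model under stated assumptions: each bitmap is represented as a Python set of track numbers rather than a real bit array, the queueing of writes while the change recording bitmap is generated is omitted, and the class and method names (`RemoteCopyState`, `end_attempt`, and so on) are hypothetical.

```python
class RemoteCopyState:
    """Toy model of one storage controller's bitmaps and operating mode."""

    NORMAL = "Normal Transfer Mode"
    CG_IN_PROGRESS = "Consistency Group in Progress Mode"

    def __init__(self):
        self.mode = self.NORMAL
        self.out_of_synch = set()      # tracks still to be copied
        self.change_recording = set()  # writes made during group formation

    def host_write(self, track):
        """Record a host write in the bitmap appropriate to the mode."""
        if self.mode == self.CG_IN_PROGRESS:
            self.change_recording.add(track)
        else:
            self.out_of_synch.add(track)

    def start_consistency_group(self):
        """Enter Consistency Group in Progress Mode with an empty bitmap."""
        self.mode = self.CG_IN_PROGRESS
        self.change_recording = set()

    def drain(self, max_tracks):
        """Copy up to max_tracks tracks to the remote site; return them."""
        copied = sorted(self.out_of_synch)[:max_tracks]
        self.out_of_synch.difference_update(copied)
        return copied

    def end_attempt(self):
        """Finish an attempt, whether drained or timed out.

        Returns True if the out of synch bitmap was fully drained, meaning
        a consistency group was formed. Either way, the change recording
        bitmap is merged into the out of synch bitmap and the controller
        returns to Normal Transfer Mode.
        """
        formed = not self.out_of_synch
        self.out_of_synch |= self.change_recording
        self.change_recording = set()
        self.mode = self.NORMAL
        return formed
```

A caller would invoke `end_attempt` either when `drain` has emptied the out of synch bitmap or when the time limit for the attempt expires; the return value distinguishes the two outcomes.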
If the out of synch bitmap is not completely drained after another time limit, another attempt may be made to form a consistency group by switching back to the Consistency Group in Progress Mode. This time limit on duration of the Normal Transfer mode may be dynamically calculated as conditions change as described in copending application Ser. No. 10/987,570, filed Nov. 12, 2004, entitled “DATA TRANSFER MANAGEMENT IN CONSISTENCY GROUP FORMATION”.
Having switched back to the Consistency Group in Progress Mode, the storage controller may queue any subsequent writes while generating the change recording bitmap. After generating the change recording bitmap, any queued writes and subsequent writes may be indicated in the change recording bitmap, and tracks indicated in the out of synch bitmap may continue to be copied to the remote site. Again, if the out of synch bitmap is not fully drained by the expiration of the associated time period, the consistency group formation may be deemed a failure and the mode of operation may be switched back to the Normal Transfer Mode. After a certain number of consistency group attempts have failed (such as five consistency group attempts, for example) because the draining of the out of synch bitmap exceeded the associated time limit for formation of each consistency group, the time limit may be ignored. As a consequence, the operation mode may remain in the Consistency Group in Progress Mode until a consistency group is successfully formed.
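The retry policy described above can be sketched as a small simulation. It assumes the time each attempt would need to drain the out of synch bitmap is known in advance, which a real controller would not know; the function name `attempts_until_formed` and its parameters are hypothetical.

```python
def attempts_until_formed(required_drain_times, time_limit, max_failed_attempts=5):
    """Simulate repeated consistency group formation attempts.

    required_drain_times lists, per attempt, how long draining would take.
    Returns (attempt_number, limit_enforced) for the first successful
    attempt, or None if the listed attempts run out before one succeeds.
    """
    failures = 0
    for attempt, needed in enumerate(required_drain_times, start=1):
        # The time limit is ignored once enough attempts have failed.
        limit_enforced = failures < max_failed_attempts
        if not limit_enforced or needed <= time_limit:
            return attempt, limit_enforced
        failures += 1
    return None
```

For example, with a limit of 10 time units and every attempt needing 30, the first five attempts fail and the sixth succeeds because the limit is no longer enforced.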
To dynamically calculate the time limit for remaining in the Normal Transfer Mode, the number of tracks remaining to be transferred for a particular node, such as a server, toward creation of the consistency group, as indicated by the out of synch bitmap, may be queried a first time and again at a second, subsequent time as the controller leaves the Consistency Group in Progress Mode. In this manner, the rate of change of the number of tracks remaining to be transferred for the particular server may be determined by dividing the change in the number of tracks between the first and second times by the duration of time between the first and second times. An estimated transfer or drain time for each server may be determined by dividing the number of tracks remaining to be transferred for the particular server toward creation of the consistency group, as indicated by the out of synch bitmap, by the data transfer rate determined for the particular server. The longest of the estimated transfer times calculated for the source servers may be selected as a transfer time. In addition, the selected transfer time may be multiplied by a constant to provide a dynamically calculated time limit for the Normal Transfer Mode before returning to the Consistency Group in Progress Mode to attempt formation of another consistency group.
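The calculation described above can be sketched as follows. Several details are assumptions made for illustration: the most recent query is used as the number of tracks remaining, a server whose backlog is not shrinking is given an infinite estimate, the multiplier of 1.5 is an arbitrary example constant, and the function names are hypothetical.

```python
import math


def estimated_drain_time(tracks_first, time_first, tracks_second, time_second):
    """Estimate how long one server needs to drain its remaining tracks."""
    # Transfer rate: tracks drained per unit time between the two queries.
    rate = (tracks_first - tracks_second) / (time_second - time_first)
    if rate <= 0:
        # Assumption: a backlog that is not shrinking has no finite estimate.
        return math.inf
    return tracks_second / rate


def normal_mode_time_limit(samples, multiplier=1.5):
    """Longest per-server estimate, scaled by a constant."""
    longest = max(estimated_drain_time(*s) for s in samples.values())
    return longest * multiplier


# (tracks at first query, first time, tracks at second query, second time)
samples = {
    "server_a": (1000, 0.0, 800, 10.0),  # 20 tracks/s, 800 left: 40 s
    "server_b": (500, 0.0, 450, 10.0),   # 5 tracks/s, 450 left: 90 s
}
limit = normal_mode_time_limit(samples)  # 90 s * 1.5 = 135 s
```

Selecting the longest estimate means the slowest server governs the time limit, so the next formation attempt is not started while any server is still expected to have a backlog.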