1. Field of the Invention
The present invention is related to a system architecture for an arbitrary number of backup components.
2. Description of the Related Art
Disaster recovery systems typically address two types of failures: a sudden catastrophic failure at a single point in time, or data loss over a period of time. In the second, gradual type of disaster, data updates to volumes may be lost. To assist in recovery of data updates, a copy of data may be provided at a remote location. Such dual or shadow copies are typically made as the application system is writing new data to a primary storage device. International Business Machines Corporation (IBM), the assignee of the subject patent application, provides two systems for maintaining remote copies of data at a secondary storage device: extended remote copy (XRC) and peer-to-peer remote copy (PPRC).
These systems provide a method for recovering data updates between a last, safe backup and a system failure. Such data shadowing systems can also provide an additional remote copy for non-recovery purposes, such as local access at a remote site. The IBM® XRC and PPRC systems are described in IBM publication “Remote Copy: Administrator's Guide and Reference,” IBM document no. SC35-0169-02 (IBM Copyright 1994, 1996), which publication is incorporated herein by reference in its entirety.
In such backup systems, data is maintained in volume pairs. A volume pair is comprised of a volume in a primary storage device and a corresponding volume in a secondary storage device that includes a consistent copy of the data maintained in the primary volume. Typically, the primary volume of the pair will be maintained in a primary direct access storage device (DASD) and the secondary volume of the pair is maintained in a secondary DASD shadowing the data on the primary DASD. A primary storage controller may be provided to control access to the primary DASD and a secondary storage controller may be provided to control access to the secondary DASD.
In the IBM® XRC environment, the application system writing data to the primary volumes includes a sysplex timer which provides a time-of-day (TOD) value as a time stamp to data writes. The application system time stamps data sets when writing such data sets to volumes in the primary DASD. The integrity of data updates depends on ensuring that data updates are done at the secondary volumes in the volume pair in the same order as they were done on the primary volume. In the XRC and other prior art systems, the time stamp provided by the application program determines the logical sequence of data updates. In many application programs, such as database systems, certain writes cannot occur unless a previous write has occurred; otherwise the data integrity would be jeopardized. Such a data write whose integrity is dependent on the occurrence of previous data writes is known as a dependent write. For instance, if a customer opens an account, deposits $400, and then withdraws $300, the withdrawal update to the system is dependent on the occurrence of the other writes, the opening of the account and the deposit. When such dependent transactions are copied from the primary volumes to secondary volumes, the transaction order must be preserved to maintain the integrity of the dependent write operation.
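The role of the time stamp in preserving dependent writes can be illustrated with a minimal Python sketch of the account example. The `Write` record, the integer time stamps, and the `apply_in_order` helper are hypothetical names introduced only for illustration; they are not part of XRC or of any IBM interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    timestamp: int  # hypothetical stand-in for the time-of-day (TOD) value
    account: str
    delta: int      # change in balance; 0 models opening the account


def apply_in_order(writes):
    """Apply writes to a secondary copy in time-stamp order."""
    balances = {}
    for w in sorted(writes, key=lambda w: w.timestamp):
        new_balance = balances.get(w.account, 0) + w.delta
        if new_balance < 0:
            # A withdrawal was applied before the deposit it depends on.
            raise ValueError("dependent write applied before its predecessor")
        balances[w.account] = new_balance
    return balances


# Writes may arrive at the secondary site out of order.
arrived = [Write(3, "acct", -300), Write(1, "acct", 0), Write(2, "acct", 400)]
balances = apply_in_order(arrived)
print(balances)  # {'acct': 100}
```

Sorting by time stamp before applying is what keeps the withdrawal behind the deposit in this sketch; applying the writes in arrival order would raise the error instead.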
Volumes in the primary and secondary DASDs are consistent when all writes have been transferred in their logical order, i.e., each write is transferred before any write that depends on it. In the banking example, this means that the deposit is written to the secondary volume before the withdrawal. A consistency group is a collection of related volumes that need to be kept in a consistent state. A consistent transaction set is a collection of data updates to the primary volumes such that dependent writes are secured in a consistent manner. For instance, in the banking example, in order to maintain consistency, the withdrawal transaction needs to be in the same consistent transaction set as the deposit or in a later consistent transaction set; the withdrawal cannot be in an earlier consistent transaction set. Consistency groups maintain data consistency across volumes. For instance, if a failure occurs, the deposit will be written to the secondary volume before the withdrawal. Thus, when data is recovered from the secondary volumes, the recovered data will be consistent.
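One way to picture the constraint on consistent transaction sets is the following Python sketch. It assumes, purely for illustration, that sets are cut at fixed time-stamp intervals; the text above does not say how set boundaries are chosen, and `form_sets` and its `interval` parameter are hypothetical.

```python
def form_sets(writes, interval):
    """Group (timestamp, label) writes into sets by fixed time interval.

    Cutting on time-stamp boundaries means a write can never land in an
    earlier set than a write with a smaller time stamp.
    """
    sets = {}
    for ts, label in writes:
        sets.setdefault(ts // interval, []).append((ts, label))
    # Return the sets in time order, each internally sorted by time stamp.
    return [sorted(sets[k]) for k in sorted(sets)]


writes = [(12, "withdraw"), (3, "open"), (7, "deposit")]
transaction_sets = form_sets(writes, interval=10)
print(transaction_sets)
# [[(3, 'open'), (7, 'deposit')], [(12, 'withdraw')]]
```

Under this assumption the withdrawal falls in the same set as the deposit or in a later one, never in an earlier one, provided the time stamps reflect the dependency order.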
A consistency time is a time the system derives from the time stamp that the application system applies to the data set. A consistency group has a consistency time for all data writes in the consistency group having a time stamp equal to or earlier than the consistency time stamp. In the IBM® XRC environment, the consistency time is the latest time to which the system guarantees that data updates to the secondary volumes are consistent. As long as the application program is writing data to the primary volume, the consistency time increases. However, if data update activity ceases, then the consistency time does not change, as there are no data sets with time stamps to provide a time reference for further consistency groups. If all the records in the consistency group are written to secondary volumes, then the reported consistency time reflects the latest time stamp of all records in the consistency group. Methods for maintaining the sequential consistency of data writes and forming consistency groups to maintain sequential consistency in the transfer of data between a primary DASD and secondary DASD are described in U.S. Pat. Nos. 5,615,329 and 5,504,861, which are assigned to IBM, the assignee of the subject patent application, and which are incorporated herein by reference in their entirety.
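The notion of a consistency time can be sketched in Python as the latest time stamp up to which every record has reached the secondary volumes. This is one reading of the description above rather than the XRC algorithm; `consistency_time` is a hypothetical helper, and unique integer time stamps are assumed.

```python
def consistency_time(records, written):
    """Return the latest time stamp T such that every record with a
    time stamp <= T has been written to the secondary, or None.

    records: time stamps of all records in the consistency group
    written: set of time stamps already written to the secondary
    """
    latest = None
    for ts in sorted(records):
        if ts not in written:
            break  # a gap: later records cannot be guaranteed consistent
        latest = ts
    return latest


group = [10, 20, 30]
print(consistency_time(group, {10, 20, 30}))  # 30
print(consistency_time(group, {10, 30}))      # 10
```

When every record in the group has been written, the result is the latest time stamp in the group, matching the reported consistency time described above. If no new records arrive, the value simply stays where it is.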
Typically, there is a lag between the time at which a primary storage device is updated and the time at which the secondary storage device is updated. For example, a bank customer may make a payment from a savings account into a loan account. There are two parts to this transaction: withdrawal from the savings account and payment to the loan account. The two parts of the transaction should be done and archived atomically. The order of the two parts should also be maintained (i.e., withdrawal followed by payment) in order to avoid problems. In some cases, the primary storage device may fail while a transaction is being performed. For example, data about the payment to the loan account may be sent to the secondary storage device, while the withdrawal data is not sent due to system failure. In this case, the primary storage device reflects both the withdrawal and the payment, while the secondary storage device reflects only the payment. Thus, it is possible that after a disaster and recovery, only one part of the transaction is applied from the secondary storage device to the primary storage device, so that the restored account records reflect the payment in the loan account, but not the withdrawal from the savings account. In this example, the bank will lose money in the amount of the withdrawal from the savings account, which remains in the customer's account. Since the bank loses money, the bank will be unhappy with the disaster recovery.
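The atomicity requirement in the savings and loan example can be illustrated with a short Python sketch in which a transaction is applied to the secondary copy only if all of its parts arrive. The `replicate` function, the `fail_after` parameter, and the account balances are hypothetical and serve only to model the failure scenario.

```python
def replicate(secondary, transaction, fail_after=None):
    """Apply a multi-part transaction to the secondary all-or-nothing.

    secondary:   dict of account -> balance at the secondary site
    transaction: list of (account, delta) parts in their required order
    fail_after:  number of parts transferred before a simulated failure
    """
    staged = dict(secondary)  # work on a copy until every part arrives
    for i, (account, delta) in enumerate(transaction):
        if fail_after is not None and i >= fail_after:
            return secondary  # discard the partial transaction
        staged[account] = staged.get(account, 0) + delta
    return staged


secondary = {"savings": 1000, "loan": -500}
payment = [("savings", -200), ("loan", 200)]  # withdrawal, then payment

print(replicate(secondary, payment))
# {'savings': 800, 'loan': -300}
print(replicate(secondary, payment, fail_after=1))
# {'savings': 1000, 'loan': -500}
```

In the failure case the secondary copy keeps its earlier, consistent state instead of recording one half of the transaction, which is the outcome the example above shows to be missing when parts are transferred independently.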
Additionally, a backup device or computer may receive inbound data writes from the primary storage controller and may send the data writes outbound to the secondary storage controller. In order to have adequate performance in such a system, it is desirable to have a number of backup devices or computers working together to transfer data from the primary storage controller to the secondary storage controller. Additionally, the data transferred should create a consistent copy because the data restored from the secondary storage device needs to be consistent to provide value to a customer. Thus, there is a need in the art for improved transfer of data using multiple backup devices or computers.