1. Field of the Invention
The present invention relates to a system, method, and program for insuring data consistency across groups of storage areas and, in particular, insuring data consistency in a mass storage device comprised of a plurality of storage systems.
2. Description of the Related Art
Disaster recovery systems typically address two types of failures: a sudden catastrophic failure at a single point in time, or data loss over a period of time. In the second type, a gradual disaster, updates to volumes may be lost. To assist in recovery of data updates, a copy of data may be provided at a remote location. Such dual or shadow copies are typically made as the application system is writing new data to a primary storage device. International Business Machines Corporation (IBM), the assignee of the subject patent application, provides two systems for maintaining remote copies of data at a secondary site, extended remote copy (XRC) and peer-to-peer remote copy (PPRC). These systems provide a method for recovering data updates between a last, safe backup and a system failure. Such data shadowing systems can also provide an additional remote copy for non-recovery purposes, such as local access at a remote site. These IBM XRC and PPRC systems are described in IBM publication "Remote Copy: Administrator's Guide and Reference," IBM document no. SC35-0169-02 (IBM Copyright 1994, 1996), which publication is incorporated herein by reference in its entirety.
In such backup systems, data is maintained in volume pairs. A volume pair is comprised of a volume in a primary storage device and a corresponding volume in a secondary storage device that includes an identical copy of the data maintained in the primary volume. Typically, the primary volume of the pair will be maintained in a primary direct access storage device (DASD) and the secondary volume of the pair is maintained in a secondary DASD shadowing the data on the primary DASD. A primary storage controller may be provided to control access to the primary DASD and a secondary storage controller may be provided to control access to the secondary DASD. In the IBM XRC environment, the application system writing data to the primary volumes includes a sysplex timer which provides a time-of-day (TOD) value as a time stamp to data writes. The application system time stamps data sets when writing such data sets to volumes in the primary DASD. The integrity of data updates is related to insuring that updates are done at the secondary volume in the volume pair in the same order as they were done on the primary volume. In the XRC and other prior art systems, the time stamp provided by the application program determines the logical sequence of data updates. In many application programs, such as database systems, certain writes cannot occur unless a previous write occurred; otherwise the data integrity would be jeopardized. Such a data write whose integrity is dependent on the occurrence of a previous data write is known as a dependent write. For instance, if a customer opens an account, deposits $400, and then withdraws $300, the withdrawal update to the system is dependent on the occurrence of the other writes, the opening of the account and the deposit. When such dependent transactions are copied from the primary volumes to secondary volumes, the transaction order must be maintained to maintain the integrity of the dependent write operation.
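By way of illustration only, the following Python sketch shows how time stamps could be used to preserve the order of dependent writes in the banking example above. The `Update` type, its fields, and the integer time stamps are hypothetical stand-ins for the time-of-day (TOD) values and are not taken from the XRC or PPRC systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    timestamp: int    # hypothetical stand-in for the TOD time stamp
    volume: str       # hypothetical primary volume identifier
    description: str  # human-readable label for the write


def order_for_secondary(updates):
    """Return the updates sorted by time stamp, i.e., in the logical
    order in which they were made on the primary volume."""
    return sorted(updates, key=lambda u: u.timestamp)


# Updates may reach the copy process out of order.
received = [
    Update(3, "VOL1", "withdraw $300"),
    Update(1, "VOL1", "open account"),
    Update(2, "VOL1", "deposit $400"),
]

ordered = order_for_secondary(received)
for u in ordered:
    print(u.timestamp, u.description)
```

In this sketch the withdrawal, being the dependent write, is always placed after the account opening and the deposit on which it depends.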
Volumes in the primary and secondary DASDs are consistent when all writes have been transferred in their logical order, i.e., every write on which another write depends is transferred before the write dependent thereon. In the banking example, this means that the deposit is written to the secondary volume before the withdrawal. A consistency group is a collection of updates to the primary volumes such that dependent writes are secured in a consistent manner. For instance, in the banking example, this means that the withdrawal transaction is in the same consistency group as the deposit or in a later group; the withdrawal cannot be in an earlier consistency group. Consistency groups maintain data consistency across volumes and storage devices. For instance, if a failure occurs, the deposit will be written to the secondary volume before the withdrawal. Thus, when data is recovered from the secondary volumes, the recovered data will be consistent.
A consistency time is a time the system derives from the time stamp that the application system applies to the data set. A consistency group has a consistency time, and all data writes in the consistency group have a time stamp equal to or earlier than the consistency time. In the IBM XRC environment, the consistency time is the latest time to which the system guarantees that updates to the secondary volumes are consistent. As long as the application program is writing data to the primary volume, the consistency time increases. However, if update activity ceases, then the consistency time does not change as there are no data sets with time stamps to provide a time reference for further consistency groups. If all the records in the consistency group are written to secondary volumes, then the reported consistency time reflects the latest time stamp of all records in the consistency group. Methods for maintaining the sequential consistency of data writes and forming consistency groups to maintain sequential consistency in the transfer of data between a primary DASD and secondary DASD are described in U.S. Pat. Nos. 5,615,329 and 5,504,861, which are assigned to IBM, the assignee of the subject patent application, and which are incorporated herein by reference in their entirety.
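The relationship between a consistency group and its consistency time may be sketched as follows. This is a simplified illustration only; the function names and the use of a caller-supplied cutoff are assumptions and do not reproduce the methods of the patents cited above.

```python
def form_consistency_group(pending_timestamps, cutoff):
    """Split pending time stamps into a consistency group (time stamps
    equal to or earlier than the cutoff, in order) and the remainder."""
    group = sorted(t for t in pending_timestamps if t <= cutoff)
    remaining = sorted(t for t in pending_timestamps if t > cutoff)
    return group, remaining


def reported_consistency_time(group):
    """Once every record in the group is written to the secondary
    volumes, the reported time is the latest time stamp in the group.
    Returns None for an empty group, since no time reference exists."""
    return max(group) if group else None


pending = [105, 101, 110, 103]  # hypothetical time stamps
group, remaining = form_consistency_group(pending, cutoff=105)
consistency_time = reported_consistency_time(group)
print(group, remaining, consistency_time)
```

The `None` result for an empty group mirrors the observation above that the consistency time does not advance when update activity ceases.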
Consistency groups are formed within a session. All volume pairs assigned to a session will have their updates maintained in the same consistency group. Thus, the sessions are used to determine the volumes that will be grouped together in a consistency group. Consistency groups are formed within a journal. From the journal, updates from a consistency group are applied to the secondary volume. If the system fails while updates from the journal are being applied to a secondary volume, during recovery operations, the updates that did not complete writing to the secondary volume can be recovered from the journal and applied to the secondary volume.
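The journal-based recovery described above may be pictured with the following minimal sketch. The journal is modeled as a list of (track, data) pairs and the secondary volume as a dictionary; both representations, and the `fail_after` parameter used to simulate a failure, are hypothetical.

```python
def apply_from_journal(journal, secondary, start=0, fail_after=None):
    """Apply journal entries from index `start` to the secondary volume.
    Returns the index of the first entry not yet applied, so that a
    later recovery pass can resume from that point."""
    for i in range(start, len(journal)):
        if fail_after is not None and i >= fail_after:
            return i  # simulated system failure before entry i is written
        track, data = journal[i]
        secondary[track] = data
    return len(journal)


journal = [("track1", "A"), ("track2", "B"), ("track3", "C")]
secondary = {}

# The first pass fails after one entry has been written.
next_index = apply_from_journal(journal, secondary, fail_after=1)
after_failure = dict(secondary)

# The recovery pass resumes from the journal and completes the group.
next_index = apply_from_journal(journal, secondary, start=next_index)
print(after_failure, secondary, next_index)
```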
Because consistency groups are formed only within a session, consistency problems arise if a database or data set spans multiple sessions, as consistency groups cannot maintain consistency across sessions. There is thus a need in the art to provide additional methods for allowing consistency across sessions or other groupings of storage areas.
Provided is a method, system, and program for maintaining data consistency among updates to data storage areas. Each update has an update time indicating when the update was made. There are multiple groups of data storage areas. For each group, an indication is made in a memory area of a group update time comprising a most recent update time of the updates in the group. The update time for each update in the group is not greater than the group update time. A determination is made of a minimum group update time across all the groups. At least one update is applied to storage if the update time for the update does not exceed the minimum group update time.
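The determination of the minimum group update time may be illustrated by the following sketch, which is offered as an example only. Each group is represented as a list of update times keyed by a hypothetical group name, and each group is assumed to hold at least one update.

```python
def group_update_time(update_times):
    """Most recent update time among the updates in one group."""
    return max(update_times)


def minimum_group_update_time(groups):
    """Smallest group update time across all the groups."""
    return min(group_update_time(times) for times in groups.values())


def updates_to_apply(groups):
    """Per group, the updates whose update time does not exceed the
    minimum group update time."""
    cutoff = minimum_group_update_time(groups)
    return {
        name: [t for t in times if t <= cutoff]
        for name, times in groups.items()
    }


groups = {
    "session_a": [10, 12, 15],  # group update time 15
    "session_b": [9, 11],       # group update time 11
    "session_c": [8, 13, 14],   # group update time 14
}

cutoff = minimum_group_update_time(groups)
applied = updates_to_apply(groups)
print(cutoff, applied)
```

Here the minimum group update time is 11, so updates time stamped 12 and later are withheld in every group, and the applied data is consistent across groups as of time 11.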
In further embodiments, updates within a group may further be defined into at least one consistency group having a consistency time. One consistency group within one group is selected. A determination is then made as to whether the consistency time of the selected consistency group is less than the minimum group update time. All the updates in the selected consistency group are applied to storage if the consistency time of the selected consistency group is less than the minimum group update time.
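The selection of whole consistency groups may be sketched as follows. The dictionary layout, the session names, and the value of 11 used for the minimum group update time are hypothetical; the strict "less than" comparison follows the wording of the embodiment above.

```python
def select_consistency_groups(sessions, min_group_update_time):
    """Per session, return the consistency groups whose consistency
    time is less than the minimum group update time. Every update in a
    selected consistency group would be applied to storage."""
    return {
        name: [
            cg for cg in consistency_groups
            if cg["consistency_time"] < min_group_update_time
        ]
        for name, consistency_groups in sessions.items()
    }


sessions = {
    "session_a": [
        {"consistency_time": 10, "updates": [8, 10]},
        {"consistency_time": 15, "updates": [12, 15]},
    ],
    "session_b": [
        {"consistency_time": 11, "updates": [9, 11]},
    ],
}

selected = select_consistency_groups(sessions, min_group_update_time=11)
print(selected)
```

Under the strict comparison, a consistency group whose consistency time equals the minimum group update time is not selected in this sketch.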
In still further embodiments, updates within a group may further be defined into at least one consistency group having a consistency time. In such case, updates in consistency groups are applied to storage if the update time of the update is less than the minimum group update time. After applying the updates to storage, the data in the storage is consistent as of the minimum group update time.
Yet further, updates in the consistency group are applied to storage during a data recovery operation. In this way, not all of the updates in one consistency group will be applied to the secondary storage if the update time of at least one update in the consistency group is greater than the minimum group update time.
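The partial application of a consistency group during recovery may be illustrated as follows. The function name and the sample time stamps are hypothetical, and the sketch assumes the "less than" comparison used in the preceding embodiment.

```python
def split_for_recovery(consistency_group, min_group_update_time):
    """Divide one consistency group into the updates applied during
    recovery and the updates held back because their update time is
    not less than the minimum group update time."""
    applied = [t for t in consistency_group if t < min_group_update_time]
    held_back = [t for t in consistency_group if t >= min_group_update_time]
    return applied, held_back


applied_updates, held_back_updates = split_for_recovery([9, 10, 12, 14], 11)
print(applied_updates, held_back_updates)
```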
Preferred embodiments of the present invention include a method, system, and program for insuring data consistency across different groups of volumes or storage areas. Preferred embodiments are particularly useful for mass storage spaces comprised of volumes spread across numerous storage systems. With preferred embodiments, the distributed volumes can be defined into groups, e.g., sessions, and updates can be maintained consistent across all groups. In case of a system failure, data recovery can assure that data across all the storage systems is consistent as of a single point-in-time.