1. Field of the Invention
The present invention relates to a method, system, and program for determining whether to use a repository to store data updated during a resynchronization.
2. Description of the Related Art
Disaster recovery systems typically address two types of failures: a sudden catastrophic failure at a single point in time, or data loss over a period of time. In the second type, a gradual disaster, updates to volumes may be lost. To assist in recovery of data updates, a copy of data may be provided at a remote location. Such dual or shadow copies are typically made as the application system is writing new data to a primary storage device. Different copy technologies may be used for maintaining remote copies of data at a secondary site, such as International Business Machines Corporation's (“IBM”) Extended Remote Copy (XRC), Coupled XRC (CXRC), Global Copy, and Global Mirror Copy.
In data mirroring systems, data is maintained in volume pairs. A volume pair comprises a volume in a primary storage device and a corresponding volume in a secondary storage device that includes an identical copy of the data maintained in the primary volume. Primary and secondary storage controllers may be used to control access to the primary and secondary storage devices. In certain data mirroring systems, a timer is used to provide a uniform time across systems so that updates written by different applications to different primary storage devices use a consistent time-of-day (TOD) value as a time stamp. The host operating system or the application may time stamp updates to a data set or set of data sets when writing such data sets to volumes in the primary storage. The integrity of data updates depends on ensuring that updates are applied to the secondary volume in the volume pair in the same order as they were applied to the primary volume. The time stamp provided by the application program determines the logical sequence of data updates.
In many application programs, such as database systems, certain writes cannot occur unless a previous write occurred; otherwise the data integrity would be jeopardized. Such a data write whose integrity is dependent on the occurrence of a previous data write is known as a dependent write. Volumes in the primary and secondary storages are consistent when all writes have been transferred in their logical order, i.e., each write on which a dependent write relies is transferred before that dependent write. A consistency group is a collection of updates to the primary volumes such that dependent writes are secured in a consistent manner. A consistency group has a consistency time, and all data writes in the consistency group have a time stamp equal to or earlier than the consistency time. The consistency time is the latest time to which the system guarantees that updates to the secondary volumes are consistent. Consistency groups maintain data consistency across volumes and storage devices. Thus, when data is recovered from the secondary volumes, the recovered data will be consistent.
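By way of illustration only, the relationship between time stamps, the consistency time, and a consistency group may be sketched as follows. The names Update, form_consistency_group, and apply_group are hypothetical and do not correspond to any particular product; the sketch assumes integer time stamps drawn from a common timer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    timestamp: int  # time-of-day stamp assigned when written to the primary
    volume: str
    track: int
    data: str


def form_consistency_group(pending, consistency_time):
    """Split pending updates into (group, remaining).

    The group holds every update stamped at or before the consistency
    time, ordered by time stamp so that a write is always applied after
    the earlier writes it depends on.
    """
    group = sorted(
        (u for u in pending if u.timestamp <= consistency_time),
        key=lambda u: u.timestamp,
    )
    remaining = [u for u in pending if u.timestamp > consistency_time]
    return group, remaining


def apply_group(secondary, group):
    """Apply a consistency group to the secondary volumes in logical order."""
    for u in group:
        secondary[(u.volume, u.track)] = u.data


pending = [
    Update(3, "vol1", 0, "commit"),
    Update(1, "vol1", 0, "begin"),
    Update(2, "vol2", 7, "row"),
    Update(5, "vol2", 7, "later"),
]
group, remaining = form_consistency_group(pending, consistency_time=3)
secondary = {}
apply_group(secondary, group)
```

In this sketch the update stamped 5 is held back for a later group, so the secondary volumes reflect the primary volumes exactly as of consistency time 3.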
One technique to provide a consistent point-in-time copy of data is to suspend all writes to the primary storage and then, while writes are suspended, copy all the data to be mirrored to the secondary storage or backup device. A disadvantage of this technique is that host writes are suspended for the time needed to create the point-in-time copy of data, which may adversely affect application processing at the host. An alternative technique is to establish a logical copy of data at the primary storage target, which takes a very short period of time, such as no more than a second or two. Thus, suspending host writes to the primary storage during the time to establish the logical copy is far less disruptive to host application processing than would occur if host writes were suspended for the time to copy all the source data to the target volume. After establishing the logical copy, source volume data subject to an update is copied to a target volume so that the target volume has the data as of the point-in-time the logical copy was established, before the update. This defers the physical copying until an update is received. This logical copy operation is performed to minimize the time during which the target and source volumes are inaccessible. The point-in-time copy comprises the combination of the data in the source volume and the data to be overwritten by the updates.
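The deferred copying described above may be illustrated, purely as a sketch, with a copy-on-write structure. The class PointInTimeCopy and its methods are hypothetical names, and the sketch assumes a volume modeled as a list of tracks held in memory.

```python
class PointInTimeCopy:
    """Logical copy that preserves a track only when it is first overwritten."""

    def __init__(self, source):
        self.source = source  # list of track contents, updated in place
        self.target = {}      # track number -> data as of the copy time

    def write_source(self, track, data):
        # Preserve the point-in-time data before the first overwrite only;
        # later writes to the same track need no further copying.
        if track not in self.target:
            self.target[track] = self.source[track]
        self.source[track] = data

    def read_copy(self, track):
        # Unchanged tracks are read from the source; changed tracks are
        # read from the preserved data on the target.
        if track in self.target:
            return self.target[track]
        return self.source[track]


source = ["a", "b", "c"]
pit = PointInTimeCopy(source)   # establishing the copy moves no data
pit.write_source(1, "B")
pit.write_source(1, "BB")
snapshot = [pit.read_copy(i) for i in range(len(source))]
```

Establishing the copy is immediate because no data moves until a write arrives, and only one track is physically copied even though the source was updated twice.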
One such logical copy operation is known as FlashCopy® (FlashCopy is a registered trademark of International Business Machines, Corp. or “IBM”). FlashCopy® involves establishing a logical point-in-time copy relationship between primary and secondary volumes on different devices. Once the logical relationship is established, hosts may then have immediate access to data on the primary and secondary volumes, and the data may be copied from the primary to the secondary volumes as part of a background operation. While the data is being copied over, any reads of data on secondary tracks that have not been copied over cause the data to be copied over from the primary device to the secondary cache so that the secondary target has the copy from the source that existed at the point-in-time of the FlashCopy® operation. Further, any writes to tracks on the primary storage that have not been copied over cause the data about to be overwritten on those tracks to be copied to the secondary storage.
To perform the logical copy operation, an entire target volume may be allocated at the secondary storage to store updates to the primary volume, which requires that the same amount of storage space be allocated on the secondary storage for the target volume as is allocated in the primary storage for the source volume. To save space on the secondary storage, certain space efficient logical copy techniques known in the art allocate a repository to store the data to be overwritten by the updates to the source volume during the logical copy period. The repository space is substantially less than the full volume size of the source volume because, in many cases, the data updated on the source volume during the logical copy duration that must be copied to the target is substantially less than the storage space of the full source volume.
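The space saving of a repository, and the limit it imposes, may be sketched as follows. The names SpaceEfficientCopy and RepositoryFull are hypothetical, the repository capacity is counted in tracks, and raising an error when the repository is exhausted is an assumption made for illustration rather than the behavior of any particular system.

```python
class RepositoryFull(Exception):
    """Raised when no repository space remains (assumed policy)."""


class SpaceEfficientCopy:
    """Logical copy backed by a repository smaller than the source volume."""

    def __init__(self, source, repository_tracks):
        self.source = source               # list of track contents
        self.capacity = repository_tracks  # repository size in tracks
        self.repository = {}               # track number -> preserved data

    def write_source(self, track, data):
        if track not in self.repository:
            if len(self.repository) >= self.capacity:
                # The update is rejected before the source is changed.
                raise RepositoryFull(f"no space to preserve track {track}")
            self.repository[track] = self.source[track]
        self.source[track] = data

    def read_copy(self, track):
        if track in self.repository:
            return self.repository[track]
        return self.source[track]


source = ["a", "b", "c", "d"]
copy = SpaceEfficientCopy(source, repository_tracks=2)
copy.write_source(0, "A")
copy.write_source(0, "AA")  # track already preserved, no extra space used
copy.write_source(2, "C")
try:
    copy.write_source(3, "D")  # a third distinct track exceeds the repository
    overflowed = False
except RepositoryFull:
    overflowed = True
```

Here a four-track source is protected by a two-track repository, which suffices only while no more than two distinct tracks are updated during the logical copy period.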
In a synchronization environment, a primary storage controller may mirror writes to a primary storage to a secondary storage. A secondary storage controller managing the secondary storage may further make a virtual copy of the secondary storage to form a consistency group so that data in the secondary storage as of a point-in-time is backed up in a repository or a full volume backup.
There is a need in the art for continued improvements to take advantage of space efficient logical copy operations that utilize a repository smaller than the full source volume subject to the logical copy operation.