1. Field of the Invention
The present invention relates to a computer program product, system, and method for caching source blocks of data (e.g., tracks) for target blocks of data (e.g., tracks) for which data has not yet been copied from corresponding source blocks of data.
2. Description of the Related Art
Computing systems often include one or more host computers (“hosts”) for processing data and running application programs, direct access storage devices (DASDs) for storing data, and a storage controller for controlling the transfer of data between the hosts and the DASD. Storage controllers, also referred to as control units or storage directors, manage access to a storage space composed of numerous hard disk drives, which together form the DASD. Hosts may communicate Input/Output (I/O) requests to the storage space through the storage controller.
Some disaster recovery systems address data loss over a period of time, in which case writes to volumes on data storage may be lost. The writes may update data, write new data, or write the same data again. To assist in recovery of data writes, a copy of data may be provided at a remote location. Such copies may also be referred to as dual or shadow copies.
Remote mirroring systems provide techniques for mirroring data in order to facilitate recovery after a system failure. Such data shadowing systems can also provide an additional remote copy for non-recovery purposes, such as local access at a remote site.
In remote mirroring systems, data is maintained in volume pairs. A volume pair consists of a volume in a source (or “primary”) storage device and a corresponding volume in a target (or “secondary”) storage device that includes a copy of the data maintained in the source volume.
A point-in-time copy involves physically copying all the data from source volumes to target volumes so that the target volumes have a copy of the data as of a point-in-time. A point-in-time copy can also be made by logically making a copy of the data and then only copying data over when necessary, in effect deferring the physical copying. This logical copy operation is performed to minimize the time during which the target and source volumes are inaccessible.
Instant virtual copy operations work by modifying metadata in structures, such as relationship tables or pointers, to treat a source data object as both the original and copy. In response to a host's copy request, the storage subsystem immediately reports creation of the copy without having made any physical copy of the data. Only a “virtual” copy has been created, and the absence of an additional physical copy is completely unknown to the host.
Later, when the storage system receives updates to the original or copy, the updates are stored separately and cross-referenced to the updated data object only. At this point, the original and copy data objects begin to diverge. The initial benefit is that the instant virtual copy occurs almost instantaneously, completing much faster than a normal physical copy operation. This frees the host and storage subsystem to perform other tasks. The host or storage subsystem may even proceed to create an actual, physical copy of the original data object during background processing, or at another time.
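As an illustration only, the following minimal Python sketch models the metadata-only copy described above. It assumes a block-addressed data object held in a dictionary; the class `InstantVirtualCopy` and its `write` and `read` methods are hypothetical names, not part of any product interface. Creating the copy records no data, and later updates are kept in a separate per-side map, so the original and the copy diverge only where they are updated.

```python
from typing import Dict


class InstantVirtualCopy:
    """Hypothetical instant virtual copy of a block-addressed data object.

    One shared base object serves as both the original and the copy; no data
    is copied when the copy is created. Updates are stored separately and are
    visible only on the side ("original" or "copy") that was updated.
    """

    def __init__(self, base: Dict[int, bytes]) -> None:
        # Shared point-in-time contents; the caller must not modify `base`.
        self._base = base
        self._updates: Dict[str, Dict[int, bytes]] = {"original": {}, "copy": {}}

    def write(self, side: str, block: int, data: bytes) -> None:
        # Record the update against the updated object only.
        self._updates[side][block] = data

    def read(self, side: str, block: int) -> bytes:
        # An updated block is read from that side's updates; otherwise the
        # shared base is read, so unmodified blocks are never duplicated.
        updates = self._updates[side]
        return updates[block] if block in updates else self._base[block]
```

For example, after `v = InstantVirtualCopy({0: b"x"})` and `v.write("original", 0, b"y")`, `v.read("copy", 0)` still returns `b"x"`, while `v.read("original", 0)` returns `b"y"`.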
One such instant virtual copy operation is known as a FlashCopy® operation. (FlashCopy is a trademark or common law mark of International Business Machines Corporation in the United States and/or other countries.) A FlashCopy® operation involves establishing a logical point-in-time relationship between source and target volumes on the same or different devices. The FlashCopy® operation guarantees that until a track in a FlashCopy® relationship has been hardened to its location on the target disk, the track resides on the source disk. A relationship table is used to maintain information on all existing FlashCopy® relationships in the subsystem. During the establish phase of a FlashCopy® relationship, one entry is recorded in the source and target relationship tables for the source and target that participate in the FlashCopy® being established. Each added entry maintains all the required information concerning the FlashCopy® relationship. Both entries for the relationship are removed from the relationship tables when all FlashCopy® tracks from the source storage have been physically copied to the target storage or when a FlashCopy® withdraw command is received. A FlashCopy® withdraw command may be described as a command to end a FlashCopy® relationship. In certain cases, even though all tracks have been copied from the source storage to the target storage, the relationship persists.
The target relationship table further includes a bitmap that identifies which tracks involved in the FlashCopy® relationship have not yet been copied over and are thus protected tracks. Each track in the target device is represented by one bit in the bitmap. The target bit is set (either logically or physically) when the corresponding track is established as a target track of a FlashCopy® relationship. The target bit is reset when the corresponding track has been copied from the source location and destaged to the target device due to writes on the source or the target device, or due to a background copy task.
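A minimal Python sketch of the relationship-table bookkeeping described above follows. It is hypothetical: `RelationshipTable` and its methods are illustrative names, the separate source and target relationship tables are collapsed into a single dictionary for brevity, and the target bitmap is held as a Python integer in which bit i represents target track i. Removing the entry once every bit is reset is one possible policy; as noted above, a relationship may also persist after all tracks are copied.

```python
from typing import Dict, Tuple


class RelationshipTable:
    """Hypothetical model of FlashCopy-style relationship tables.

    Each relationship is keyed by (source_volume, target_volume). The value is
    the target bitmap: bit i is set while target track i has not yet been
    copied from the source (i.e., is still protected).
    """

    def __init__(self) -> None:
        self.entries: Dict[Tuple[str, str], int] = {}

    def establish(self, source: str, target: str, num_tracks: int) -> None:
        # Establish phase: record the entry and set every target bit.
        self.entries[(source, target)] = (1 << num_tracks) - 1

    def is_protected(self, source: str, target: str, track: int) -> bool:
        return bool((self.entries[(source, target)] >> track) & 1)

    def track_copied(self, source: str, target: str, track: int) -> None:
        # Reset the bit once the track has been copied and destaged to the
        # target (source/target write or background copy). This sketch removes
        # the entry when no bits remain set.
        key = (source, target)
        self.entries[key] &= ~(1 << track)
        if self.entries[key] == 0:
            del self.entries[key]

    def withdraw(self, source: str, target: str) -> None:
        # A withdraw ends the relationship regardless of copy progress.
        self.entries.pop((source, target), None)
```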
Once the logical relationship is established, hosts may immediately access data on the source and target volumes, and the data may be copied as part of a background operation. A read of a target track in a FlashCopy® relationship that is not in cache triggers a stage intercept: if the corresponding source track has not yet been copied over, it is staged to the target cache before access to the track is provided from the target cache. This ensures that the target has the copy of the source data that existed at the point-in-time of the FlashCopy® operation. Further, any destage to a source track that has not yet been copied over triggers a destage intercept, which causes that track on the source device to be copied to the target device before the source track is overwritten.
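The two intercepts can be sketched as follows, assuming a single source/target pair, integer track numbers, and in-memory dictionaries standing in for the disks and the target cache. The class `InterceptSketch` and its methods are hypothetical; the set `protected` plays the role of the set bits in the target bitmap.

```python
from typing import Dict, Set


class InterceptSketch:
    """Hypothetical model of stage and destage intercepts for one
    source/target pair. Tracks are integers; track contents are bytes."""

    def __init__(self, source: Dict[int, bytes]) -> None:
        self.source_disk: Dict[int, bytes] = dict(source)
        self.target_disk: Dict[int, bytes] = {}
        self.target_cache: Dict[int, bytes] = {}
        # Target tracks not yet copied over (set bits of the target bitmap).
        self.protected: Set[int] = set(source)

    def read_target(self, track: int) -> bytes:
        if track not in self.target_cache:
            # Stage intercept: the target track is not in cache.
            if track in self.protected:
                # Not yet copied over: stage the source track into the target cache.
                self.target_cache[track] = self.source_disk[track]
            else:
                self.target_cache[track] = self.target_disk[track]
        return self.target_cache[track]

    def destage_source(self, track: int, data: bytes) -> None:
        if track in self.protected:
            # Destage intercept: copy the point-in-time source track to the
            # target device before the source track is overwritten.
            self.target_disk[track] = self.source_disk[track]
            self.protected.discard(track)
        self.source_disk[track] = data
```

Note that a stage intercept only fills the target cache; the track remains protected until it is actually copied to the target device.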
When a target track is to be read, it may need to be staged from the source storage if it has not been copied to the target storage since the instant virtual copy was established. Some systems retain such a target track in the cache, so that a subsequent read of the target track is a hit (i.e., the track is found in the cache).
However, on a FlashCopy® withdraw, such target tracks in the cache must be discarded, because their data was never copied from the source storage to the target storage and the FlashCopy® metadata on disk does not match the target track in the cache.
Also, a source storage can have multiple target storages. Reads on those target storages may cause multiple copies of the same source track to be cached, one as a separate target track for each target storage.
Conventional systems perform a read operation for a target track in a FlashCopy® relationship in the following manner:
1. Host issues read operation to target storage
2. Stage source data to cache (i.e., make a new copy of the source data)
3. Synthesize the source data to make it appear to be coming from target storage
4. Rebuild the Track Format Descriptor (TFD)/Record Zero Data Table (R0DT) (for Count Key Data (CKD) tracks) to match the data in the cache from the source storage
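Operations 1 through 4 can be approximated by the following hypothetical Python sketch, which assumes the requested target track has not yet been copied from the source. `CachedTrack`, `conventional_target_read`, and the string placeholder standing in for the rebuilt TFD/R0DT are illustrative assumptions. Because the cache is keyed by (target volume, track), reads of the same source track through two target storages leave two separate cached copies, as described above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class CachedTrack:
    volume: str               # volume the cached track appears to belong to
    track: int
    data: bytes
    staged_from_source: bool  # True if the data was staged from the source


def conventional_target_read(
    cache: Dict[Tuple[str, int], CachedTrack],
    track_metadata: Dict[Tuple[str, int], str],
    source_disk: Dict[int, bytes],
    target_vol: str,
    track: int,
) -> bytes:
    # Operation 1: the host has issued a read of `track` on `target_vol`.
    key = (target_vol, track)
    if key in cache:
        return cache[key].data  # a later read of the same target track hits
    # Operation 2: stage the source data into a new cache entry.
    data = source_disk[track]
    # Operation 3: synthesize it so it appears to come from the target volume.
    cache[key] = CachedTrack(target_vol, track, data, staged_from_source=True)
    # Operation 4: rebuild the per-track format metadata (placeholder string
    # standing in for the TFD/R0DT).
    track_metadata[key] = f"rebuilt-from-source-track-{track}"
    return data
```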
Then, on a FlashCopy® withdraw, the following operations are performed:
5. Scan entire cache and discard data staged from the source (in operation #2 above)
6. Scan entire volume to invalidate the TFD/R0DT (from operation #4 above)
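Operations 5 and 6 can be sketched as below, again with hypothetical names (`CachedTrack`, `conventional_withdraw`) and in-memory stand-ins for the cache and the per-track format metadata. The point of the sketch is the cost: both loops walk the entire cache and every track of the volume, regardless of how few tracks were actually staged from the source.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class CachedTrack:
    volume: str
    track: int
    data: bytes
    staged_from_source: bool


def conventional_withdraw(
    cache: Dict[Tuple[str, int], CachedTrack],
    track_metadata: Dict[Tuple[str, int], str],
    target_vol: str,
    volume_tracks: int,
) -> Tuple[int, int]:
    # Operation 5: scan the entire cache, including entries for other volumes,
    # and discard data that was staged from the source for this target.
    discarded = 0
    for key in list(cache):
        entry = cache[key]
        if entry.volume == target_vol and entry.staged_from_source:
            del cache[key]
            discarded += 1
    # Operation 6: scan every track of the volume and invalidate the rebuilt
    # format metadata wherever it exists.
    invalidated = 0
    for track in range(volume_tracks):
        if track_metadata.pop((target_vol, track), None) is not None:
            invalidated += 1
    return discarded, invalidated
```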
Notwithstanding existing instant virtual copy operations, there is a need for an improved instant virtual copy operation.