1. Field of the Invention
The present invention relates to a method, system, and article of manufacture for managing writes received to data units that are being transferred to a secondary storage as part of a mirror relationship.
2. Description of the Related Art
Disaster recovery systems typically address two types of failures: a sudden catastrophic failure at a single point in time, or data loss over a period of time. In the second, gradual type of disaster, updates to volumes may be lost. To assist in recovery of data updates, a copy of data may be provided at a remote location. Such dual or shadow copies are typically made as the application system is writing new data to a primary storage device. Different copy technologies may be used for maintaining remote copies of data at a secondary site, such as International Business Machines Corporation's (“IBM”) Extended Remote Copy (XRC), Coupled XRC (CXRC), Global Copy, and Global Mirror.
In data mirroring systems, data is maintained in volume pairs. A volume pair comprises a volume in a primary storage device and a corresponding volume in a secondary storage device that includes an identical copy of the data maintained in the primary volume. Primary and secondary storage controllers may be used to control access to the primary and secondary storage devices.
Peer-to-Peer Remote Copy (PPRC) is a data mirroring solution offered on high-end storage systems as part of a solution for disaster recovery. In synchronous PPRC, a writing host does not receive an acknowledgement that a write is complete until the data is written in both the primary and secondary control units. In asynchronous PPRC, a host write to a primary controller is acknowledged when the write operation is completed at the primary storage controller, i.e., when the data resides in the primary storage controller's cache. The primary controller may secure the data by storing two cache copies and may also copy the data to a secondary controller. With asynchronous PPRC, there is a risk of data loss if the primary controller crashes before copying the data to the secondary storage controller, because the data that failed to transfer cannot be recovered.
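The difference in acknowledgement timing between the two PPRC modes may be illustrated with a minimal sketch. The names `Controller`, `write_sync`, and `AsyncMirror` are hypothetical and chosen only for illustration; controllers are modeled as in-memory dictionaries, and no actual storage controller interface is implied.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical storage controller; tracks are held in a dict."""
    tracks: dict = field(default_factory=dict)


def write_sync(primary, secondary, track, data):
    """Synchronous mode: acknowledge only after both controllers hold the data."""
    primary.tracks[track] = data
    secondary.tracks[track] = data  # remote copy completes before the ack
    return "ack"


class AsyncMirror:
    """Asynchronous mode: acknowledge once the primary holds the data."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.pending = []  # tracks written at the primary but not yet copied

    def write(self, track, data):
        self.primary.tracks[track] = data
        if track not in self.pending:
            self.pending.append(track)
        return "ack"  # returned before the secondary has the data

    def drain(self):
        """Copy every pending track to the secondary, oldest first."""
        while self.pending:
            track = self.pending.pop(0)
            self.secondary.tracks[track] = self.primary.tracks[track]
```

In this sketch, any track still listed in `pending` when the primary fails corresponds to the data that cannot be recovered at the secondary site.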
In one prior art system, a consistency group may be formed by creating a consistent point across a replication environment, transmitting the updates to the secondary location, and saving consistent data to ensure a consistent image of the data is always available. A collision may occur if there is an update to a track that has not yet been copied to the secondary location. To protect the data in the consistency group, the completion of the write is delayed until the previous version of the track image has been sent to the secondary storage controller. For many write-intensive workloads, such as log volume updates that are performed in a sequential fashion (using 4 KB blocks of data), the same track is updated several consecutive times. For such workloads, the collision algorithm may result in latency problems for the writes. The collision delay increases linearly with the distance between the primary and the secondary sites. Another type of collision involves a common locking mechanism that is used while transferring a track from the primary storage controller to the secondary storage controller. When a track is sent from the primary to the secondary controller, a lock on the primary track is held, so consecutive updates to the same track are delayed while the transfer is in progress.
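The latency effect of the first type of collision may be sketched as follows. This is a simplified model under stated assumptions, not the prior art algorithm itself: the class `ConsistencyGroupMirror`, the local write time of 0.1 ms, and the propagation figure of roughly 200 km per millisecond in optical fiber are all assumptions introduced for illustration.

```python
FIBER_KM_PER_MS = 200.0  # assumption: approximate signal speed in optical fiber


def round_trip_ms(distance_km):
    """Round-trip propagation time between the sites, linear in distance."""
    return 2 * distance_km / FIBER_KM_PER_MS


class ConsistencyGroupMirror:
    """Hypothetical model of a write colliding with an unsent track."""

    def __init__(self, distance_km, local_write_ms=0.1):
        self.distance_km = distance_km
        self.local_write_ms = local_write_ms  # assumed cache write time
        self.unsent = set()  # tracks in the consistency group not yet copied

    def form_consistency_group(self, updated_tracks):
        self.unsent = set(updated_tracks)

    def write(self, track):
        """Return the modeled latency in milliseconds of a host write."""
        latency = self.local_write_ms
        if track in self.unsent:
            # Collision: the previous track image is sent to the secondary
            # before the new write is allowed to complete.
            latency += round_trip_ms(self.distance_km)
            self.unsent.discard(track)
        return latency
```

Under these assumptions, a colliding write at 1000 km waits about 10 ms longer than a non-colliding write, and doubling the distance doubles that additional delay.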
There is a need in the art for an improved technique to handle collisions of writes to tracks that are being transferred to a secondary site as part of a data mirroring operation.