1. Field of the Invention
The present invention relates to a method, system, and program for using a heartbeat signal to maintain data consistency for writes to source storage copied to target storage.
2. Description of the Related Art
Disaster recovery systems typically address two types of failures: a sudden catastrophic failure at a single point in time, or data loss over a period of time. In the second, gradual type of disaster, updates to volumes may be lost. To assist in recovery of data updates, a copy of data may be provided at a remote location. Such dual or shadow copies are typically made as the application system is writing new data to a primary storage device. Different copy technologies may be used for maintaining remote copies of data at a secondary site, such as International Business Machines Corporation's (“IBM”) Extended Remote Copy (XRC), Coupled XRC (CXRC), Global Copy, and Global Mirror Copy. These different copy technologies are described in the IBM publications “The IBM TotalStorage DS6000 Series: Copy Services in Open Environments”, IBM document no. SG24-6783-00 (September 2005) and “IBM TotalStorage Enterprise Storage Server: Implementing ESS Copy Services with IBM eServer zSeries”, IBM document no. SG24-5680-04 (July 2004).
In data mirroring systems, data is maintained in volume pairs. A volume pair is comprised of a volume in a primary storage device and a corresponding volume in a secondary storage device that includes an identical copy of the data maintained in the primary volume. Primary and secondary control units, also known as storage controllers or enterprise storage servers, may be used to control access to the primary and secondary storage devices. In certain backup systems, a sysplex timer is used to provide a uniform time across systems so that updates written by different applications to different primary storage devices use a consistent time-of-day (TOD) value as a time stamp. Application systems time stamp data sets when writing such data sets to volumes in the primary storage. The integrity of data updates depends on ensuring that updates are applied to the secondary volume in a volume pair in the same order as they were applied to the primary volume. The time stamp provided by the application program determines the logical sequence of data updates.
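The ordering requirement above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each update carries an integer TOD stamp and that the secondary storage is a simple mapping from volume name to data; the names `Update` and `apply_in_order` are hypothetical and do not come from any IBM product.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Update:
    # Only the time-of-day stamp takes part in ordering comparisons.
    tod: int
    volume: str = field(compare=False)
    data: str = field(compare=False)


def apply_in_order(updates, secondary):
    """Apply updates to the secondary volumes in time-stamp order."""
    for update in sorted(updates):
        secondary[update.volume] = update.data
    return secondary


# Updates may arrive at the secondary site out of order.
updates = [
    Update(3, "vol_a", "balance=70"),
    Update(1, "vol_a", "balance=100"),
    Update(2, "vol_b", "log: withdraw 30"),
]
secondary = apply_in_order(updates, {})
```

Because the updates are sorted by time stamp before being applied, the later write to `vol_a` is the one that survives, matching the order in which the writes were made at the primary volume.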
In peer-to-peer remote copy (PPRC) operations, multiple primary control units may have source/target pairs, i.e., volume pairs, included in consistency groups so that data copied to target volumes by the different primary control units maintains data consistency. A host system includes a program, referred to as a consistency manager, to maintain data consistency across the different primary control units having source/target pairs in a consistency group. In the current art, if a primary control unit detects an error, such as a failure of the connection to the secondary control unit managing access to the target storage in the source/target pair, then the primary control unit may initiate a freeze operation to block any further writes to the source volumes. In response to the freeze operation, application programs blocked from writing data would not write any more data to any primary control unit. After initiating the freeze operation, the primary control unit would send an interrupt to the consistency manager identifying the freeze and set a freeze timeout timer. At the expiration of the freeze timeout timer, the primary control unit would initiate a thaw operation to start accepting writes from the application to the source storage in the source/target pair, but would not copy the writes to the target storage.
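The freeze, timeout, and thaw sequence described above can be sketched as a small state machine. This is an illustrative model only: the class `PrimaryControlUnit`, its method names, the three state labels, and the use of a caller-supplied logical clock value `now` are all assumptions made for the example, not the interface of any actual control unit.

```python
class PrimaryControlUnit:
    """Hypothetical model of one primary control unit's freeze/thaw behavior."""

    def __init__(self, name, freeze_timeout):
        self.name = name
        self.freeze_timeout = freeze_timeout
        self.state = "normal"      # one of: normal, frozen, thawed
        self.freeze_deadline = None
        self.source = []           # writes accepted to source storage
        self.target = []           # writes copied to target storage

    def detect_error(self, now, notify):
        # Freeze first, then send the interrupt and set the timeout timer.
        self.state = "frozen"
        notify(self.name)
        self.freeze_deadline = now + self.freeze_timeout

    def tick(self, now):
        # Thaw once the freeze timeout timer has expired.
        if self.state == "frozen" and now >= self.freeze_deadline:
            self.state = "thawed"

    def write(self, data):
        if self.state == "frozen":
            return False           # writes are blocked during the freeze
        self.source.append(data)
        if self.state == "normal":
            self.target.append(data)   # after a thaw, writes are not copied
        return True


interrupts = []                    # stands in for the consistency manager
pcu = PrimaryControlUnit("pcu1", freeze_timeout=5)
pcu.write("w1")
pcu.detect_error(now=10, notify=interrupts.append)
blocked_result = pcu.write("w2")   # rejected while frozen
pcu.tick(now=15)                   # timer expires, unit thaws
pcu.write("w3")                    # accepted at source, not copied to target
```

In this run the source storage ends up holding `w1` and `w3` while the target storage holds only `w1`, which reflects the thawed unit accepting writes without copying them.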
In the current art, if the primary control unit cannot communicate the interrupt to the consistency manager to allow the consistency manager to send freeze commands to all primary control units, then applications writing to primary control units other than the primary control unit where the freeze occurred may have their data writes transferred to the target storage even though the primary control unit where the freeze occurred would not copy writes to the target storage. This may result in data inconsistency at the target storage.
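The inconsistency described above can be made concrete with a small sketch. It assumes, for illustration only, two primary control units labeled A and B, writes identified by a sequence number, and a simplified notion of consistency in which the target storage must hold an unbroken prefix of the write sequence; the function names are hypothetical.

```python
def mirror(writes, not_copying):
    """Return the writes that reach target storage.

    `not_copying` is the set of units that have stopped copying to the target.
    """
    return [w for w in writes if w["unit"] not in not_copying]


def is_consistent(target, writes):
    """Target is consistent if it holds a prefix of the full write sequence."""
    seqs = [w["seq"] for w in target]
    return seqs == [w["seq"] for w in writes][: len(seqs)]


# Two dependent writes issued in order to different primary control units.
writes = [
    {"seq": 1, "unit": "A", "data": "debit account"},
    {"seq": 2, "unit": "B", "data": "credit account"},
]

# Interrupt lost: only unit A stops copying, unit B keeps mirroring.
target_interrupt_lost = mirror(writes, not_copying={"A"})

# Interrupt delivered: the consistency manager freezes every unit.
target_all_frozen = mirror(writes, not_copying={"A", "B"})

lost_ok = is_consistent(target_interrupt_lost, writes)
frozen_ok = is_consistent(target_all_frozen, writes)
```

When the interrupt is lost, the target receives the second write without the first, so `lost_ok` is false; when every unit stops copying together, the target simply stays at its last consistent point and `frozen_ok` is true.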
For these reasons, there is a need in the art to provide techniques for maintaining data consistency.