This invention relates to loosely-coupled copy operations between a primary and a remote secondary direct access storage device (DASD) through paths managed by a host CPU. More particularly, the invention relates to maintaining consistency between the primary and remote DASD volumes even when the CPU is updating the primary volume at the same time. This is critical where such updating occurs during initial primary-to-secondary volume synchronization and during resynchronization of the volumes after the occurrence of an I/O error or other outage.
The following paragraphs summarize the prior art. First, it is well known that a CPU randomly and sequentially updates tracks of one or more DASDs in an attached cache-based, staged storage subsystem. It is further known that remote electronic copying of DASD volumes is a frequently-used strategy toward maintenance of full-time information handling system availability in the presence of fault or failure of system components. Among the several copy operations, duplexing is favored over point-in-time copying because of the very low latency when the backup is substituted for the primary volume.
The prior art further teaches that remote volume-to-volume duplexing can be made transparent to applications on the CPU and with no CPU overhead. This can be accomplished synchronously by control unit-to-control unit volume copying. However, no new CPU access of the primary volume can be made until the current update is copied to the second site. In contrast, where the remote copying is performed asynchronously by CPU controlled paths, then the CPU access rate of the primary volume is independent of the backup copying. This is at the price of CPU copy management overhead. Lastly, it is known to use bit maps and volume addresses to place updates to primary volume tracks in a copy serial order for recording on a backup volume in a remote copy context, notwithstanding that such suffer from significant throwaway recording and overhead.
CPU Accessing Staged Storage
When an application runs on a multiprocessing CPU, such as an IBM S/390 with an MVS operating system, it will generate read or write calls for data to the operating system (OS). If the data is not present in CPU main memory, the OS will invoke an access method and establish a path to the data. The path will lead to data stored or to be written on one or more DASDs in an attached storage subsystem. The storage subsystem may be of the demand/responsive, hierarchically organized storage type. Illustratively, the IBM 3990 Model 6 storage control unit (SCU) is of that type. It includes a large multimegabyte cache, a nonvolatile store (NVS), and several redundant pathways to each of a plurality of 3390 DASDs or their equivalents.
If the application running on the S/390 has generated a read request, then the data would likely be stored in the SCU cache and transferred to main memory. Alternatively, if not in SCU cache, the read data would be staged to cache from one or more DASDs. It would then be copied to CPU main memory. In the case of an application-generated write, the changed or updated data would be moved from the host CPU main memory to the SCU cache. It would then be copied over to the NVS. This would permit the SCU to signal completion of the write operation and release the path coupling the SCU to the CPU. At a time subsequent, the data can be written out to the DASDs from NVS.
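The read and write paths just described can be illustrated with a small model. This is only a sketch under stated assumptions: the class name `StorageControlUnit`, its methods, and the use of dictionaries for cache, NVS, and DASD are hypothetical and do not describe the 3990's actual microcode or interfaces.

```python
class StorageControlUnit:
    """Toy model (hypothetical) of the cache / NVS / DASD paths described above."""

    def __init__(self):
        self.cache = {}  # volatile cache: track address -> data
        self.nvs = {}    # nonvolatile store: writes not yet destaged
        self.dasd = {}   # backing DASD tracks

    def write(self, track, data):
        # Updated data moves into cache and is then copied to NVS.
        self.cache[track] = data
        self.nvs[track] = data
        # With the data safe in NVS, completion can be signaled and the
        # CPU-to-SCU path released before anything reaches DASD.
        return "complete"

    def read(self, track):
        # On a cache miss, stage the track from DASD into cache first.
        if track not in self.cache:
            self.cache[track] = self.dasd[track]
        return self.cache[track]

    def destage(self):
        # At a later time, write the NVS contents out to DASD.
        self.dasd.update(self.nvs)
        self.nvs.clear()
```

In this model a write is reported complete while the DASD copy is still stale, and only a later call to `destage` brings the DASD up to date.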
Remote Electronic Copying
Shomler et al., U.S. Pat. No. 5,446,871, "Method and Arrangement for Multi-System Remote Data Duplexing and Recovery", issued Aug. 29, 1995, emphasized that data copying as a storage function was the principal form of data preservation. According to Shomler, data copying was originally little more than an archive function. That is, trucks moved copies of magnetic tape recorded business transactions to remote mountain caves on a weekly or monthly basis such that businesses might restart in a post-nuclear holocaust era. However, today it is a necessity to maintain constant availability of data and systems. Thus, equipment and data are duplexed both locally and remotely. In this latter regard, Shomler proposed a method of remote electronic copying of locally stored DASD data using a token and unique sequence number responsive to each write operation at a primary site. His method relied upon the number and a list of items already sent to establish a sequence order, and thereby define gaps from which missing updates could be ascertained in the event of error, fault, or outage.
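The idea of using sequence numbers to expose gaps can be sketched as follows. This is not Shomler's implementation: the function name `find_missing` is hypothetical, and the sketch assumes that sequence numbers start at 1 and increase by 1 with each write.

```python
def find_missing(received_seq_numbers, highest_sent):
    """Return the sequence numbers that were sent but never received.

    Assumption (not from the source): numbers run 1..highest_sent
    with no intentional gaps.
    """
    received = set(received_seq_numbers)
    return [n for n in range(1, highest_sent + 1) if n not in received]


# Updates 3 and 5 never arrived at the secondary site.
missing = find_missing([1, 2, 4, 6], highest_sent=6)
```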
Even Shomler pointed out there was no single flavor of the copy function that would accommodate the relevant system and storage management factors. He listed several factors that should be considered in copy method design and use. These include: (1) protection domain (system and/or environmental failure or device and/or media failure), (2) data loss (no loss/partial loss), (3) time where copying occurs as related to the occurrence of other data and processes (point in time/real time), (4) the degree of disruption to applications executing on said computer, and (5) whether the copy is application or storage subsystem based.
Echoing Shomler's recognition of the need for several copy functions, large systems offer a suite of copy functions as an optional part of the resident operating system. One such suite is offered as part of the IBM MVS/DFSMS package. This package includes volume-to-volume copy operations under the control of the SCU, such as Dual Copy or Peer-to-Peer Remote Copy (PPRC). It also includes single or multivolume copying under host S/390 level control such as Concurrent Copying or Extended Remote Copy (XRC). Dual Copy is a local or same site volume duplexing feature usually under a RAID 1 rubric.
Synchronous Remote Copying and Concurrent Updating
Duplexing means rendering a second volume to be the mirror image of a primary volume. Remote data copying (duplexing) may be either synchronous or asynchronous. A synchronous remote copy function is termed Peer-to-Peer Remote Copy (PPRC). PPRC involves a direct path between DASD storage subsystems avoiding the host CPU. In PPRC, one or more tracks from the primary volume are copied through a first SCU. The copied tracks are then sent to a remote or secondary SCU location over a direct SCU/SCU ESCON-like channel.
Significantly, confirmation must be received by the primary site of the fact that copied tracks have been written to remote secondary NVS or DASD before terminating the path between the host CPU and the primary storage subsystem (SCU). This means that the next I/O access of the SCU cannot start until after the confirmation. This confirmation requirement substantially reduces the host/primary storage subsystem access rate. Relatedly, as the distance between the primary and secondary increases, the delay between accesses is further increased. This still further reduces the primary subsystem access rate. However, a consistent set of tracks and updates can be communicated between the SCUs with virtually no host CPU overhead and low SCU-to-SCU overhead.
In PPRC, the secondary or remote SCU must also recognize when the secondary volume is out of synchronization with the primary volume. Responsively, the primary SCU can suspend the remote copy function, mark the updates in some manner, and queue the updates for subsequent transmission to the secondary SCU. Note, new host accesses of the primary are still held up until the previous transfers (updates) have been synchronized at the secondary volume. A description of such a PPRC system with an efficient peer coupling may be found in the copending Hathorn et al. application, U.S. Ser. No. 08/782,474, now U.S. Pat. No. 5,920,695, "Method and Means for Bidirectional Peer-coupled Communication Across a Single ESCON Interface", filed Jan. 10, 1997.
One problem is that of serializing updates to datasets which occur during the copy interval. The serialization of write updates in such a PPRC arrangement is set out in the copending Blount et al. application, U.S. Ser. No. 08/779,577, now U.S. Pat. No. 5,875,479, "Method and Means for Making a Dual Volume Level Copy in a DASD Storage Subsystem Subject to Updating During the Copy Interval", filed Jan. 7, 1997.
Blount uses a bit status map of the datasets in the primary volume. For any given copy session, the counterpart bits of the datasets to be copied are turned on. As the session progresses, the bits in the session are turned off as the datasets are copied over to the secondary in map serial order. In the event that write updates are made anywhere in the primary volume, the counterpart bit is turned on if the dataset has already been copied to the secondary. During the next pass, the updated datasets with turned on bits are copied out in map serial order and their bits turned off. This results in at least two passes over the map and an appropriate serialization of copies and their updates. For purposes of this specification, a unit of storage is taken to mean a mapped unit of data and vice versa.
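The multi-pass bit map scheme can be sketched as below. This is a simplified illustration, not Blount's implementation: the function `run_copy_session` and the `writes_after_copy` argument (which simulates host writes arriving right after a given unit is copied) are hypothetical, and only units belonging to the session are re-marked.

```python
def run_copy_session(primary, secondary, session_units, writes_after_copy):
    """Copy session units in map serial order, re-marking updated units.

    writes_after_copy maps a unit index u to a list of (target, data)
    host writes that arrive just after u has been copied (hypothetical
    stand-in for concurrent updates). Returns the number of passes.
    """
    session = set(session_units)
    bits = [u in session for u in range(len(primary))]  # bit on = copy needed
    pending = dict(writes_after_copy)
    passes = 0
    while any(bits):
        passes += 1
        for u in range(len(primary)):      # map serial order
            if not bits[u]:
                continue
            secondary[u] = primary[u]      # copy the unit to the secondary
            bits[u] = False                # and turn its bit off
            for target, data in pending.pop(u, []):
                primary[target] = data
                if target in session:
                    bits[target] = True    # re-mark; no-op if still uncopied
    return passes


primary = ["a", "b", "c", "d"]
secondary = [None] * 4
# A write to unit 0 arrives after unit 2 has been copied, forcing a second pass.
passes = run_copy_session(primary, secondary, [0, 1, 2, 3], {2: [(0, "A")]})
```

An update to a unit that has not yet been copied needs no extra work, because its bit is still on and the first pass will pick up the new contents.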
Asynchronous Remote Copy
The asynchronous remote copy method (XRC) is a host CPU-based, duplex, volume-to-volume copy process. XRC asynchronously copies tracks and concurrent track updates on a primary volume in a DASD storage subsystem. The copies are sent through an attached CPU to a secondary volume in a remote DASD storage subsystem. The copies are transmitted over a long-haul communications path, possibly thousands of kilometers in length.
XRC has minimal impact on the host/primary SCU access rate. In the XRC copy process, an access operation (I/O) is considered completed when an update is written into nonvolatile storage (NVS) at the primary site SCU or written out to the primary DASD volume. The copy process to the secondary volume is asynchronous. However, since updates occur anywhere over the primary volume during the copy interval, significant host processing software and cycles must be expended to ensure consistency.
Reference may be made to the copending Kern et al. application, U.S. Ser. No. 08/506,590, now abandoned, "Asynchronous Remote Copy Session Recovery Following System or DASD Subsystem Failure", filed Jul. 25, 1995. Kern's method employs change-recording bit maps in primary DASD subsystems to keep a record by DASD track address and timestamps of tracks that have been changed. A host CPU-based software construct functioning as a cooperative system data mover (SDM) is also disclosed to copy the changes and, where appropriate, forward them to the secondary site. However, in Kern's version of XRC, attention is focused on maintaining consistency across several volumes rather than on maintaining consistency within a volume.
Occasionally, an access error or an outage may occur resulting in suspension of a copy session. On resuming a copy session, Kern's method uses the SDM, the change-recording bit maps, and timestamps to identify all tracks that have changes and that may not have been copied to their secondary copy volumes. Those tracks will need to be recopied before the secondary devices can be restored to an XRC duplex state.
In most XRC session resume instances, the session resumption must be performed concurrent with the host CPU updating of primary copy DASD. This requires that the SDM (re)establish the volumes with the subsystems and accept updates from the subsystems, then correlate the time it reads each track to be recopied with changes that may be made to those tracks by application programs, discarding changes made before the track was read. In a copy session of any size, together with much application activity, this may result in the data mover having to read a number of primary updates that it will subsequently discard because they occurred before the data mover read the to-be-recopied track.
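The throwaway work described above can be illustrated with a short sketch. The function `split_updates` and its data layout are hypothetical; the sketch only shows the comparison between the time a track was read for recopying and the time of each update.

```python
def split_updates(track_read_times, updates):
    """Separate updates the data mover must keep from those it discards.

    track_read_times: track -> time the track was read for recopying.
    updates: list of (track, timestamp, data) read from the subsystem.
    An update made before its track was read is already reflected in
    the recopied track image, so it is discarded.
    """
    kept, discarded = [], []
    for track, timestamp, data in updates:
        read_at = track_read_times.get(track)
        if read_at is not None and timestamp < read_at:
            discarded.append((track, timestamp, data))
        else:
            kept.append((track, timestamp, data))
    return kept, discarded


kept, discarded = split_updates(
    {5: 100},
    [(5, 90, "old"), (5, 110, "new"), (9, 50, "other")],
)
```

Every entry that ends up in `discarded` was read and correlated by the data mover for nothing, which is the overhead the passage points to.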
It is an object of this invention to devise a method and means to effectuate loosely-coupled copy operations between a primary and a remote secondary DASD through control unit mediated paths managed by a host CPU.
It is a related object that such method and means maintain consistency between the primary and remote DASD volumes, even when the CPU is updating the primary volume at the same time.
It is yet another related object that such method and means maintain consistency where such updating occurs during initial primary-to-secondary volume synchronization, and during resynchronization of the volumes after the occurrence of an I/O error or other outage.
It is a further object to reduce the processing overhead associated with the CPU and control units in volume resynchronizing through an efficient scheduling and copying on the secondary volume of primary track updates occurring during the resynchronization interval.
The foregoing objects are satisfied in an embodiment expressed as a method for maintaining consistency among DASD tracks of data on a primary volume with counterpart tracks of data on a secondary volume. In this arrangement, CPU-initiated write updates to selected ones of the tracks on the primary volume are made by way of a CPU-established path through a first mediating control unit. Similarly, copying of the primary tracks on the secondary DASD volume is made asynchronously by way of another CPU-established path through a second mediating control unit.
The first step of the method involves initially synchronizing the primary and secondary volumes over the mediated paths through the CPU by progressively copying primary tracks on the secondary in a monotonic address order. Also, concurrently occurring updates to primary tracks are copied on the secondary volume if the address of the updated track does not exceed the copy address progression of the primary tracks recorded at the secondary volume.
The second step of the method is directed to resynchronizing the primary and secondary volumes over the mediated paths through the CPU in case of extrinsic error, fault, or the like. This is accomplished by ascertaining the status of primary tracks, primary tracks "in flight" through the volume shadowing process, and primary tracks updated during volume suspension, and then scheduling and recording on the secondary volume the most recent version copy order of the primary tracks using bit-mapped update status and timestamping.
More particularly, the first step of the method, namely that of initially synchronizing the tracks of data on the primary DASD volume with counterpart tracks on the secondary DASD volume, comprises several substeps. These substeps include reading from the primary volume of a predetermined number of tracks as a group in a monotonic address order and copying said tracks in that address order on the secondary volume. The substeps further include forming record sets of CPU-originated updates to the tracks on the primary volume and copying those record sets to the secondary volume having addresses less than the highest address of the primary tracks copied onto the secondary volume.
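The initial synchronization substeps can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function `initial_sync`, the list-based volumes, and the `updates_after_group` argument (which simulates CPU updates arriving after a given group is copied) are all hypothetical. The sketch uses the "does not exceed" form of the address test.

```python
def initial_sync(primary, secondary, group_size, updates_after_group):
    """Copy tracks in ascending address order, one group at a time.

    updates_after_group maps a group index to a list of (track, data)
    CPU updates that arrive after that group is copied (hypothetical).
    Returns the tracks whose updates were applied as record sets and
    those skipped because the copy had not yet reached them.
    """
    high_water = -1  # highest track address copied to the secondary so far
    applied, skipped = [], []
    for group, start in enumerate(range(0, len(primary), group_size)):
        end = min(start + group_size, len(primary))
        secondary[start:end] = primary[start:end]  # copy the group in order
        high_water = end - 1
        for track, data in updates_after_group.get(group, []):
            primary[track] = data
            if track <= high_water:
                secondary[track] = data  # record set for an already-copied track
                applied.append(track)
            else:
                skipped.append(track)    # a later group copy will carry it
    return applied, skipped


primary = list("abcdef")
secondary = [None] * 6
applied, skipped = initial_sync(primary, secondary, 2, {0: [(1, "B"), (4, "E")]})
```

Updates above the copy progression need no record set in this model, because the progressive copy reads the already-updated primary track when it reaches that address.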
More particularly, the second step of the method, namely that of resynchronizing the tracks of data on the primary volume with counterpart tracks on the secondary volume, is responsive to the occurrence of an extrinsic error, fault, or the like. The substeps include suspending the primary volume, and continuing the bit map recording of tracks on the primary volume which change during the suspension interval. The next step is enabling the primary control unit to create record sets if updates are made to primary tracks unmodified before or during suspension. This is followed by reading the bit map status of primary tracks which were in flight or modified by the CPU during the suspension interval, and causing the primary control unit to monitor CPU updates to primary tracks within an address range containing the primary tracks modified before or during suspension.
After this, the resynchronization method requires forming record sets by the primary controller of tracks modified by CPU updates occurring after volume resynchronization has started, timestamping the record sets, and sending them to the secondary control unit. Subsequently, there occurs the step of writing out to the secondary volume by the secondary control unit of groups of tracks modified before or during the suspension interval in approximate monotonic address order and recording the timestamp associated with that group. Finally, the last step contemplates either writing out to the secondary volume by the secondary control unit of record sets of primary tracks modified after volume resynchronization has started if the highest (latest) timestamp associated with the record sets occurs prior to the last timestamp recorded with groups of primary tracks modified before or during suspension or otherwise repeating the steps of writing out the primary tracks modified before or during suspension to the secondary volume and timestamp comparing until the condition is satisfied.
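The final timestamp comparison can be sketched as below. This is one simplified reading of the step, not the claimed implementation: the function `resynchronize`, the dictionary-based secondary volume, and the tuple layouts are hypothetical, and the sketch stops at the point where the condition is met without modeling anything that follows.

```python
def resynchronize(secondary, track_groups, record_sets):
    """Write track groups until the record sets may be applied.

    track_groups: list of (timestamp, {track: data}) for tracks modified
        before or during suspension, in approximate address order.
    record_sets: list of (timestamp, track, data) for updates made after
        resynchronization started.
    Returns the number of groups written when the record sets were
    applied, or None if the condition was never satisfied.
    """
    pending = sorted(record_sets, key=lambda r: r[0])
    highest_ts = pending[-1][0] if pending else None
    written = 0
    for group_ts, tracks in track_groups:
        secondary.update(tracks)  # write the group to the secondary volume
        written += 1
        # The group's timestamp is the last one recorded; compare it with
        # the highest (latest) record set timestamp.
        if highest_ts is None or highest_ts < group_ts:
            for _, track, data in pending:
                secondary[track] = data
            return written
    return None


secondary = {}
groups_written = resynchronize(
    secondary,
    [(10, {1: "g1"}), (20, {2: "g2"})],
    [(15, 1, "r1")],
)
```

In the usage above, the record set stamped 15 is held back after the first group (stamped 10) and applied only after the second group (stamped 20) has been written.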