Catastrophic computer system disasters may occur suddenly and immediately halt all processing within the system at a single point in time. More commonly, however, an error interrupts operations in stages, over several seconds or even minutes. One system for handling such failures is a remote copy system. Remote copy is based on two systems: a primary application system at one location and a recovery system at another location. The two systems can be located in the same building or at remote locations. Each system has a dedicated direct access storage device (DASD). The DASD dedicated to the recovery location backs up data written to the DASD dedicated to the primary application system. In case of disaster at the primary location, data is recovered from the secondary DASD located at the recovery system.
International Business Machines Corporation (IBM) provides two remote copy systems utilizing the IBM 3990 Model 6 storage controller: Peer-to-Peer Remote Copy (PPRC) and Extended Remote Copy (XRC). Both of these systems address the problem of unrecoverable data that occurs between the last safe backup of an application system to a recovery system and the time when the application system fails.
PPRC provides a synchronous copy of data to the secondary DASD in that the primary controller delays further input/output (I/O) operations to a DASD until all data has been copied to the secondary DASD. With PPRC, a primary controller communicates directly with a secondary controller. The secondary controller backs up data the primary controller stores in an associated DASD. XRC provides an asynchronous copy of data to the secondary DASD such that operations at the primary controller are allowed to continue before all data has been copied to the secondary DASD. XRC includes a data mover system to move data between the primary and secondary controllers. This process of using a secondary controller and DASD to shadow data maintained in the primary DASD is described in U.S. Pat. No. 5,615,329 to Robert F. Kern et al., which is assigned to IBM, the assignee of the subject patent application, and which is incorporated herein by reference in its entirety.
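The difference between the synchronous and asynchronous copy modes described above can be illustrated with a minimal sketch. It is not taken from the PPRC or XRC implementations; the class names Volume, SynchronousCopy and AsynchronousCopy, the run_data_mover method and the in-memory track dictionary are all hypothetical, chosen only to show when the secondary copy is made relative to completion of the write.

```python
from collections import deque


class Volume:
    """Hypothetical stand-in for a DASD: maps a track id to its data."""

    def __init__(self):
        self.tracks = {}

    def write(self, track, data):
        self.tracks[track] = data


class SynchronousCopy:
    """PPRC-style sketch: a write completes only after the secondary has it."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def write(self, track, data):
        self.primary.write(track, data)
        # Completion is not reported until the secondary copy has been made.
        self.secondary.write(track, data)
        return "complete"


class AsynchronousCopy:
    """XRC-style sketch: writes complete at once; a data mover copies later."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.pending = deque()  # updates not yet applied to the secondary

    def write(self, track, data):
        self.primary.write(track, data)
        # Queue the update and report completion without waiting.
        self.pending.append((track, data))
        return "complete"

    def run_data_mover(self):
        """Apply queued updates to the secondary in their original order."""
        moved = 0
        while self.pending:
            track, data = self.pending.popleft()
            self.secondary.write(track, data)
            moved += 1
        return moved
```

In this sketch, the updates still held in pending at the moment of a primary failure correspond to the data that an asynchronous scheme has not yet shadowed, whereas the synchronous variant never reports completion with the secondary out of date, at the cost of delaying each write.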
Present data shadowing and remote copy systems involving primary and secondary controllers provide only limited error diagnostic and recovery operations. For instance, an XRC system is reinitialized and resynchronized after an error is detected and corrected.