1. Field of the Invention
The present invention relates generally to data storage systems and, more particularly, to improvements in a Peer-to-Peer Remote Copy system for data backup and data recovery.
2. Discussion of the Prior Art
Peer-to-Peer Remote Copy ("PPRC") is a hardware-based disaster recovery solution designed to maintain a mirror image of application data at a remote secondary location. Key to PPRC is the migration of data sets from mass storage devices, such as hard disk drives or other data storage media, to another set with minimal disruption to the applications using the data. In particular, PPRC mechanisms automatically copy changes made to a source (primary) volume to a target (secondary) volume until the PPRC relationship is suspended or terminated.
FIG. 1 depicts, in general, a PPRC system 10 comprising a primary Enterprise Storage System 15, which includes a primary production Enterprise Storage Server (ESS) 17 and a host server 20 running a host application that reads data from and writes data to the primary ESS 17. The primary ESS 17 is linked via an Enterprise Systems Connection ("ESCON") connection 45 to a secondary ESS storage system 25, which includes a remotely located secondary backup ESS 27 and a corresponding remote backup host server 30. In current configurations, the connection 45 comprises a high-speed link supporting, for example, 2-Gigabit-per-second (Gbps) Fibre/FICON data transfer rates; however, other ESS system configurations implementing other high-data-rate connectivity are also applicable. As is known, peer-to-peer remote copy solutions provide functionality for direct, synchronous copying of data at the volume level from the primary ESS 17 to the secondary backup ESS 27. This direct copying is transparent to the operating system of the primary host server and to any applications running on the primary host; however, it imposes a performance penalty on application I/Os. Further, the default configuration of certain host operating systems, e.g., the OS/390 and z/OS operating systems manufactured by International Business Machines, Armonk, N.Y., implements a 30-second missing-interrupt-handler (MIH) timeout for the ESS. Consequently, when the primary ESS has difficulty communicating data from a primary PPRC volume to the secondary remote PPRC volume while a host is attempting to write to the primary volume, the ESS must suspend the PPRC relationship, because it cannot hold off the I/O for more than 30 seconds without causing an MIH error.
That is, in a mirroring process executed by PPRC in synchronous data transfer mode, the primary host server 20 writes data to the primary ESS volume, and that data is then mirrored (transferred) to a corresponding secondary (remote ESS) volume. At the time of the write operation, the host server starts the MIH timer, which counts a timeout period, e.g., 30 seconds, within which the host expects a final status indicating that the write operation has completed. The primary ESS data storage system containing the primary volume, in turn, must inform the host that the remote data transfer has completed successfully once the data storage system containing the secondary volume acknowledges that it has received and checked the mirrored data, and it must do so within a pre-determined time period measured by an internal timer mechanism. If the primary ESS does not receive an ending status within that pre-determined period, which is shorter than the 30-second MIH timeout, PPRC mirroring operations are suspended and an ending status is generated for the host in order to avoid the MIH timeout. The ESS pair must subsequently be re-synchronized, and the suspended remote PPRC data transfer re-driven.
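The timing relationship described above, in which the primary ESS's internal timer must expire before the host's MIH timer does, can be illustrated with a minimal Python sketch. It is not the ESS microcode. The 30-second MIH value comes from the text; the 25-second internal timer and the names `handle_mirrored_write`, `Outcome`, and `WriteResult` are hypothetical, chosen only so that the internal period is shorter than the MIH period.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

MIH_TIMEOUT_S = 30.0    # host-side MIH timeout (default cited in the text)
PRIMARY_TIMER_S = 25.0  # hypothetical internal ESS timer; must be < MIH_TIMEOUT_S


class Outcome(Enum):
    MIRRORED = "mirrored"    # secondary acknowledged within the internal timer
    SUSPENDED = "suspended"  # PPRC pair suspended to beat the host's MIH timer


@dataclass
class WriteResult:
    outcome: Outcome
    ending_status_at_s: float  # when the host receives ending status
    needs_resync: bool         # pair must later be re-synchronized and re-driven


def handle_mirrored_write(secondary_ack_delay_s: Optional[float]) -> WriteResult:
    """Model one synchronous PPRC write as seen by the primary ESS (hypothetical).

    secondary_ack_delay_s is how long the secondary takes to acknowledge that it
    received and checked the mirrored data, or None if it never acknowledges.
    """
    if secondary_ack_delay_s is not None and secondary_ack_delay_s <= PRIMARY_TIMER_S:
        # Normal case: ending status follows the secondary's acknowledgement.
        return WriteResult(Outcome.MIRRORED, secondary_ack_delay_s, False)
    # Internal timer expired first: suspend mirroring and present ending
    # status to the host before its MIH timer can fire.
    return WriteResult(Outcome.SUSPENDED, PRIMARY_TIMER_S, True)


if __name__ == "__main__":
    for delay in (0.005, 28.0, None):
        result = handle_mirrored_write(delay)
        assert result.ending_status_at_s < MIH_TIMEOUT_S  # host never sees an MIH
        print(delay, result.outcome.value, result.needs_resync)
```

Because the internal timer is strictly shorter than the MIH period, the host always receives ending status in time; the cost is that every slow acknowledgement leaves the pair suspended and in need of re-synchronization, which is the drawback the background identifies.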
Commonly owned U.S. Pat. No. 5,894,583 describes a system for preventing erroneous missing-interrupt indications that may arise when an I/O request is unduly delayed by other, contending operating system I/O requests. That system provides variable MIH timeout periods for a delayed I/O request in predetermined extension increments: when a long-busy interrupt signal is issued, the operating system adds an MIH timeout extension increment according to the nature of the reason for the delay.
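The extension-increment idea attributed to U.S. Pat. No. 5,894,583 can be pictured with a small Python sketch. This is an illustration under stated assumptions, not that patent's implementation: the `MihTimer` class, the `BusyReason` codes, and the 15- and 60-second increments are all hypothetical, and only the base 30-second period comes from the text.

```python
from enum import Enum

BASE_MIH_S = 30.0  # base MIH timeout from the text


class BusyReason(Enum):
    # Hypothetical reason codes, each mapped to a hypothetical extension increment.
    RESERVE_CONTENTION = 15.0
    DEVICE_RECOVERY = 60.0


class MihTimer:
    """Hypothetical MIH timer whose deadline grows in predetermined increments."""

    def __init__(self, start_s: float, base_s: float = BASE_MIH_S) -> None:
        self.start_s = start_s
        self.deadline_s = start_s + base_s

    def on_long_busy(self, reason: BusyReason) -> None:
        # A long-busy signal extends the deadline by the increment for its reason.
        self.deadline_s += reason.value

    def expired(self, now_s: float) -> bool:
        return now_s >= self.deadline_s


if __name__ == "__main__":
    timer = MihTimer(start_s=0.0)
    timer.on_long_busy(BusyReason.RESERVE_CONTENTION)
    print(timer.deadline_s, timer.expired(40.0))
```

Here a delayed request survives past the base 30 seconds only because the deadline was explicitly extended; the mechanism lengthens the host's tolerance rather than avoiding the suspension of PPRC write activity itself.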
It would thus be highly desirable to provide a simple, easily implemented solution that avoids suspending write activity between the primary and secondary ESS pair in the first instance or, at a minimum, provides a mechanism enabling the host to retry the data volume transfer.