Field of the Invention
The present invention relates in general to the field of computers and similar technologies, and in particular to software utilized in this field. Still more particularly, it relates to a method, system and computer-usable medium for resolving failed mirrored copies with minimum disruption.
Description of the Related Art
Data storage systems often include a feature that allows users to make a copy of data at a particular point in time (PIT). A point-in-time copy is a copy of the data that is consistent as of that point in time and does not include updates made to the data afterward. Point-in-time copies are used for data duplication, disaster recovery/business continuance, decision support/data mining and data warehousing, and application development and testing.
One data duplication technique for copying a data set at a particular point in time is the Concurrent Copy feature of International Business Machines Corporation (“IBM”). Concurrent Copy performs backup operations while allowing application programs to continue running. It ensures data consistency by monitoring input/output (I/O) requests to the tracks involved in the Concurrent Copy operation. If an I/O request is about to update a track that has not yet been duplicated, the update is delayed until the system saves a copy of the original track image in a cache side file. The track image held in the side file is eventually moved to the target copy location. Concurrent Copy is implemented in a storage controller system, in which the storage controller provides one or more host systems with access to a storage device such as a Direct Access Storage Device (DASD), which often comprises numerous interconnected hard disk drives. With Concurrent Copy, data is copied from the DASD or side file to the host system that initiated the Concurrent Copy operation, and then to another storage device, such as tape backup.
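The track-level behavior described above (save the original image before an update, then move it to the target) can be illustrated with a minimal sketch. It assumes tracks are held in memory and addressed by integer index, and it applies the update synchronously right after the original image is saved rather than modeling the delay. The class and method names (`ConcurrentCopySession`, `write_track`, `copy_track`, `drain`) are hypothetical and do not represent IBM's implementation or API.

```python
class ConcurrentCopySession:
    """Hypothetical in-memory model of a point-in-time copy that preserves
    original track images in a side file before updates are applied.
    Not IBM's implementation; for illustration only."""

    def __init__(self, source):
        self.source = source                    # live tracks (list of bytes)
        self.pending = set(range(len(source)))  # tracks not yet duplicated
        self.side_file = {}                     # track -> saved original image
        self.target = {}                        # track -> copied image

    def write_track(self, track, data):
        # If the track has not been duplicated yet, save its original
        # (point-in-time) image first; only then apply the update.
        if track in self.pending and track not in self.side_file:
            self.side_file[track] = self.source[track]
        self.source[track] = data

    def copy_track(self, track):
        # Move one track to the target, preferring the saved side-file
        # image; otherwise the source track is still unmodified.
        if track not in self.pending:
            return
        self.target[track] = self.side_file.pop(track, self.source[track])
        self.pending.discard(track)

    def drain(self):
        # Copy all remaining tracks and return the point-in-time image.
        for track in sorted(self.pending):
            self.copy_track(track)
        return [self.target[t] for t in range(len(self.source))]


if __name__ == "__main__":
    session = ConcurrentCopySession([b"A0", b"B0", b"C0"])
    session.copy_track(0)
    session.write_track(1, b"B1")  # not yet copied: original saved first
    session.write_track(0, b"A1")  # already copied: no side-file entry
    print(session.drain())         # [b'A0', b'B0', b'C0']
    print(session.source)          # [b'A1', b'B1', b'C0']
```

Only the first update to an uncopied track is saved, so repeated writes to the same track before it is copied still leave the point-in-time image intact.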
Concurrent Copy is representative of a traditional duplication method in which the source data to be copied is read from disk into the host, and the host then writes a duplicate physical copy back to the receiving disk. This method consumes substantial processing cycles to perform the I/O operations for copying and storage, and can take considerable time. In fact, the time and resources consumed are directly proportional to the amount of data being copied: the larger the data set, the more time and resources the copy requires.
Another data duplication technique for storage controller systems is often referred to as the FlashCopy data duplication technique, available from International Business Machines Corporation. The FlashCopy data duplication technique makes it possible to create, nearly instantaneously, point-in-time snapshot copies of entire logical volumes or data sets, and the copies are immediately available for both read and write access. With the FlashCopy data duplication technique, entire volumes may be copied substantially instantaneously to another volume on storage systems such as Enterprise Storage Subsystems (ESSs). The FlashCopy data duplication technique also provides the ability to flash individual data sets, along with support for consistency groups. Consistency groups extend the FlashCopy data duplication technique to create a consistent point-in-time copy across multiple volumes, and even across multiple ESSs, thereby preserving the consistency of dependent writes.
One issue relating to point-in-time copies such as those made with the FlashCopy data duplication technique arises when a copy operation fails while the point-in-time copy is being mirrored across an asynchronous remote mirror (XRC). When such a failure occurs, the mirror relationship is not necessarily consistent, and this inconsistency must be resolved for the copies to be valid. One known method for resolving this problem suspends the XRC pair for the target of the FlashCopy and resynchronizes the copies using a bitmap that records the tracks affected by the FlashCopy (as well as other, unrelated changes). However, this method can degrade performance significantly and can also impact the recovery point objective (RPO) of the disaster recovery (DR) site. A less disruptive method of recovering from this type of error is therefore desirable.
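The bitmap-based resynchronization described above can be sketched roughly as follows, assuming a per-volume change bitmap with one bit per track. The names `ChangeBitmap` and `resynchronize` are hypothetical, and the model ignores the asynchronous, time-ordered update stream that XRC actually uses; it only illustrates why the resynchronization transfers every marked track, including unrelated changes, rather than just the tracks touched by the failed FlashCopy.

```python
class ChangeBitmap:
    """Hypothetical per-volume bitmap: bit i is set when track i changed
    on the primary while the mirror pair was suspended."""

    def __init__(self, num_tracks):
        self.num_tracks = num_tracks
        self.bits = 0  # Python int used as an arbitrary-length bitmap

    def mark(self, track):
        self.bits |= 1 << track

    def marked_tracks(self):
        return [t for t in range(self.num_tracks) if (self.bits >> t) & 1]

    def clear(self):
        self.bits = 0


def resynchronize(primary, secondary, bitmap):
    """Copy every marked track from primary to secondary, clear the bitmap,
    and return the number of tracks transferred (illustrative model only)."""
    tracks = bitmap.marked_tracks()
    for t in tracks:
        secondary[t] = primary[t]
    bitmap.clear()
    return len(tracks)


if __name__ == "__main__":
    primary = ["t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7"]
    secondary = list(primary)
    bitmap = ChangeBitmap(len(primary))
    # While the pair is suspended, the failed FlashCopy touched tracks 2-3...
    for t in (2, 3):
        primary[t] = f"flash{t}"
        bitmap.mark(t)
    # ...and an unrelated application write hit track 6.
    primary[6] = "app6"
    bitmap.mark(6)
    print(resynchronize(primary, secondary, bitmap))  # 3
    print(secondary == primary)                       # True
```

In this model the transfer count grows with every change recorded during the suspension, which is one way to see why the known method can affect performance and the RPO at the DR site.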