A mirrored storage system, such as a peer-to-peer remote copy system, typically includes a primary or production site attached to a host and a secondary or recovery site which may or may not be geographically remote from the production site. During normal operation of the system, the data at the recovery site remains synchronized with data at the production site in order to maintain a consistent backup set of data at the recovery site. A failure at the production site severs the communications link between the production and recovery sites and triggers a “failover” operation at the recovery site. During the failover operation, host writes are directed to the recovery site, which keeps track of all such writes in an “out-of-sync” (OOS) bitmap. In the bitmap, each bit which is set represents a data track which has been modified and will need to be transferred to the production site after recovery from the failure.
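The role of the OOS bitmap during failover can be illustrated with a minimal sketch. The class name `OOSBitmap`, its methods, the zero-based track numbering, and the bit layout (bit i of the bitmap represents track i) are assumptions made for illustration and are not taken from any particular storage product.

```python
class OOSBitmap:
    """Hypothetical out-of-sync bitmap: one bit per data track."""

    def __init__(self, num_tracks: int) -> None:
        self.num_tracks = num_tracks
        # One bit per track, rounded up to a whole number of bytes.
        self.bits = bytearray((num_tracks + 7) // 8)

    def mark_modified(self, track: int) -> None:
        """Set the bit for a track that received a host write."""
        if not 0 <= track < self.num_tracks:
            raise IndexError(f"track {track} out of range")
        self.bits[track // 8] |= 1 << (track % 8)

    def is_modified(self, track: int) -> bool:
        return bool(self.bits[track // 8] & (1 << (track % 8)))

    def modified_tracks(self) -> list[int]:
        """Tracks that would need to be transferred after recovery."""
        return [t for t in range(self.num_tracks) if self.is_modified(t)]


# Host writes arriving at the recovery site during failover.
oos = OOSBitmap(num_tracks=16)
for track in (3, 9, 3):  # a repeated write to track 3 sets the same bit
    oos.mark_modified(track)
print(oos.modified_tracks())  # [3, 9]
```

Because a bit is either set or not, repeated writes to the same track cost nothing extra; the bitmap records only which tracks differ, not how often they were written.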
During the failure and subsequent recovery, host writes may also be performed at the production site. These writes may include test writes made during recovery or mid-transaction writes which were interrupted at the time of the failover. Data associated with such writes at the production site are considered corrupt and should be discarded as part of the recovery process. Consequently, these tracks should be replaced by the corresponding valid tracks stored at the recovery site.
Recovery from a failure includes a failback resynchronization operation whereby correct tracks (to replace corrupt production site tracks) and modified tracks are transferred from the recovery site to the production site. During conventional resynchronizations, the recovery site reads the OOS bitmap of the production site and merges it with the OOS bitmap of the recovery site, such as with a logical OR operation. The resulting bitmap indicates all of the tracks which are to be transferred to the production site to resynchronize production site data.
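The conventional merge can be sketched as follows, assuming both sites use the same track size so that bit i means the same track in both bitmaps. The function name `merge_oos` and the byte-string representation are hypothetical.

```python
def merge_oos(production_bits: bytes, recovery_bits: bytes) -> bytes:
    """Merge two OOS bitmaps of identical geometry with a logical OR."""
    if len(production_bits) != len(recovery_bits):
        raise ValueError("bitmaps must describe the same number of tracks")
    # A track is transferred if either site marked it as modified.
    return bytes(p | r for p, r in zip(production_bits, recovery_bits))


# Bit i (least significant bit first) stands for track i.
production = bytes([0b00000101])  # corrupt tracks 0 and 2 at the production site
recovery = bytes([0b00010100])    # tracks 2 and 4 modified at the recovery site
merged = merge_oos(production, recovery)
print(f"{merged[0]:08b}")  # 00010101 -> tracks 0, 2 and 4 are transferred
```

The bitwise OR is meaningful only because the two bitmaps share one track numbering; the sketch rejects bitmaps of different lengths for that reason.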
In many mirrored systems, however, disk geometries at the two sites are different; that is, the size of data tracks at the production site is different from the size of data tracks at the recovery site. For example, if the production site includes an IBM® TotalStorage® DS8000 or DS6000 disk storage system, the track size will be 64K. If the recovery site includes an IBM TotalStorage Enterprise Storage Server® Model 800, the track size will be 32K. It will be appreciated that the OOS bitmaps of the two sites will not be compatible with each other and cannot, therefore, be directly merged. One solution to this problem has been for the recovery site to determine the track numbers of the first and last tracks indicated by the production site OOS bitmap and adjust those track numbers to match the corresponding track numbers at the recovery site. All of the tracks between the adjusted first and last tracks are then transferred from the recovery site to the production site, even if only a few of the tracks need to be transferred. For example, if the production site OOS bitmap indicates that only tracks 1 and 1,000,000 have been modified, all one million tracks will be transferred, even though only two needed to be. Consequently, it will be appreciated that this solution can impose a large performance penalty on the failback resynchronization process.
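The cost of the first-and-last-track approach can be made concrete with a short sketch. The function name `failback_range`, the zero-based track numbering, and the assumption that tracks at both sites are laid out contiguously from the same byte offset are illustrative choices, not details given above.

```python
def failback_range(production_modified, production_track_size, recovery_track_size):
    """Return the inclusive (first, last) recovery-site track numbers that
    span every modified production-site track, or None if none are modified."""
    if not production_modified:
        return None
    # Byte extent covered by the first and last modified production tracks.
    first_byte = min(production_modified) * production_track_size
    end_byte = (max(production_modified) + 1) * production_track_size  # exclusive
    # Adjust the extent to recovery-site track numbers.
    first = first_byte // recovery_track_size
    last = (end_byte - 1) // recovery_track_size
    return first, last


KIB = 1024
first, last = failback_range({1, 1_000_000}, 64 * KIB, 32 * KIB)
transferred = last - first + 1
print(first, last, transferred)  # 2 2000001 2000000
```

Under these assumptions, the two modified 64K tracks correspond to only four 32K tracks at the recovery site, yet the range spans two million 32K tracks, which is the same data as the one million 64K tracks in the example.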