The present disclosure relates generally to methods and systems for controlling redundant storage systems, and more specifically, to an improved HyperSwap that supports multiple replication sessions.
HyperSwap is designed to broaden the continuous availability attributes of z/OS by extending the redundancy to storage systems. HyperSwap provides the ability to transparently switch all primary storage systems with the secondary storage systems for a planned reconfiguration and to perform disk configuration maintenance and planned site maintenance without requiring any applications to be quiesced. In addition, the HyperSwap operation can transparently switch to secondary storage systems in the event of unplanned outages of the primary storage systems. Unplanned HyperSwap support allows production systems to remain active during a disk storage system failure.
When a HyperSwap trigger event occurs, changes to data on the primary volumes are prevented by issuing a command to freeze data across all volumes being mirrored (or replicated). This command is issued for all Logical Subsystem (LSS) pairs that contain HyperSwap-managed volumes, also referred to as devices. All I/O to all managed volumes is queued to maintain full data integrity and data consistency across all volumes. The HyperSwap operation then makes the target volumes available for use and swaps information in internal control blocks to point to the recovered target volumes. When this has completed, all I/O is released and all applications continue to run against the recovered target volumes, thus masking or avoiding a complete disk system outage, with a dynamic ‘busy’ and a redirection of all I/O.
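The sequence described above (freeze, queue I/O, swap control blocks, release I/O) can be illustrated with a minimal sketch. It is a toy in-memory model offered for illustration only, not the z/OS implementation; the names `Volume`, `LssPair`, and `hyperswap`, and all of their fields, are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """Hypothetical control block for one managed volume (device)."""
    name: str
    primary: str             # identifier of the primary (source) volume
    secondary: str           # identifier of the secondary (target) volume
    active: str = ""         # volume that I/O is currently directed to
    io_queued: bool = False  # True while I/O to this volume is held

    def __post_init__(self):
        # I/O is directed to the primary until a swap occurs.
        if not self.active:
            self.active = self.primary


@dataclass
class LssPair:
    """Hypothetical Logical Subsystem (LSS) pair containing managed volumes."""
    name: str
    volumes: list
    frozen: bool = False


def hyperswap(lss_pairs):
    """Apply the swap sequence to every LSS pair and return the steps taken."""
    steps = []

    # Step 1: freeze data on every LSS pair that contains managed volumes.
    for pair in lss_pairs:
        pair.frozen = True
        steps.append(f"freeze {pair.name}")

    # Step 2: queue I/O to all managed volumes to preserve consistency.
    volumes = [vol for pair in lss_pairs for vol in pair.volumes]
    for vol in volumes:
        vol.io_queued = True
    steps.append("queue I/O")

    # Step 3: make the targets usable and repoint each control block at them.
    for vol in volumes:
        vol.active = vol.secondary
    steps.append("swap control blocks")

    # Step 4: release the queued I/O; applications now run against the targets.
    for vol in volumes:
        vol.io_queued = False
    steps.append("release I/O")

    return steps
```

Calling `hyperswap` on a list of `LssPair` objects leaves every volume's `active` field pointing at its secondary, with no I/O left queued. The ordering matters in this model: the control blocks are repointed only while I/O is held, which mirrors the consistency requirement stated above.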
Many large storage systems host tens of thousands of volumes. Some servers may be accessing volumes in more than one storage system. Currently, if one or more volumes in a storage system fail, all managed volumes of the failing storage system are HyperSwapped. This includes volumes on the failing storage subsystem that are not impacted by the failure, as well as volumes on other physical storage systems that are also unaffected by the failure. Using traditional HyperSwap to swap all volumes, instead of only the subset of affected volumes, takes more time to complete and increases the likelihood that a problem will occur during the swap.
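The cost of this all-or-nothing scope can be made concrete with a small sketch. It models only the traditional behavior described above, under the assumption that managed volumes are represented as a mapping from volume name to storage system; the function names and the sample data are hypothetical.

```python
def traditional_swap_scope(managed_volumes, failed_volumes):
    """Return the set of volumes swapped when any managed volume fails.

    Under the traditional behavior, a single failure causes every managed
    volume to be swapped, regardless of which storage system it resides on.
    """
    if not failed_volumes:
        return set()
    return set(managed_volumes)


def unaffected_but_swapped(managed_volumes, failed_volumes):
    """Return the volumes that are swapped even though they did not fail."""
    scope = traditional_swap_scope(managed_volumes, failed_volumes)
    return scope - set(failed_volumes)


# Hypothetical configuration: volume name -> storage system hosting it.
managed = {"A1": "SYS_A", "A2": "SYS_A", "B1": "SYS_B"}
failed = {"A1"}

swapped = traditional_swap_scope(managed, failed)
collateral = unaffected_but_swapped(managed, failed)
```

In this example one failed volume on `SYS_A` causes all three managed volumes to be swapped, including `A2` on the same system and `B1` on a different, healthy system. With tens of thousands of volumes, the unaffected set can be far larger than the set of volumes that actually failed.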