The present invention relates to a mechanism for recovering a path that becomes unavailable in a network having paths between multiple central processor complexes (CPCs) and a coupling facility, and more particularly to a mechanism for refreshing local CPC resources associated with the coupling facility in parallel with mainline access to those resources, in order to recover from the complete loss of paths.
Commands executed by a coupling facility such as a structured external storage (SES) facility on behalf of one system image are designed to initiate signals to another system image in the same or different CPC(s). The generated signals include cross-invalidates and list-state transition notifications. Normal processing for these signals results in the update of local CPC vector resources to reflect state changes in the corresponding resources in the shared storage facility. However, when connectivity between the system and the shared storage facility is lost, the local vector resources can become down-level with respect to the state of corresponding resources at the shared storage facility.
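The way a local vector falls down-level can be illustrated with a minimal sketch. It is not the facility's actual interface: the `Facility` class, its method names, and the one-bit-per-entry vector layout are all hypothetical, chosen only to show a cross-invalidate signal that is delivered to a connected system and lost for a system whose path is down.

```python
class Facility:
    """Hypothetical model of a shared storage facility (illustration only)."""

    def __init__(self):
        self.vectors = {}    # system name -> list of local validity bits
        self.connected = {}  # system name -> True while a path is available
        self.interest = {}   # block name -> {system name: vector index}

    def attach(self, system, size):
        # Give the system a local vector with every entry initially invalid.
        self.vectors[system] = [False] * size
        self.connected[system] = True

    def register(self, system, block, index):
        # The system caches the block and marks its local entry valid.
        self.interest.setdefault(block, {})[system] = index
        self.vectors[system][index] = True

    def write(self, writer, block):
        # A write on behalf of one system cross-invalidates every other holder.
        holders = self.interest.get(block, {})
        for system, index in list(holders.items()):
            if system == writer:
                continue
            if self.connected[system]:
                self.vectors[system][index] = False  # signal delivered
            # With no path the signal is lost and the local bit stays set.
            del holders[system]


cf = Facility()
for name in ("A", "B", "C"):
    cf.attach(name, 1)
    cf.register(name, "X", 0)
cf.connected["B"] = False  # system B loses all paths to the facility
cf.write("A", "X")
print(cf.vectors)  # B still shows its entry as valid: it is down-level
```

In this sketch system C observes the invalidation, while system B continues to treat its stale copy as current, which is the condition that recovery of the local vector must correct.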
For a variety of failure scenarios, the loss of path connectivity to the shared storage facility is either transient or recoverable, and connectivity can be restored. However, recovery of the local CPC vector resources must be performed before normal usage can be resumed. Given the stringent responsiveness requirements associated with the execution of commands directed to a structured external storage facility, and the even more stringent requirements for references to the local CPC vector resources, it is critical that recovery of local CPC resources be performed in a way that avoids introducing significant delays in transaction response times during the recovery process. Further, recovery from these temporary outages must be performed by the resource-owning hardware and software components without participation from the set of programs connected to these resources for the purposes of data sharing. Failure to do so can cause temporary outages to be treated as permanent connectivity failures by the programs sharing the structured external storage facility on the affected system, which can significantly impact continuous availability.
At the same time, it is essential that the recovery process for these temporary or recoverable path connectivity failures not compromise data integrity in any way. This requires the ability to detect and recover from recursive path failures, as well as from alternate path failures experienced during the recovery process, which can dynamically change the scope of recovery for local vector resources accessed via the set of paths to the shared facility.
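One common way to tolerate failures that occur while recovery is already under way is to pair the recovery pass with a failure counter and repeat the pass whenever the counter changes. The sketch below illustrates that general idea only and is not presented as the mechanism of the invention. The names `PathSet`, `failure_epoch`, `recover_vector`, and the `on_entry` hook are hypothetical, and resetting every entry to invalid is an assumed, deliberately conservative recovery action.

```python
class PathSet:
    """Hypothetical tracker for the set of paths to the shared facility."""

    def __init__(self):
        self.failure_epoch = 0  # incremented on every detected path failure

    def report_failure(self):
        self.failure_epoch += 1


def recover_vector(valid_bits, paths, on_entry=None):
    """Reset every entry to invalid, repeating the pass if a path failure
    is reported while the pass is running. Returns the number of passes."""
    passes = 0
    while True:
        passes += 1
        start_epoch = paths.failure_epoch
        for i in range(len(valid_bits)):
            valid_bits[i] = False
            if on_entry is not None:
                on_entry(i)  # hook used here only to simulate a failure
        if paths.failure_epoch == start_epoch:
            return passes  # no failure overlapped this pass
```

A second failure injected part-way through the first pass forces one additional pass:

```python
paths = PathSet()
bits = [True, True, True]
fired = []

def fail_once(i):
    # Simulate a recursive path failure while entry 1 is being processed.
    if i == 1 and not fired:
        fired.append(True)
        paths.report_failure()

print(recover_vector(bits, paths, fail_once), bits)
```

In this sketch an entry wrongly marked invalid costs only a re-validation against the shared facility, whereas an entry wrongly left valid would expose stale data, so the pass errs toward invalidating.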