Shared File Systems (SFS) is a term applied to IBM's System/390 (S/390) system for sharing data among virtual machines. IBM's DB2 has been adapted for this type of data sharing in a Multiple Virtual Storage (MVS)/Enterprise Systems Architecture (ESA) environment by using IBM's coupling facility to create multi-system data sharing.
In such a shared system, when one of the systems fails, the update mode locks (data locks) that were held at the time of the failure are “retained” to prevent the other systems from accessing inconsistent data (data that had not yet reached a point of consistency at the time of the failure). To remove the retained data locks, the failed system's logs must be read in a forward and a backward direction in order to bring the data back to a point of consistency. Once this has been done, the retained locks can be removed, and the data is again accessible from all the systems.
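The forward and backward log passes described above can be illustrated with a minimal sketch. It is not DB2's recovery logic: the `LogRecord` layout, the `recover` and `can_access` functions, and the in-memory `data` dictionary are all hypothetical names chosen for illustration, and the sketch assumes a log that holds only update and commit records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical log record layout, assumed for this sketch only."""
    txn: str               # transaction identifier
    kind: str              # "update" or "commit"
    key: str = ""          # data item touched (updates only)
    before: object = None  # value before the update
    after: object = None   # value after the update


def can_access(key, retained_locks):
    """Other systems may touch an item only if no retained lock covers it."""
    return key not in retained_locks


def recover(log, data, retained_locks):
    """Replay a failed system's log, then release its retained locks."""
    # Forward pass: reapply every logged update and note which
    # transactions reached a commit record.
    committed = set()
    for rec in log:
        if rec.kind == "commit":
            committed.add(rec.txn)
        elif rec.kind == "update":
            data[rec.key] = rec.after

    # Backward pass: undo the updates of transactions that never committed,
    # newest first, restoring the logged before-images.
    for rec in reversed(log):
        if rec.kind == "update" and rec.txn not in committed:
            data[rec.key] = rec.before

    # The data is at a point of consistency, so the locks can be removed.
    retained_locks.clear()
    return data


# Example: T1 committed but its update never reached shared storage;
# T2 wrote to shared storage but never committed.
log = [
    LogRecord("T1", "update", "a", before=1, after=2),
    LogRecord("T1", "commit"),
    LogRecord("T2", "update", "b", before=10, after=20),
]
data = {"a": 1, "b": 20}
retained_locks = {"a", "b"}

blocked_before = not can_access("b", retained_locks)
recover(log, data, retained_locks)
print(data, can_access("b", retained_locks))
```

In this example the forward pass reapplies T1's committed update to `a`, the backward pass restores `b` to its value before T2's uncommitted update, and only then are the retained locks cleared so that the other systems can access both items again.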
One conventional method generally employed to remove the retained locks when an operating system fails is the restart/recovery method. In this method, the failed system is restarted (either manually or automatically) on another operating system in the cluster, and recovery logic is used to “recover” the data protected by the retained data locks and bring that data back to consistency. The drawback of this approach is that restarting the failed system can consume a substantial amount of CPU resources, which can significantly disrupt the work already running on that operating system.
Accordingly, what is needed is a more efficient method and system for recovering the retained locks of the failed operating system. The method and system should be simple, cost effective and capable of being easily adapted to existing technology. The present invention addresses such a need.