Some distributed computing environments, such as Parallel Sysplexes, today provide a non-volatile shared storage device called the coupling facility, which includes multiple storage structures of either the cache or list type. These structures provide unique functions for the operating system and middleware products employed for the efficient operation of a Parallel Sysplex. For example, the cache structures provide directory structures and cross-invalidation mechanisms to maintain buffer coherency for multisystem databases, as well as a fast write medium for database updates. These are used by, for instance, the data sharing versions of DB2 and IMS, offered by International Business Machines Corporation, Armonk, N.Y.
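The directory and cross-invalidation behavior described above can be illustrated with a minimal sketch. The class and method names (CacheStructure, register_read, write, is_valid) are hypothetical and are not taken from any coupling facility interface; the sketch assumes a single-threaded model and shows only the idea that a write by one system marks the local buffer copies of the other registered systems invalid.

```python
from collections import defaultdict


class CacheStructure:
    """Toy model of a shared cache structure (hypothetical, for illustration only)."""

    def __init__(self):
        self.directory = defaultdict(set)  # block name -> systems holding a local copy
        self.data = {}                     # block name -> last value written
        self.valid = {}                    # (system, block name) -> local copy still valid?

    def register_read(self, system, name):
        """Register the system's interest in a block and return the shared copy."""
        self.directory[name].add(system)
        self.valid[(system, name)] = True
        return self.data.get(name)

    def write(self, system, name, value):
        """Store the new value and cross-invalidate every other registered system."""
        self.data[name] = value
        for other in self.directory[name] - {system}:
            self.valid[(other, name)] = False
        self.directory[name] = {system}
        self.valid[(system, name)] = True

    def is_valid(self, system, name):
        """Report whether the system's local buffer copy may still be used."""
        return self.valid.get((system, name), False)


cache = CacheStructure()
cache.write("SYS1", "page7", "v1")
cache.register_read("SYS2", "page7")   # SYS2 now holds a local copy of page7
cache.write("SYS1", "page7", "v2")     # SYS2's local copy is marked invalid
```

After the second write in this sketch, SYS2 would have to read the block again before using it, which is the buffer coherency property the cache structures are described as maintaining.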
The list structures provide many diverse functions. One such list structure function is to provide for high-performance global locking, and this function is exploited by such products as the IMS Resource Lock Manager (IRLM) and the Global Resource Serialization (GRS) function in OS/390, offered by International Business Machines Corporation, Armonk, N.Y. Another list structure function is to provide a message passing mechanism with storage for maintaining multiple messages on a per-system basis and a mechanism for notifying a system of the arrival of new messages. This function is exploited by the XCF component of OS/390, which in turn is exploited by numerous multisystem applications to pass messages between their various instances. A third list structure function is to provide for shared queue structures that can be ordered and accessed by LIFO/FIFO ordering, by key, or by name. Workload Manager (WLM), IMS Shared Message Queues and MQSeries, all offered by International Business Machines Corporation, Armonk, N.Y., are examples of exploiters of this feature. While these functions provide examples of the list structure uses, other uses exist.
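The shared queue function described above, in which entries can be accessed by LIFO/FIFO ordering, by key, or by name, can be sketched as follows. The ListStructure class and its methods are hypothetical names chosen for illustration; the sketch assumes each entry is a (name, key, payload) tuple and does not model locking, message notification, or multisystem access.

```python
from collections import deque


class ListStructure:
    """Toy shared queue with several access orders (hypothetical, illustration only)."""

    def __init__(self):
        self.entries = deque()  # kept in arrival order; each entry is (name, key, payload)

    def put(self, name, key, payload):
        self.entries.append((name, key, payload))

    def take_fifo(self):
        """Remove and return the oldest entry."""
        return self.entries.popleft()

    def take_lifo(self):
        """Remove and return the newest entry."""
        return self.entries.pop()

    def by_key(self):
        """Return the entries ordered by key, without removing them."""
        return sorted(self.entries, key=lambda entry: entry[1])

    def find(self, name):
        """Return the first entry with the given name, or None if absent."""
        for entry in self.entries:
            if entry[0] == name:
                return entry
        return None


queue = ListStructure()
queue.put("msgA", 3, "first")
queue.put("msgB", 1, "second")
queue.put("msgC", 2, "third")
```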
Various components of a Parallel Sysplex have been documented in numerous applications/patents, which are listed above and hereby incorporated herein by reference in their entirety. The capabilities defined in some of those patents provide the basic system structure to create and manage cache and list structure instances. Additionally, several of the applications/patents listed above provide extensions to the base functions of the Parallel Sysplex.
To increase the robustness of coupling facility structures, various processes have been introduced over the years that enable coupling facility structures to be rebuilt, either for a planned reconfiguration or in response to a failure. Examples of these rebuild processes are described below:
User-Managed Rebuild
User-managed rebuild allows the operating system to coordinate a structure rebuild process with the active connected users of the structure, in which those connectors participate in the steps of allocating a new structure instance, propagating the necessary structure data to the new structure, and switching over to using the new structure instance.
User-managed rebuild provides both a planned reconfiguration capability and, in most cases, a robust failure recovery capability for coupling facility structure data, but it often requires prodigious amounts of support from the structure connectors, which adds to the overall cost of exploiting the coupling facility to provide data sharing functions. Furthermore, in some cases, it is impossible or impractical for the structure connectors to reconstruct the structure data when it is lost as a result of a hard failure (such as a coupling facility failure or structure failure). This is particularly true when the structure is lost in conjunction with the simultaneous loss of one or more of the active connectors to the structure, and the connectors' protocol for rebuilding the structure requires each of the active connectors to provide some portion of the data in order to reconstruct the complete contents of the structure that was lost. In such cases, user-managed rebuild does not provide a robust failure recovery capability.
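The failure case described above, in which each connector must contribute its portion of the data, can be illustrated with a minimal sketch. The function user_managed_rebuild and its arguments are hypothetical; the sketch assumes that each connector can repopulate only the entries it owns and that a lost connector's entries cannot be supplied by anyone else.

```python
def user_managed_rebuild(connectors, surviving):
    """Rebuild a lost structure from the portions held by surviving connectors.

    connectors: dict mapping connector name -> dict of entries it can repopulate
    surviving:  set of connector names still active after the failure
    Returns (new_structure, complete), where complete is False if any entry
    owned by a lost connector could not be reconstructed.
    """
    new_structure = {}
    for name in sorted(surviving):
        # Each surviving connector propagates its own portion of the data.
        new_structure.update(connectors[name])

    expected = set()
    for entries in connectors.values():
        expected.update(entries)

    complete = expected <= set(new_structure)
    return new_structure, complete


connectors = {"CONN1": {"lockA": 1}, "CONN2": {"lockB": 2}}
# All connectors survive: the full contents can be reconstructed.
full, ok_full = user_managed_rebuild(connectors, {"CONN1", "CONN2"})
# CONN2 is lost together with the structure: its portion is missing.
partial, ok_partial = user_managed_rebuild(connectors, {"CONN1"})
```

In the second call the rebuilt structure lacks the entries that only the lost connector could have provided, which is the situation in which user-managed rebuild is described as not robust.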
User-Managed Duplexing Rebuild
User-managed duplexing rebuild allows the operating system to coordinate a structure rebuild process with the active connected users of the structure, in which those connectors participate in the steps of allocating a new structure instance and propagating the necessary structure data to the new structure, but then keep both structure instances allocated indefinitely. Having thus created a duplexed copy of the structure, the connectors may then proceed to duplex their ongoing structure updates into both structure instances, using their own unique serialization or other protocols for ensuring synchronization of the data in the two structure instances.
User-managed duplexing rebuild addresses the shortcoming noted above for user-managed rebuild, in which it may be impossible or impractical for the structure exploiters to reconstruct the structure data when it is lost as a result of a failure. With user-managed duplexing, the exploiter can build and maintain a duplexed copy of the data in advance of any failure, and then when a failure occurs, switch over to using the unaffected structure instance in simplex mode. User-managed duplexing rebuild thus provides a robust failure recovery capability, but it does not address (and may in fact aggravate) the problem of requiring prodigious amounts of exploiter support from the structure connectors. Note also that user-managed duplexing is limited to cache structures only; list and lock structures are not supported.
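The duplexing and failover behavior described above can be sketched as follows. The DuplexedStructure class and the instance labels "primary" and "secondary" are hypothetical; the sketch is single-threaded and omits the connector-specific serialization protocols that would be needed to keep the two instances synchronized in practice.

```python
class DuplexedStructure:
    """Toy model of a structure kept in two instances (hypothetical, illustration only)."""

    def __init__(self):
        self.instances = {"primary": {}, "secondary": {}}

    @property
    def mode(self):
        return "duplex" if len(self.instances) == 2 else "simplex"

    def write(self, key, value):
        # Every ongoing update is applied to each instance that is still allocated.
        for instance in self.instances.values():
            instance[key] = value

    def fail(self, which):
        # Drop the failed instance; the unaffected one continues in simplex mode.
        self.instances.pop(which)

    def read(self, key):
        if not self.instances:
            raise RuntimeError("no structure instance remains")
        instance = next(iter(self.instances.values()))
        return instance[key]


structure = DuplexedStructure()
structure.write("page7", "v1")
structure.fail("primary")            # failure of one instance
value = structure.read("page7")      # data is still available from the other instance
```

Because the copy was built and maintained before the failure, no connector has to reconstruct data afterward; the cost is that every update is written twice.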
System-Managed Rebuild
System-managed rebuild allows the operating system to internalize many aspects of the user-managed rebuild process that formerly required explicit support and participation from the connectors. In this processing, the operating system internally allocates the new structure and propagates the necessary structure data to the new structure, then switches over to using the new structure instance.
System-managed rebuild can propagate the data to the new structure only by directly copying it, which requires the original structure to remain accessible. System-managed rebuild therefore provides only a planned reconfiguration capability; it is not capable of rebuilding the structure in failure scenarios, and thus does not provide a robust failure recovery mechanism. However, by internalizing many of the “difficult” steps in the rebuild process into the operating system and taking them out of the hands of the exploiters, system-managed rebuild greatly simplifies the requirements on the structure exploiters, drastically reducing the development and test cost for the exploiters to provide a planned-reconfiguration rebuild capability.
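The direct-copy limitation described above can be illustrated with a minimal sketch. The function system_managed_rebuild and the exception StructureUnavailable are hypothetical names; the sketch assumes that a failed structure is represented by None and that copying is the only available means of propagating data.

```python
import copy


class StructureUnavailable(Exception):
    """Raised when the old structure cannot be read (hypothetical, illustration only)."""


def system_managed_rebuild(old_structure):
    """Allocate a new structure and populate it by copying the old one directly.

    No connector participation is modeled: the copy is done entirely by the
    caller, standing in for the operating system. If the old structure has
    been lost, there is nothing to copy from and the rebuild cannot proceed.
    """
    if old_structure is None:
        raise StructureUnavailable("old structure lost; nothing to copy from")
    return copy.deepcopy(old_structure)


# Planned reconfiguration: the old structure is intact, so the copy succeeds.
old = {"entry1": [1, 2, 3]}
new = system_managed_rebuild(old)
```

Calling system_managed_rebuild(None) in this sketch raises StructureUnavailable, mirroring the statement that the process cannot rebuild a structure in failure scenarios.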
Based on the foregoing, a need still exists for a system-managed duplexing capability. That is, a need exists for a duplexing capability that is managed by the operating system and largely transparent to the users of the system. Further, a need exists for a duplexing capability that enables duplexing of the various types of coupling facility structures, including cache, list and lock structures.