In certain computer processing configurations, multiple systems are coupled together to execute workloads. One such system is an S/390 parallel sysplex configuration available from International Business Machines Corporation of Armonk, New York. The coupling of the multiple systems is achieved with shared direct access storage devices (DASD) and a shared global cache, which is termed the coupling facility. Access to the shared global cache is significantly faster than access to shared DASD.
With reference now to Prior Art FIG. 1, a schematic diagram of a conventional parallel sysplex configuration 100 is shown. Parallel sysplex configuration 100 includes three systems, system 1 102, system 2 104, and system 3 106. Each of the three systems is coupled to a shared DASD 108. In addition, parallel sysplex configuration 100 includes two coupling facilities, coupling facility 110 and coupling facility 112. As noted in Prior Art FIG. 1, coupling facility 110 is referred to as the primary coupling facility and coupling facility 112 is referred to as the alternate coupling facility. Each of the two coupling facilities is coupled to each of the three systems, system 1 102, system 2 104, and system 3 106. Although a particular conventional parallel sysplex configuration 100 is shown in Prior Art FIG. 1, the following discussion pertains to parallel sysplex configurations having various implementations including a lesser or greater number of elements. Parallel sysplex configuration 100 is designed for high availability. That is, a loss of a hardware element in parallel sysplex configuration 100 will reduce workload resources, but the workload continues to execute without disruption. Hence, should one of the multiple systems (e.g. system 1 102) of parallel sysplex configuration 100 experience some type of failure or become inoperable, the various other systems (e.g. system 2 104 and system 3 106) will still function. As a result, tasks being performed prior to the failure of system 1 102 will continue to be performed using at least some of the remaining systems of parallel sysplex configuration 100.
With reference still to Prior Art FIG. 1, each coupling facility's physical storage is divided into units termed cache structures. A cache structure is associated with a specific application, for example VSAM RLS (virtual storage access method record level sharing), DB2, etc. A given application such as VSAM RLS may have multiple cache structures residing in the same or different coupling facilities.
As an overview, during operation, data is buffered locally in each system's memory (local cache). Local cache is illustrated for system 1 102, system 2 104, and system 3 106 as local cache 114, local cache 116, and local cache 118, respectively. Access to the system's local cache is significantly faster than access to the shared global cache (coupling facility 110 and coupling facility 112), or shared DASD 108. When a data item is read from shared DASD 108, a copy is placed in both the local cache and the global cache. The coupling facility is vital to this operation because the coupling facility provides faster access to the data item. More specifically, associated with the data item is a vector index. If the data item is changed, old copies of the data item are invalidated by the coupling facility by changing a bit associated with the vector index from “valid” to “invalid”. Hence, the operation of the coupling facility is essential to a parallel sysplex configuration.
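By way of illustration only, the read and invalidation behavior described above can be sketched as a toy model. The class names, method names, and the record key used in the test are hypothetical and do not correspond to any actual S/390 interface. The sketch further assumes that a changed data item is written through to shared DASD and to the global cache, which the foregoing description does not specify.

```python
class CouplingFacility:
    """Toy shared global cache that tracks which systems hold a local copy."""

    def __init__(self):
        self.global_cache = {}    # key -> value
        self.registrations = {}   # key -> set of (system, vector index)

    def register(self, key, system, index):
        # Remember that `system` holds a local copy tracked by `index`.
        self.registrations.setdefault(key, set()).add((system, index))

    def write(self, key, value, writer, writer_index):
        self.global_cache[key] = value
        # Flip the validity bit of every other system's copy to "invalid".
        for system, index in self.registrations.get(key, set()):
            if system is not writer:
                system.vector[index] = False
        self.registrations[key] = {(writer, writer_index)}


class System:
    """Toy system with a local cache and a validity bit per vector index."""

    def __init__(self, name, cf, dasd):
        self.name = name
        self.cf = cf
        self.dasd = dasd          # shared dict standing in for shared DASD
        self.local = {}           # key -> (vector index, value)
        self.vector = []          # vector[index] is True when the copy is valid

    def _slot(self, key):
        # Reuse the existing vector index for a key, or allocate a new one.
        if key in self.local:
            return self.local[key][0]
        self.vector.append(False)
        return len(self.vector) - 1

    def read(self, key):
        entry = self.local.get(key)
        if entry is not None and self.vector[entry[0]]:
            return entry[1], "local"              # fastest path
        if key in self.cf.global_cache:
            value, source = self.cf.global_cache[key], "global"
        else:
            value, source = self.dasd[key], "dasd"  # slowest path
            self.cf.global_cache[key] = value       # copy into global cache
        index = self._slot(key)
        self.local[key] = (index, value)            # copy into local cache
        self.vector[index] = True
        self.cf.register(key, self, index)
        return value, source

    def write(self, key, value):
        # Assumption: changes are written through to DASD and the global cache.
        self.dasd[key] = value
        index = self._slot(key)
        self.local[key] = (index, value)
        self.vector[index] = True
        self.cf.write(key, value, self, index)
```

In this sketch, a system whose validity bit has been flipped to invalid refetches the data item from the global cache rather than from shared DASD, which reflects the access-speed ordering described above.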
Unfortunately, conventional parallel sysplex configuration 100, and, more particularly, coupling facility implementation therein, has significant drawbacks associated therewith. As an example, typically two coupling facilities 110 and 112 are employed to ensure availability even if a coupling facility outage occurs. Currently, when a coupling facility fails (e.g. primary coupling facility 110), a rebuild process is used to perform the recovery action. The rebuild process involves allocating reserved space in the alternate coupling facility (e.g. alternate coupling facility 112) at the time of the failure for each cache structure in the failed coupling facility. Therefore, to ensure that this rebuild process will be successful, and to ensure that workload performance will not be significantly degraded once the switch to alternate coupling facility 112 is complete, alternate coupling facility 112 must have reserved space (“white space”) for the rather rare event of a coupling facility failure. Hence, a conventional parallel sysplex configuration 100 has a backup or alternate coupling facility to ensure expedient and accurate recovery from a coupling facility failure, and the alternate coupling facility is used primarily to reserve white space.
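As a rough illustration of why the conventional rebuild process depends on reserved white space, the allocation step can be sketched as follows. The class, the `rebuild` function, the structure names, and the sizes (in arbitrary units) are hypothetical and are chosen only for this example; the sketch models space accounting alone and does not model repopulating the rebuilt structures.

```python
class Facility:
    """Toy coupling facility: fixed capacity divided among cache structures."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.structures = {}      # structure name -> allocated size

    def free_space(self):
        # Unallocated space, i.e. the white space available for a rebuild.
        return self.capacity - sum(self.structures.values())


def rebuild(failed, alternate):
    """Allocate space in `alternate` for every structure of `failed`."""
    needed = sum(failed.structures.values())
    if needed > alternate.free_space():
        # Without enough reserved white space the rebuild cannot succeed.
        raise RuntimeError(
            f"{alternate.name} has {alternate.free_space()} free, needs {needed}"
        )
    for name, size in failed.structures.items():
        alternate.structures[name] = size
    failed.structures.clear()


# Hypothetical sizing: the alternate reserves as much space as the primary uses.
primary = Facility("coupling facility 110", capacity=100)
primary.structures = {"RLS_CACHE1": 40, "DB2_CACHE1": 60}
alternate = Facility("coupling facility 112", capacity=100)

rebuild(primary, alternate)
```

After the call, the alternate facility in this sketch holds both structures and has no free space left, which is why its capacity must be held in reserve until a failure occurs.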
With reference still to Prior Art FIG. 1, in a conventional parallel sysplex configuration 100, the primary coupling facility 110 and the alternate coupling facility 112 typically have approximately the same amount of data storage space. That is, alternate coupling facility 112 must have approximately the same amount of space (e.g. cache structure 122) as the primary coupling facility (e.g. cache structure 120) so that an amount of white space can be reserved in alternate coupling facility 112 wherein the reserved white space is approximately equal to the amount of assigned space in primary coupling facility 110. Thus, in a conventional parallel sysplex configuration, at least one alternate coupling facility sits mostly unused, containing allocated white space, and waits for the primary coupling facility to fail. Although high availability is important, such a duplication of coupling facilities drastically increases equipment expenses, significantly increases the physical space occupied by the coupling facilities, and “wastes” potentially valuable data storage space due to the allocation of white space in at least one of the coupling facilities.
Referring now to Prior Art FIG. 2, a block diagram 200 illustrating the aforementioned drawbacks associated with coupling facility implementation in a conventional parallel sysplex configuration is shown. As shown in FIG. 2, for purposes of illustration, primary coupling facility 110 has two cache structures, cache 202 and cache 204. Additionally, in FIG. 2, alternate coupling facility 112 has two cache structures, cache 206 and cache 208. In a conventional parallel sysplex configuration, cache structures cache 206 and cache 208 remain allocated as white space and are reserved for the rather rare event of a failure of coupling facility 110. Hence, cache structures cache 206 and cache 208 remain unused.
As yet another concern, in order to achieve widespread acceptance, and to ensure affordability, any method of recovering from a coupling facility failure, which overcomes the above-listed drawbacks, should be compatible with existing parallel sysplex configurations.
Thus, a need exists for a method and system for recovering from a coupling facility failure. Still another need exists for a method and system which meets the above need and which does not require allocating white space in a separate and duplicative coupling facility. Yet another need exists for a method and system which meets the above needs and which is compatible with an existing parallel sysplex configuration.