1. Field of the Invention
This invention relates to a new system configuration and method for operating a computing apparatus. More specifically, it relates to the use of a recovery log for restarting a computing system following normal or abnormal termination.
2. Description of the Prior Art
In the operation of computing systems, it is the practice to provide a programming subsystem which includes a plurality of resource managers. Such resource managers control the operation of system resources (hereinafter also called resource collections) such as data bases, teleprocessing or other communication facilities, and the system itself. Further, there may be a plurality of data base resource managers managing separate or shared data base facilities, a plurality of teleprocessing resource managers, and a plurality of system resource managers, such as in a multi-programming, multi-processor environment.
It is, unfortunately, a characteristic of computing systems that failures occur which cause the abnormal termination of the system, the communication links, the data bases, and/or their managers. Such failures may leave, for example, a data base in an inconsistent state, or require that other resources or their managers not directly affected by the failure also suspend operation.
In order to facilitate the recovery of resource managers and the facilities or objects which they manage, it is known to write (hereinafter also called externalizing), at specified processing points, a system log to non-volatile storage, the log containing the states of the resource collections (hereinafter called checkpointed states) together with before and after images of changes made to the resource collections. Examples of systems using such a system log are the IBM Information Management System (IMS/VS), Program No. 5740-XX2, and Customer Information Control System (CICS/VS), the latter being described in CICS/VS Version 1.5 System/Application Design Guide, SC33-0068, at pages 237-246. In these systems, during an emergency restart operation, the resource managers use the information in the log to perform their respective recovery responsibilities, such as restoring the data base to a consistent state, reestablishing control block content, and backing out the effect of interrupted work unit activity on resources. However, there is no provision for restarting a subset of the resource managers, nor for deferring the recovery of selected work units due to the unavailability of certain resources.
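By way of illustration only, the general log-based restart described above may be sketched as follows. The sketch is not taken from IMS/VS or CICS/VS; the names LogRecord and restart, the record layout, and the use of None to denote an absent item are assumptions made for the example. It reapplies after images to a checkpointed state and then backs out, from before images, the changes of any work unit that has no commit record in the log.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class LogRecord:
    """Hypothetical log record; the layout is an assumption for illustration."""
    work_unit: str
    kind: str            # "update" or "commit"
    key: str = ""
    before: Any = None   # before image (None means the item did not exist)
    after: Any = None    # after image (None means the item was deleted)


def _apply(db: Dict[str, Any], key: str, image: Any) -> None:
    """Set an item to the given image, or remove it when the image is None."""
    if image is None:
        db.pop(key, None)
    else:
        db[key] = image


def restart(checkpoint: Dict[str, Any], log: List[LogRecord]) -> Dict[str, Any]:
    """Rebuild state from a checkpoint, then back out interrupted work units."""
    db = dict(checkpoint)
    committed = {r.work_unit for r in log if r.kind == "commit"}

    # Forward pass: reapply every after image in log order.
    for r in log:
        if r.kind == "update":
            _apply(db, r.key, r.after)

    # Backward pass: restore before images of uncommitted work, newest first.
    for r in reversed(log):
        if r.kind == "update" and r.work_unit not in committed:
            _apply(db, r.key, r.before)

    return db


checkpoint = {"A": 100, "B": 50}
log = [
    LogRecord("W1", "update", "A", before=100, after=70),
    LogRecord("W1", "update", "B", before=50, after=80),
    LogRecord("W1", "commit"),
    LogRecord("W2", "update", "A", before=70, after=40),  # interrupted, no commit
]
print(restart(checkpoint, log))  # {'A': 70, 'B': 80}
```

In this example work unit W1 completed and its changes are retained, while the interrupted work unit W2 is backed out, leaving the data in a consistent state.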
Consequently, there is a need in the art for the capability to restart all or a subset of the resource manager components of a subsystem, and all or a subset of the subsystem's resources. When components or the resources they manage are not available to the restart process, and have been made inconsistent by the actions of interrupted work unit activity, there is a need for mechanisms to remember these outstanding work unit recovery requirements until the components or resources become available.
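The need stated above may be pictured with the following minimal sketch, which is offered as an illustration of the problem and not as the mechanism of the invention. The function name restart_subset, the tuple layout of a pending backout entry, and the resource names are all hypothetical. Backout work is applied to those resources which are available at restart, and the remainder is returned so that it can be remembered and retried when the missing resources become available.

```python
from typing import Any, Dict, List, Set, Tuple

# Hypothetical pending backout entry: (resource name, item key, before image).
Pending = Tuple[str, str, Any]


def restart_subset(
    resources: Dict[str, Dict[str, Any]],
    pending: List[Pending],
    available: Set[str],
) -> List[Pending]:
    """Back out what can be backed out now; return the work still outstanding."""
    deferred: List[Pending] = []
    for resource, key, before in pending:
        if resource in available:
            # The resource is present: restore the before image immediately.
            resources[resource][key] = before
        else:
            # The resource is unavailable: remember the requirement for later.
            deferred.append((resource, key, before))
    return deferred


resources = {"accounts": {"A": 40}, "orders": {"O1": "shipped"}}
pending = [("orders", "O1", "open"), ("accounts", "A", 70)]

# First restart: only the "accounts" resource is available.
outstanding = restart_subset(resources, pending, {"accounts"})
print(outstanding)  # [('orders', 'O1', 'open')]

# Later, "orders" becomes available and the remembered work is completed.
outstanding = restart_subset(resources, outstanding, {"accounts", "orders"})
print(outstanding)  # []
```

In practice the outstanding list would itself have to be kept on non-volatile storage so that it survives a further failure; the sketch holds it in memory for brevity.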