1. Field of the Invention
This invention relates generally to concurrent multiuser database processing systems coupled to a shared external database store and particularly to an efficient "force-at-commit" destaging procedure for maintaining data consistency in a multisystem DBMS shared disk environment.
2. Description of the Related Art
Many modern Data Base Management Systems (DBMSs) follow a "no-force-at-commit" policy in a single multiuser database processing system environment to reduce the overhead of database transfers from cache memory to stable external storage. This is accomplished by destaging updated pages asynchronously in batch mode rather than forcing local cache pages to disk at each transaction commit. However, in a multisystem environment in which each DBMS instance relies on simple recovery techniques, preserving data consistency among the multiple DBMS instances requires each DBMS to follow a "force-at-commit" protocol whenever it updates a database that experiences intersystem read/write interest. As used herein, a force-at-commit policy in a multisystem shared disk environment means that all data pages updated by a transaction are first written to the shared external (stable) store and then invalidated in every other concurrent multiuser system exhibiting interest before the transaction's locks are released. The resulting extended hold time of transaction locks substantially reduces system efficiency.
The method of this invention is practiced in the multisystem environment described in copending patent application Ser. No. 07/860,805, filed on Mar. 30, 1992, by D. Elko et al., entitled "Sysplex Shared Data Coherency Method", now U.S. Pat. No. 5,537,574 (Assignee Docket PO9-91-052), assigned to the Assignee hereof and entirely incorporated herein by this reference. Elko et al. describe a Shared Electronic Storage (SES) facility, which includes a reliable nonvolatile electronic store that can be coupled to multiple systems for sharing data. The combination of a SES and a shared disk store is herein denominated an External Storage System (ESS). Under a "force-at-commit" protocol, a DBMS instance can perform fast writes of modified data pages to the SES for later destaging to disk, without waiting for disk actuator latency.
Such a procedure, employed to reduce the overhead of global locking, is described in U.S. patent application No. 07/869,267, filed on Apr. 15, 1992, by Josten et al., entitled "Efficient Data Base Access Using a Shared Electronic Store in a Multi-system Environment With Shared Disks", now U.S. Pat. No. 5,408,653, commonly assigned to the Assignee hereof and entirely incorporated herein by this reference. Josten et al. describe a protocol whereby, when there is no intersystem interest in a database, a DBMS follows a "no-force-at-commit" policy that permits it to write database updates to external storage asynchronously with respect to transaction commit processing ("batch" mode). This improves transaction response time and reduces global lock hold time, yielding better concurrency. When, instead, the buffer manager (BM) detects intersystem interest in a database, a "force-at-commit" policy is used to maintain coherency. This force-at-commit policy requires the DBMS, before releasing the locks on data pages updated by a committing transaction, to ensure that those data pages have been written to external storage and that all other "interested" systems have been notified of the changed pages. The force-at-commit protocol of Josten et al. substantially reduces forcing overhead by first externalizing these "dirty" data pages to the SES, from which they are written to disk by a separate process, thereby partially offsetting the increased commit overhead needed to maintain coherency when there is intersystem interest.
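The commit-time choice between the two policies can be illustrated with a minimal Python sketch. It assumes a single in-memory object standing in for the SES and tracks locks per page; the class names `SharedElectronicStore` and `DBMSInstance`, the method `write_and_cross_invalidate`, and the boolean `intersystem_interest` flag are all hypothetical, and log forcing and the separate SES-to-disk destaging process are omitted. It is not the implementation of Josten et al., only an illustration of the ordering described above: under force-at-commit, pages are written to the SES and other interested systems are cross-invalidated before locks are released.

```python
from dataclasses import dataclass, field


@dataclass
class SharedElectronicStore:
    """Hypothetical stand-in for the SES: a nonvolatile page store shared by all systems."""
    pages: dict = field(default_factory=dict)          # page_id -> page contents
    cached_in: dict = field(default_factory=dict)      # page_id -> set of system ids caching it
    invalidations: list = field(default_factory=list)  # (system_id, page_id) notices sent

    def write_and_cross_invalidate(self, writer, page_id, data):
        self.pages[page_id] = data
        # Notify every other system holding a copy that its copy is now stale.
        for system in sorted(self.cached_in.get(page_id, set()) - {writer}):
            self.invalidations.append((system, page_id))


@dataclass
class DBMSInstance:
    """Hypothetical DBMS instance; names and structure are illustrative only."""
    system_id: str
    ses: SharedElectronicStore
    local_dirty: dict = field(default_factory=dict)  # pages awaiting asynchronous destaging
    locks_held: set = field(default_factory=set)     # page-level locks, keyed by page_id

    def commit(self, txn_pages, intersystem_interest):
        """txn_pages maps page_id -> data for pages updated by the committing transaction."""
        if not intersystem_interest:
            # No-force-at-commit: keep pages in the local buffer pool for later batch destaging.
            self.local_dirty.update(txn_pages)
            policy = "no-force"
        else:
            # Force-at-commit: externalize to the SES (not disk) and cross-invalidate.
            for page_id, data in txn_pages.items():
                self.ses.write_and_cross_invalidate(self.system_id, page_id, data)
            policy = "force"
        # Locks are released only after the chosen policy's commit work is done.
        self.locks_held -= set(txn_pages)
        return policy


# Usage: SYS_A commits one page with intersystem interest and one without.
ses = SharedElectronicStore(cached_in={"P1": {"SYS_A", "SYS_B"}})
sys_a = DBMSInstance("SYS_A", ses, locks_held={"P1", "P2"})
forced = sys_a.commit({"P1": b"v2"}, intersystem_interest=True)
deferred = sys_a.commit({"P2": b"v9"}, intersystem_interest=False)
```

The sketch makes the cost asymmetry visible: the no-force path performs no external I/O at commit, while the force path performs work proportional to the number of pages the transaction updated, all of it before the locks are released.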
A typical transaction recovery protocol suitable for such a multisystem DBMS shared disk environment is described in "ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging", C. Mohan et al., Research Report RJ-6649, Revised Nov. 2, 1990, International Business Machines Corporation, which is entirely incorporated herein by this reference. To meet transaction and data recovery guarantees, a write-ahead logging (WAL) recovery system records in a log the progress of each transaction and all of its actions that change recoverable data objects. The recovery log then becomes the source for ensuring either that the transaction's committed actions are reflected in the (stable) database despite various types of failures or that its uncommitted actions are undone (rolled back). When the logged actions reflect data object content, these log records also become the source for reconstructing damaged or lost data. Conceptually, the log can be thought of as an ever-growing sequential file, the nonvolatile version of which is kept in "stable storage", usually on disk. As used herein, stable storage means nonvolatile storage that remains intact and available across system failures; it includes various combinations of disk and nonvolatile store (NVS) cache memory.
Whenever recovery log records are written, they are first placed in "volatile" storage, which is usually cache memory. Only at certain times (such as transaction commit time) are the recovery log records up to a certain point written, in log page sequence, to stable storage. This is herein denominated "forcing" the recovery log up to that point. The WAL protocol requires the recovery log records that represent changes to selected data pages to be in stable storage before the changed data pages in local cache memory are allowed to replace the previous versions of those pages in stable storage. That is, the system is not permitted to write an updated page to external storage until at least the "undo" portions of the recovery log records describing the page updates have first been written to stable storage. Transaction status is also stored in the recovery log, and no transaction can be considered complete until its committed status and all of its log data are safely recorded in external stable storage by forcing the recovery log up to the transaction's commit log record serial number. This requirement permits a restart recovery procedure to recover any transaction that completed successfully but whose updated pages were not physically written to external storage before a system failure. It follows that a transaction is not allowed to complete its "commit" processing until all recovery log records for that transaction are safely in stable storage.
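The two WAL rules above (log before page, and log forced through the commit record before commit completes) can be illustrated with a minimal Python sketch. It assumes log serial numbers, here called log sequence numbers (LSNs), that are simply 1-based positions in an in-memory list, and a dictionary standing in for stable page storage. `RecoveryLog`, `Page`, `page_lsn`, `flushed_lsn`, and the helper functions are hypothetical names. For simplicity the sketch forces whole log records, not just their undo portions, up to the page's LSN before writing the page, which satisfies the "at least the undo portions" requirement conservatively.

```python
class RecoveryLog:
    """Minimal write-ahead log: each record gets an increasing LSN (1-based position)."""

    def __init__(self):
        self.records = []     # all records; the prefix up to flushed_lsn is "stable"
        self.flushed_lsn = 0  # highest LSN known to be in stable storage

    def append(self, record):
        # New records are placed in volatile storage first.
        self.records.append(record)
        return len(self.records)

    def force(self, up_to_lsn):
        # "Forcing" the log: make every record up to up_to_lsn stable, in sequence.
        self.flushed_lsn = max(self.flushed_lsn, min(up_to_lsn, len(self.records)))


class Page:
    def __init__(self, page_id):
        self.page_id = page_id
        self.page_lsn = 0  # LSN of the latest log record describing a change to this page
        self.value = None


def update(log, page, txn_id, new_value):
    # Log the change, including the old value needed for undo, then change the cached page.
    lsn = log.append(("update", txn_id, page.page_id, page.value, new_value))
    page.value = new_value
    page.page_lsn = lsn


def write_page_to_stable_storage(log, page, disk):
    # WAL rule: the log must be stable through page_lsn before the page replaces its old version.
    if page.page_lsn > log.flushed_lsn:
        log.force(page.page_lsn)
    disk[page.page_id] = page.value


def commit(log, txn_id):
    # A transaction is complete only once its commit record (and all earlier records) are stable.
    lsn = log.append(("commit", txn_id))
    log.force(lsn)
    return lsn


# Usage: one update, an early page write, then commit.
log, page, disk = RecoveryLog(), Page("P1"), {}
update(log, page, "T1", "x")
unforced = log.flushed_lsn          # the update record is still only in volatile storage
write_page_to_stable_storage(log, page, disk)
after_page_write = log.flushed_lsn  # the page write forced the log through page_lsn first
commit_lsn = commit(log, "T1")
```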
Running multiple concurrent DBMS instances in a shared disk (data-sharing) architecture is one approach to improving capacity and availability over a single DBMS instance. In a shared disk environment, all the disks containing the database are shared among the different DBMS instances (or systems), and every system may read and modify any portion of the database on the shared disks. Because each DBMS instance has its own buffer pool, and because conflicting accesses to the same data can arise simultaneously from different systems, special synchronization protocols are required to govern intersystem interactions. These protocols require global locking facilities and procedures for maintaining coherency among the local cache buffers (LCBs) of the different systems.
A transaction system, such as International Business Machines Corporation's DB2 system, that does not write an updated page to disk at transaction commit holds the current version of that page in its buffer pool. In a shared disk environment, each sharing DBMS instance has its own buffer pool. Thus, when a system requests a page whose current version is cached in another system (denominated the owner), the owner system must provide the page to the requester. Various techniques for meeting such requests without long delays are disclosed by C. Mohan et al. in a paper entitled "Recovery and Coherency-Control Protocols for Fast Intersystem Page Transfer and Fine-Granularity Locking in a Shared Disks Transaction Environment", Research Report RJ 8017, Mar. 15, 1991, International Business Machines Corporation. However, because Mohan et al. assume no SES capability, they require each page to be written to disk for simple recovery purposes.
Reference is also made to U.S. patent application No. 07/955,076, filed on Oct. 1, 1992, by C. Mohan et al., now U.S. Pat. No. 5,455,942, assigned to the Assignee hereof and entirely incorporated herein by this reference. Reference is further made to U.S. Pat. Nos. 5,276,835, 5,280,611, and 5,287,473, issued to C. Mohan et al., commonly assigned to the Assignee hereof, and all fully incorporated herein by this reference.
Because the system disclosed in the above-cited Josten et al. patent application employs two distinct commit protocols (force and no-force), there is a clearly felt need for an efficient method of identifying the pages modified by a transaction, scheduling and performing their destaging writes to external storage, and cross-invalidating their cached copies in any other systems showing "interest". Such "dirty-page" identification and scheduling procedures are necessary because protocol selection in the Josten et al. sysplex switches dynamically between "force" and "no-force" strategies in response to unpredictable "interest" in the subject database by other systems. The "force" protocol requires transaction-level tracking of dirty pages, whereas the "no-force" protocol does not.
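The transaction-level bookkeeping that the "force" protocol needs can be sketched as a per-transaction record of updated pages, grouped by database. The following Python sketch is a hypothetical illustration of the problem only, not the method of this invention: `DirtyPageTracker`, `note_update`, and `pages_to_force` are assumed names, and the set of databases with intersystem interest is assumed to be known at commit time. Because interest can arise unpredictably, the sketch records every update and decides only at commit which pages must be forced and which may be left for asynchronous destaging.

```python
from collections import defaultdict


class DirtyPageTracker:
    """Hypothetical per-transaction record of updated pages, grouped by database."""

    def __init__(self):
        # txn_id -> database -> set of page ids updated by that transaction
        self._pages = defaultdict(lambda: defaultdict(set))

    def note_update(self, txn_id, database, page_id):
        # Repeated updates to the same page are recorded once.
        self._pages[txn_id][database].add(page_id)

    def pages_to_force(self, txn_id, interested_databases):
        """At commit, split the transaction's pages into those that must be forced now
        (databases with intersystem interest) and those left for batch destaging."""
        by_db = self._pages.pop(txn_id, {})  # tracking ends when the transaction commits
        force, deferred = [], []
        for database in sorted(by_db):
            target = force if database in interested_databases else deferred
            target.extend((database, page) for page in sorted(by_db[database]))
        return force, deferred


# Usage: T1 updates P1 twice in DB1 and P7 once in DB2; only DB1 has intersystem interest.
tracker = DirtyPageTracker()
tracker.note_update("T1", "DB1", "P1")
tracker.note_update("T1", "DB1", "P1")
tracker.note_update("T1", "DB2", "P7")
force, deferred = tracker.pages_to_force("T1", {"DB1"})
```

Recording updates unconditionally keeps the commit decision correct even if interest appears after the transaction's first update, at the cost of tracking pages that may never need forcing.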
Because commit processing occurs very frequently in a high-transaction-rate environment, and because "force-at-commit" policies may govern the selection and scheduling of buffer pool pages for writing to external stable storage, there is a clearly felt need for an efficient method of processing a transaction's updated pages for externalization in a multisystem DBMS environment with shared external storage. These related unresolved problems and deficiencies are clearly felt in the art and are solved by this invention in the manner described below.