1. Field of the Invention
The present invention relates generally to transaction processing, and more particularly to a transaction processing system which uses locking as a concurrency control mechanism. Specifically, the present invention relates to a database system that uses lock escalation and de-escalation protocols.
2. Description of the Background Art
A desirable feature of a computing system is the ability to recover from partial system failures that may interrupt memory write operations. If an application program has a memory update operation in progress at the time of the system failure, it is possible that a memory record will become erroneous. To enable the recovery of memory records after a partial system failure, it is necessary for the database system to keep backup copies of the records in nonvolatile memory. When the computing system is restarted, the memory records to be recovered are replaced with the backup copies.
To facilitate the making of backup copies and the recovery of memory records, the database system typically provides an established set of logging and recovery procedures that can be invoked or called from an application program to define a "recovery unit." The recovery unit consists of a set of "before images" and a set of procedures for installing these "before images" to corresponding non-volatile data records. All of the "before images" in the "recovery unit" must be installed before the corresponding data records are made available for subsequent processing. The "before images" in the "recovery unit" usually are the updates of operations in a single "transaction." Upon recovering from a partial system failure, inspection of the nonvolatile memory will reveal that the operations in the single "transaction" are either all completed, or none of them are completed.
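The role of the "before images" can be illustrated with a minimal sketch in Python. The names `RecoveryUnit`, `write`, `commit`, and `rollback` are hypothetical and are not drawn from any particular database system; a dictionary stands in for the nonvolatile data records. The sketch saves each record's prior value before the record is first updated, and reinstalls those values if the transaction does not complete.

```python
_MISSING = object()   # marks a record that did not exist before the update


class RecoveryUnit:
    """Hypothetical recovery unit holding one before image per updated record."""

    def __init__(self, store):
        self.store = store          # dict standing in for nonvolatile data records
        self.before_images = {}     # record key -> value prior to the first update

    def write(self, key, value):
        # Save the before image only on the first update of a record.
        if key not in self.before_images:
            self.before_images[key] = self.store.get(key, _MISSING)
        self.store[key] = value

    def commit(self):
        # The updates stand; the before images are no longer needed.
        self.before_images.clear()

    def rollback(self):
        # Install every before image so that none of the updates survive.
        for key, old in self.before_images.items():
            if old is _MISSING:
                self.store.pop(key, None)
            else:
                self.store[key] = old
        self.before_images.clear()


accounts = {"A": 100, "B": 50}
unit = RecoveryUnit(accounts)
unit.write("A", 70)     # debit the first account
unit.rollback()         # failure before the matching credit is applied
```

After the rollback, `accounts` again holds its original values, so the interrupted transfer leaves no partial update behind.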
The operations in a single transaction may modify a number of files, and the files may be shared by other processes. During the transaction, the files may be inconsistent for a time, although the files will be consistent upon completion of the transaction. A typical example is a transfer of funds from one account to another, in which a first account is debited, and at a slightly later time, another account is credited. During the interim, the two accounts are inconsistent because the sum of the two account balances does not represent the total funds in the two accounts. Because files may be inconsistent while they are being modified by a transaction, it is desirable to prevent other users or processes from accessing the files until the modification is finished.
Transactions are typically initiated in transaction processing systems in such a way that the execution of a second transaction is begun before the results of a first transaction are committed. To ensure correctness and ease of recovery, the second transaction is usually precluded from reading any updates of the first transaction before the first transaction commits. In a database system, for example, a transaction places "write locks" on any database records that are modified by the transaction. To ensure consistency of data read by a transaction, the transaction may also place "read locks" on any database records that are read by the transaction. These read locks and write locks are held until the end of the transaction. Just after the updates of the transaction are committed, the locks are released. This well-known two-phase locking protocol ensures correctness and ease of recovery as described in Bernstein et al., Concurrency Control and Recovery in Database Systems, Addison-Wesley, 1987, pp. 58-78.
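The locking discipline just described can be sketched as follows. This is an illustrative Python model only: the class `LockTable` and its methods are hypothetical names, and a conflicting request raises an exception here, whereas an actual lock manager would ordinarily make the requester wait.

```python
class LockConflict(Exception):
    """Raised when a lock request conflicts with a lock held by another transaction."""


class LockTable:
    def __init__(self):
        self.readers = {}   # record -> set of transactions holding read locks
        self.writer = {}    # record -> transaction holding the write lock

    def read_lock(self, txn, rec):
        holder = self.writer.get(rec)
        if holder is not None and holder != txn:
            raise LockConflict(f"{rec} is write-locked by {holder}")
        self.readers.setdefault(rec, set()).add(txn)

    def write_lock(self, txn, rec):
        holder = self.writer.get(rec)
        if holder is not None and holder != txn:
            raise LockConflict(f"{rec} is write-locked by {holder}")
        if self.readers.get(rec, set()) - {txn}:
            raise LockConflict(f"{rec} is read-locked by another transaction")
        self.writer[rec] = txn

    def release_all(self, txn):
        # Called only at the end of the transaction, after its updates commit.
        for rec in [r for r, t in self.writer.items() if t == txn]:
            del self.writer[rec]
        for rec in list(self.readers):
            self.readers[rec].discard(txn)
            if not self.readers[rec]:
                del self.readers[rec]
```

Because `release_all` is the only way locks are given up, a second transaction cannot read a record updated by the first until the first has committed.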
In multi-processing database systems, such as the "Rdb/VMS" (Trademark) database system sold by Digital Equipment Corporation, a "lock manager" is used which maintains a lock data structure including a hash table index to a cache of locks. Before a record is fetched, the cache of locks is indexed in order to determine whether a record is already locked, and to lock a free record to be updated.
The Rdb/VMS database system is described in Hobbs and England, Rdb/VMS--A Comprehensive Guide, Digital Press, Digital Equipment Corp., Maynard, Mass. (1991); and Ashok Joshi, "Adaptive Locking Strategies in a Multi-Node Data Sharing Environment," Proceedings of the 17th International Conference on Very Large Data Bases, IEEE, Barcelona, Spain, Sep. 3-6, 1991, pp. 181-192. The Rdb/VMS database system uses the "lock manager" of the "VMS" (Trademark) operating system sold by Digital Equipment Corporation. The VMS lock manager is further described in Snaman and Thiel, "The VAX/VMS Distributed Lock Manager," Digital Technical Journal, No. 5, Digital Equipment Corp., Maynard, Mass. (September 1987), pp. 29-44.
Lock managers typically support resource hierarchies in order to provide high concurrency as well as good performance. Coarse granularity locks reduce the locking overhead at the expense of concurrency. Fine granularity locks improve concurrency at the cost of increased locking overhead such as larger lock tables and more calls to the lock manager. To deal with these problems, locking protocols typically use techniques that dynamically adjust the granularity of locking. One technique, known as lock de-escalation, starts with coarse granularity, and refines the granularity in response to locking requests by conflicting users. Another technique, known as lock escalation, starts with the finest granularity, and when there are a relatively large number of fine grain locks, the fine grain locks are exchanged for a single lock at the next higher level in the resource hierarchy, so long as the exchange would not introduce conflict or deadlock.
The "Rdb/VMS" database system, for example, uses multigranularity locking techniques. Records within a table are grouped into a tree structure called the "adjustable lock granularity tree" (ALG). This tree organizes the records into varying levels of granularity, with the root of the tree being the entire table and the leaves being the individual records. The number of levels in the tree, as well as the successive refinements of granularity at each intermediate level, can be defined by the database administrator.
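One way to picture such a tree is to compute, for a given record, the chain of lockable resources from the table down to the record. The Python sketch below is an assumption made for illustration: the function `alg_path`, the tuple naming of resources, and the idea of grouping records by integer division of a record number are all hypothetical and are not taken from the Rdb/VMS implementation.

```python
def alg_path(record_id, group_sizes):
    """Return the lock resources from the table root down to one record.

    group_sizes lists the number of records covered by a node at each
    intermediate level, coarsest first, as an administrator might define it.
    """
    path = [("table",)]
    for size in group_sizes:
        # Records with the same quotient fall under the same intermediate node.
        path.append(("group", size, record_id // size))
    path.append(("record", record_id))
    return path


# Three intermediate levels covering 1000, 100, and 10 records per node.
path = alg_path(1234, [1000, 100, 10])
```

Here `path` lists five resources: the table, one node at each of the three intermediate levels, and the record itself.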
The "Rdb/VMS" database system uses the following lock de-escalation protocol. Whenever a record lock is requested, the lock protocol attempts to acquire a strong lock on the highest ancestor of the record in the ALG tree. If it succeeds in obtaining the strong lock, all descendants of that node are implicitly locked. When individual records are accessed, it is necessary to remember each record that has been accessed so that it is possible to later de-escalate the high level lock to a lower level, if necessary. If the amount of conflict increases, it is possible to perform de-escalation and acquire explicit record locks.
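The essential bookkeeping of de-escalation can be sketched in Python as follows. The model is deliberately reduced to two levels (a table and its records), it ignores lock modes and intention locks, and the class `Table` with its methods is a hypothetical construction for illustration rather than the Rdb/VMS protocol itself.

```python
class Table:
    def __init__(self):
        self.strong_holder = None   # transaction holding the strong table-level lock
        self.record_locks = {}      # record id -> transaction holding an explicit lock
        self.accessed = {}          # transaction -> records touched under the strong lock

    def lock_record(self, txn, rec):
        if self.strong_holder is None and not self.record_locks:
            self.strong_holder = txn            # no other user: take the coarse lock
        if self.strong_holder == txn:
            # Record is implicitly locked; remember it for a later de-escalation.
            self.accessed.setdefault(txn, set()).add(rec)
            return "implicit"
        if self.strong_holder is not None:
            self.deescalate()                   # a conflicting requester has arrived
        owner = self.record_locks.get(rec)
        if owner is not None and owner != txn:
            return "conflict"
        self.record_locks[rec] = txn
        return "explicit"

    def deescalate(self):
        # Exchange the strong lock for explicit locks on the remembered records.
        holder = self.strong_holder
        for rec in self.accessed.pop(holder, set()):
            self.record_locks[rec] = holder
        self.strong_holder = None
```

In this sketch a first transaction that locks records 1 and 2 receives them implicitly under the table lock; when a second transaction requests record 3, the table lock is exchanged for explicit locks on records 1 and 2, and the second transaction proceeds unless it asks for one of those two records.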
Lock escalation has also been proposed for use with multigranularity locking, as described in Bernstein et al., Concurrency Control and Recovery in Database Systems, Addison-Wesley, 1987, pp. 69-77. On page 75, Bernstein et al. observe that a system that employs multigranularity locking must decide the level of granularity at which a given transaction should be locking data items. Fine granularity locks are no problem, because the transaction manager or scheduler simply requests them one-by-one as it receives operations from the transaction. Coarse granularity locks are another matter. A decision to set a coarse lock is based on a prediction that the transaction is likely to access many of the data items covered by the lock. A compiler may be able to make such predictions by analyzing a transaction's program and thereby generating coarse granularity lock requests that will be explicitly issued by the transaction at run time. If transactions send high level (e.g., relational) queries to the transaction manager, the transaction manager may be able to tell that the query will generate many record accesses to certain files.
Bernstein et al. further say that the past history of a transaction's locking behavior can also be used to predict the need for coarse granularity locks. The scheduler may only be able to make such predictions based on the transaction's recent behavior, using a technique called escalation. In this case, the transactions start locking items of fine granularity (e.g., records). If a transaction obtains more than a certain number of locks of a given granularity, then the scheduler starts requesting locks at the next higher level of granularity (e.g., files), that is, it escalates the granularity of the locks it requests. The scheduler may escalate the granularity of a transaction's lock requests more than once.
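The threshold rule can be sketched in Python as follows. The class `EscalatingLocks`, its return strings, and the check that no other transaction holds record locks in the same file before escalating are illustrative assumptions, not details given by Bernstein et al.; the sketch also treats every lock as exclusive.

```python
class EscalatingLocks:
    def __init__(self, threshold):
        self.threshold = threshold
        self.record_locks = {}   # (transaction, file) -> set of locked record ids
        self.file_locks = {}     # file -> transaction holding the file-level lock

    def lock(self, txn, file, rec):
        owner = self.file_locks.get(file)
        if owner == txn:
            return "covered"                 # the file lock already covers the record
        if owner is not None:
            return "conflict"
        for (other, f), recs in self.record_locks.items():
            if f == file and other != txn and rec in recs:
                return "conflict"
        mine = self.record_locks.setdefault((txn, file), set())
        mine.add(rec)
        # Escalate once the transaction holds more than `threshold` record locks,
        # provided that doing so would not conflict with another transaction.
        if len(mine) > self.threshold and self._sole_user(txn, file):
            self.file_locks[file] = txn
            del self.record_locks[(txn, file)]
            return "escalated"
        return "record"

    def _sole_user(self, txn, file):
        return not any(f == file and other != txn and recs
                       for (other, f), recs in self.record_locks.items())
```

With a threshold of 2, for example, the third record lock that a transaction obtains in a file is exchanged, together with the first two, for a single file-level lock, and later requests by that transaction in the same file need no further record locks.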
Lehman and Carey have proposed to allow the granularity of locking to vary dynamically in response to changes in the level of inter-transaction conflicts. As described in Lehman and Carey, "A Concurrency Control Algorithm for Memory-Resident Database Systems," FODO June 1989, a proposed locking algorithm uses two locking granule sizes: relation-level locks and record-level locks. Locking at the relation level is much cheaper than locking at the record level, so it is the preferred method when a fine granularity of sharing is not needed. When several transactions desire access to a relation that is locked with a relation-level lock, the relation lock is de-escalated into a collection of record locks; the higher cost for record-level locking is then paid, but the level of sharing is increased. To allow for the possibility of relation lock de-escalation, record-level write sets and read predicate lists for transactions are kept in a control block associated with each accessed relation so that they may be converted into record locks if the need arises. When fine granularity locks are no longer needed, record-level locks are escalated back into relation-level locks. Certain operations that require the use of an entire relation will be able to force lock escalation to the relation level and then disable lock de-escalation until they have completed.