Field of the Invention
This invention relates to methods for concurrent database access by multiple transactions and queries in general, and more specifically, to techniques for eliminating transaction waiting during read-only queries by maintaining multiple versions of records.
Description of the Related Art
The simultaneous or concurrent execution of transactions that update a few data more or less randomly and transactions that access large numbers of data is a well-known problem in the database systems art. A transaction may read from the database, write to the database, or both. Because accessing transactions must lock records to ensure consistency, locking conflicts can arise between concurrent transactions. Upon such a conflict, one of the transactions must wait. The locking conflict is logical and the database management system solution is limited to serializing the transactions. However, other solutions to the wait-on-conflict problem are possible if the extensive transaction is limited to a read-only transaction or query.
When several transactions execute concurrently in the same database, the consistency of data may no longer be preserved. Thus, the system must control the interaction among the concurrent transactions. This control is achieved through a concurrency control procedure that exploits and guarantees transactional consistency. Because a transaction is a unit that preserves consistency, any schedule produced by a concurrency control scheme for processing a set of transactions concurrently must be computationally equivalent to a schedule produced by executing the same transactions serially in some order. This property is usually guaranteed by maintaining serializability in a manner known in the art.
To ensure serializability, any of a variety of different concurrency control methods may be used. In general, the concurrency control methods known in the art ensure serializability by either delaying an operation or aborting the transaction that issued the operation. The classes of methods well-known in the art include "locking protocols", "timestamp ordering" schemes, "validation" techniques, and "multi-version" schemes. For a survey of the related art, reference is made, for example, to H. F. Korth, et al, "Database System Concepts", Chapter 11, Concurrency Control, McGraw-Hill, New York, 1986.
Selection of a concurrency control scheme should reflect consideration of the desired balance of several conflicting criteria. The selected method must guard against cascading rollbacks during recovery without reducing system parallelism more than absolutely necessary. For instance, a typical multi-version concurrency control scheme assumes that each "write" operation creates a new version of the updated record. When a "read" operation is issued to the updated record, the system selects one of the record versions to be read. The concurrency control scheme must ensure that the version selection is made in a manner that guarantees serializability. This can be accomplished through the use of timestamps, for instance. In such a system, a read operation always succeeds while a write operation may result in transaction rollbacks.
The application of such multi-version techniques is useful for reducing the locking required to ensure serializability of concurrent transactions. Any locking reduction will beneficially reduce transaction wait states and thereby increase system parallelism. Thus, the art is replete with techniques for locking reduction, including other multi-version approaches.
R. Bayer, et al disclose a complex multi-version support mechanism (Bayer, R., Heller, H., and Reiser, A.,
"Parallelism and Recovery in Database Systems", Transactions on Database Systems, Vol. 5, No. 2, Jun. 1980). Bayer, et al require a continuous and costly maintenance of a graph that tracks the inter-transaction dependencies to avoid both non-serializable transaction executions and deadlocks. Bayer, et al require even read-only transactions or queries to lock the data objects being read, although the read-locks are granted without wait states. Query transactions also incur the cost of analyzing the dependency graph to locate the object version that must be read. Sometimes non-query updating transactions may be rolled back to preserve data atomicity. Read-only queries are never rolled back.
Because only two versions of any data item are maintained by Bayer, et al, the commit of an updating transaction may be delayed by a read-only query that is actively reading the earlier record version targeted for update. The Bayer, et al approach also incurs additional lock-related overhead for updating transactions compared to the overhead required in non-multi-version concurrency control schemes. They do not consider space management, structures required to track locations of different object versions, partial rollbacks, and incremental versioning, which are all problematic issues known for multi-versioning concurrency control schemes.
D. Reed proposes the use of timestamps for synchronization (Reed, D., "Naming and Synchronization in a Decentralized Computer System", PhD Thesis, Technical Report MIT/LTS/TR-205, MIT, Sept. 1978). Reed's method requires all read-only queries to update the timestamp control information associated with the data objects being read. His method permits creation of an unlimited number of object versions, thereby raising potential space management problems. Reed does not consider the garbage collection problem. Reading of records may be delayed and updating transactions may be aborted to avoid serializability violations.
Stearns, et al propose a similar method that may block or abort read-only queries under some circumstances and may delay the committing of updating transactions until termination of read-only queries accessing previous object versions (Stearns, R. E., Rosenkrantz, D. J., "Distributed Database Concurrency Controls Using Before-Values, Proc. SIGMOD International Conference on Management of Data, Ann Arbor, Apr. 1981).
A. Chan, et al propose a method that permits creation of any number of object versions, thereby raising potential space management problems (Chan, A., Fox, S., Lin, W-T., Nori, A., and Ries, D., "The Implementation of an Integrated Concurrency Control and Recovery Scheme", Proc. SIGMOD International Conference on Management of Data, Orlando, Jun. 1982.). Chan, et al provide versioning at the page level, thus requiring the transfer of an entire page to a slot in their "version pool" even if only a small part of the page is changed. Besides the path length overhead, this approach unnecessarily consumes buffer and disk space. Moreover, if a read-only query accesses a logical page having an uncommitted version, then the query must search at least one additional page before locating the committed page version needed, thereby increasing I/0 overhead. This occurs because their different page versions are backchained and each version is labelled with the identifier code of the creating transaction. A read-only query is required to read the most recent version of the page created by an updating transaction that is committed at the time the read-only query has begun. Because of this, every new read-only query must be associated with a Committed Transaction List ("CTL").
This Chan, et al page level versioning method uses a "version pool", guaranteeing that clustered access to physically contiguous pages for read-only queries, especially long ones, is not possible. Their garbage collection method may waste space in "version pool" and every updating transaction must track the slots used in the "version pool". Chan, et al clearly support only page level locking in their versioning scheme. Page level locking, especially for index data, normally leads to an intolerably low level of concurrency. Chan, et al do not actually discuss how their versioning is done for index data. Their method requires all modifications made by an updating transaction to be forced to disk at commit time. Because versioning is being done at the page level and because the before images of modified records are not logged, the previous version of the modified page must also be forced to disk before the modified version of the page with uncommitted changes is put on disk. These are a costly operations.
Chan and Gray later extend the Chan, et al scheme to the case of distributed read-only queries (Chan, A., and Gray, R., "Implementing Distributed Read-Only Transactions", IEEE Transactions on Software Engineering, Vol. SE-11, No. 2, 1985). The Chan and Gray algorithm causes the CTL of a given site to be transmitted in all precommit and commit messages sent by that site. Thus, read votes cannot be used to avoid the second phase of commit processing as is possible in other distributed database management systems known in the art (e.g., R* DDBMS). Sites that receive CTLs from other sites merge them with their own CTLs to create new versions of their own CTLs. To avoid aborting a read-only query because of premature garbage collection of the data needed by the query at the remote site, the set of retrieval sites that the ready-only query will visit must be known in advance and each of those sites must first be queried for its CTL before the query transaction starts. The union of the received CTLs must then be transmitted to all retrieval sites for use by the query to determine the data object versions to be read. Once the retrieval sites communicate their CTLs, they must be somehow prevented from garbage-collecting the data snapshot defined by the CTLs. This Chan and Gray algorithm incurs additional overhead to maintain CTLs in stable storage. These CTLs could become quite large. Chan and Gray do not suggest how premature garbage collection can be prevented or even detected.
Reference is also made to the disclosure by J. Robinson, et al of a technique for concurrency control using on-demand versioning to eliminate lock contention when an updating transaction must lock a record in a page required by a read-only query (Robinson, J., Thomasian, A., and Yu, P.S., "Elimination of Lock Contention in Relational Databases Accessed by Read-Only Queries and On-Line Update Transaction", I.B.M. Technical Bulletin 06-88, p. 180-85, Jun. 1988). Robinson, et al disclose a method wherein a page version containing the record accessed by a read-only query is prepared for the query and the global copy of the page is made available to the updating transaction. Versioning granularity can be varied and the version pages may be stored in a shared buffer pool along with other data. Different versions of the same page may co-exist in the buffer for queries started at different times. Robinson, et al do not consider the problems of efficiently controlling the various versions or of the efficient garbage collection of version data following read-only query termination.
S. Todd discloses a concurrency system that provides a deadlock-free environment for use in a distributed database system that tracks multiple versions of data pages (Todd S., "Concurrency System Suitable for Distributed Databases", I.B.M. Technical Disclosure Bulletin, 06-78, p. 383-386, Jun. 1978). Todd describes a method for labeling various page versions to indicate version status as replacement, lock or current. His method replaces the use of read locks with version control and permits all read-only queries to access the database without wait states.
In U.S. Pat. No. 4,627,019, Fred K. Ng discloses a versioning technique that stores an array of access blocks, each block defining the database location of a version of a relation, only one of which is defined as current. However, his technique requires a new access block in the access dictionary (defining a new database location) for every update transaction. Database access by each of a plurality of database transactions is permitted only through the relation block associated with that particular transaction. Ng's technique appears to be intended for use in systems with few updating transactions and many read-only queries.
In U.S. Pat. No. 4,853,843, Denise J. Ecklund discloses a system for merging virtual partitions of a distributed database where various database versions exist among the partitions. In U.S. Pat. No. 4,875,159, Richard W. Cary, et al disclose a version management system for synchronizing two versions in a multi-processor system. Both the Ecklund and Cary, et al techniques are intended for coordinating versioning control among independent processors in a distributed data processing system.
There is a clearly-felt need in the art to improve the optimization of tradeoffs between transaction concurrency and the delays arising from lock conflicts in concurrent processing systems. The related unresolved problems and deficiencies are clearly felt in the art and are solved by the present invention in the manner described below.