The present invention concerns database systems and particularly recovery mechanisms for multilevel systems.
Sophisticated database systems provide multiple users access to common data. The actual computer time required for all of the individual operations in a transaction that a given user performs on a database is typically much less than the time between the user's requests for transactions. Moreover, a transaction may be so structured that delays occur between its individual operations. If parts of the database transactions of different users are interleaved, therefore, the wasted storage-system idle time can be reduced, as can the delay experienced by a user because of other users' transactions. The degree to which such interleaving can occur is often referred to as the system's concurrency.
Much of the lowest-level software, or kernel, of a database system is directed to scheduling operations so as to achieve some degree of concurrency. But the degree to which the system can maximize concurrency greatly depends on the higher-level database software, i.e., on the software that defines the types of transactions that the ultimate users will be able to request. Specifically, some concurrency loss results from the fact that the transaction-defining software ordinarily requires that its transactions be serializable, i.e., that the effect of a transaction whose constituent operations are interleaved with another transaction's operations be the same as that of the same transaction performed by itself in a non-interleaved fashion.
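The serializability requirement can be illustrated by a minimal Python sketch. Everything named in it (`run_schedule`, the transactions T1 and T2, the account key "X", and the amounts) is hypothetical and chosen only for illustration; it is not drawn from any particular database system. Each transaction reads a balance and then writes back the balance plus a deposit, and the sketch compares a serial schedule with an unlocked interleaved one.

```python
# Illustrative sketch only: two "deposit" transactions on one shared balance.
# Each transaction is two operations: read the balance, then write balance + amount.

def run_schedule(schedule, initial=100):
    """Execute (txn, op, amount) steps in order and return the final balance."""
    db = {"X": initial}
    local = {}                      # value each transaction last read
    for txn, op, amount in schedule:
        if op == "read":
            local[txn] = db["X"]
        elif op == "write":
            db["X"] = local[txn] + amount
    return db["X"]

# Serial execution: T1 runs to completion, then T2.
serial = [("T1", "read", 10), ("T1", "write", 10),
          ("T2", "read", 20), ("T2", "write", 20)]

# Interleaved execution without locks: both transactions read before either writes.
interleaved = [("T1", "read", 10), ("T2", "read", 20),
               ("T1", "write", 10), ("T2", "write", 20)]

serial_result = run_schedule(serial)            # 130
interleaved_result = run_schedule(interleaved)  # 120: T1's deposit is lost
```

The interleaved schedule is not serializable in the sense used above, because its result differs from that of any execution in which the two transactions run one after the other.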
To achieve this serializability, the transaction designer specifies that resources accessed by given operations of the transaction be locked, i.e., be unavailable to other transactions' operations until some or all of the first transaction's operations have been completed. The kernel implements the locks by keeping track of locked resources. For the duration of the lock, therefore, other transactions are delayed to the extent that they require access to the locked resources.
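The bookkeeping described above can be sketched as follows. This is a simplified illustration that assumes exclusive locks only and a non-blocking `acquire` that merely reports whether the lock was obtained; the class name `LockTable`, its methods, and the resource name "block-7" are hypothetical.

```python
class LockTable:
    """Hypothetical kernel-side table mapping a resource to the transaction holding it."""

    def __init__(self):
        self._owner = {}            # resource id -> transaction id

    def acquire(self, txn, resource):
        """Return True if txn now holds the lock, False if another transaction holds it."""
        holder = self._owner.setdefault(resource, txn)
        return holder == txn

    def release_all(self, txn):
        """Release every lock held by txn, e.g. when the transaction completes."""
        for resource in [r for r, t in self._owner.items() if t == txn]:
            del self._owner[resource]

locks = LockTable()
got_t1 = locks.acquire("T1", "block-7")        # T1 locks the block
blocked_t2 = locks.acquire("T2", "block-7")    # T2 is refused while T1 holds the lock
locks.release_all("T1")
retry_t2 = locks.acquire("T2", "block-7")      # T2 can proceed once T1 has finished
```

In a real kernel a refused request would ordinarily cause the requesting transaction to wait rather than simply return; the return value here stands in for that delay.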
Although the resultant loss in concurrency is undesirable, it is usually acceptable in databases that implement only relatively simple transactions, such as transferring money between bank accounts, in which individual locks are rarely held for very long. This is particularly true if no part of the database contents is used so intensively that demand for concurrent access to it is frequent. But the concurrency of a system that implements more-complicated transactions, such as those employed for computer-aided design, can suffer greatly.
For this reason, proposals such as that in Weikum et al., "Multi-Level Recovery," Proc. 9th Symposium on Principles of Database Systems, have been made to implement multi-level transactions in such a way as to employ less-restrictive locking. To understand the concept of multi-level transactions, it is beneficial first to review the relationship among operations, transactions, and levels or layers.
At the lowest level, an operation is typically the smallest action that can be taken on the hardware that embodies the non-volatile memory of the database system, so it is inherently serializable: the hardware is not capable of starting any other lowest-level operation before the first is finished. (Those familiar with pipelined accesses to disk controllers will recognize that there is a sense in which this is not entirely true, but it is sufficient for present purposes that it appears to be true from the outside.) An example is the reading or writing of disk blocks in disk-drive controllers that provide access in blocks. If the drive writes any part of a block, it writes the whole block--i.e., writing a block is atomic--and it finishes writing the whole block before it begins access to any other block: writing a block is serializable.
A subroutine comprising a group of such operations is not inherently atomic or serializable, but it can be made so through the use of locks whose object resources are data blocks. A transaction is such a serializable subroutine. The concept of layers or levels enters the picture if a transaction itself comprises a plurality of transactions. In such a nested organization, not only is the transaction serializable with respect to other transactions, but its constituent "subtransactions" are serializable with respect to each other. The transactions can thus be seen to have a layered structure in which a transaction at one (lower) level comprising operations at that lower level can be thought of as an operation at the next-higher level, i.e., at the level of a transaction of which the lower-level transaction is a part. A plurality of lowest-level ("L.sub.0") operations that constitute an L.sub.0 transaction form an L.sub.1 operation, which can in turn be part of an L.sub.1 transaction that constitutes an L.sub.2 operation.
Different levels typically deal with different levels of abstraction. At L.sub.0, for instance, a transaction may be to subtract an amount from a record in a block explicitly specified by an argument of one transaction and add that amount to a corresponding record in another block similarly specified. The transaction designer--i.e., the implementor of the higher-level database software--should make such a transaction atomic if, for instance, it represents shifting money from one bank account to another, because the bank's books are "inconsistent" when they are in the state in which one part of the transaction has been performed without the other. Therefore, no transaction should be able to "see" the blocks involved until the L.sub.0 transaction is complete.
A given higher-level transaction, on the other hand, might be the transfer of an argument-specified amount from the account of argument-specified customer X to that of argument-specified customer Y. Such a transaction does not explicitly identify the disk blocks involved. Instead, it consists of a number of operations, typically organized into various levels of subtransactions, that access blocks containing, e.g., index information in the form of pointers to the blocks in which the information concerning X's and Y's accounts is currently stored and then access the blocks containing the account information. The account information thus must in some sense "stay put" if the transaction is to have the intended result. This, of course, is the purpose of locks. But a large number of data blocks can be accessed in a single transaction if the transaction is complex, as nested transactions tend to be, and many of the accessed blocks may need to be locked. The concurrency penalty that results from the proliferation of locks to which nested transaction structures lend themselves can thus be quite burdensome.
Specifically, recovery considerations normally dictate that a lock stay in force for the duration of the transaction in which it is acquired. If the locks are implemented at only a single level--i.e., if the lock table is simply a list of blocks, their associated lock types, and the transactions in which they were acquired--then a sufficiently complex transaction comprising multitudes of transactions nested to many levels can tie up a very large number of blocks for long periods of time.
This concurrency penalty can be reduced by implementing locks at higher levels of abstraction. Instead of requiring that a given block not be accessed, for instance, a high-level lock might require that X's account record not be accessed. Such locks tend to cause less of a concurrency penalty, because a high-level lock imposed ("acquired") by a high-level operation (i.e., by a lower-level transaction) replaces the lower-level locks acquired on the occurrences of the constituent operations of that lower-level transaction; the lower-level locks do not need to remain until the end of the highest-level transaction of which they are parts.
For example, although constituent parts of a high-level transaction may have instituted locks on a large number of blocks in performing their parts of the (higher-level) account accesses, those blocks are freed up at the ends of their respective lower-level subtransactions when they are replaced by the higher-level locks that those subtransactions institute. Therefore, a given high-level transaction prevents a concurrently operating transaction from accessing, say, only X's account record until the end of the given transaction; it is only for the (comparatively short) durations of various subtransactions that a concurrent transaction is prevented from accessing the various blocks that information concerning the account occupies during those subtransactions. Conceivably, therefore, concurrent transactions can proceed during the given transaction, even if to do so results in moving X's account information between blocks on different storage devices, so long as those concurrent transactions do not change the substance of that account information. This is not in general possible if only single-level locks are employed.
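The replacement of block-level locks by a higher-level lock can be sketched as follows. This is an illustrative model only, assuming exclusive locks keyed by an abstraction level and a resource name; the class `MultiLevelLocks`, its methods, and the resource names are hypothetical and do not describe any particular kernel's interface.

```python
class MultiLevelLocks:
    """Hypothetical lock table whose entries are keyed by (level, resource)."""

    def __init__(self):
        self.held = {}              # (level, resource) -> owning transaction

    def acquire(self, owner, level, resource):
        """Return True if owner holds the lock afterwards, False if another owner does."""
        key = (level, resource)
        if self.held.get(key, owner) != owner:
            return False
        self.held[key] = owner
        return True

    def release_level(self, owner, level):
        """Release all of owner's locks at one level, e.g. at the end of a subtransaction."""
        for key in [k for k, o in self.held.items() if o == owner and k[0] == level]:
            del self.held[key]

locks = MultiLevelLocks()

# A subtransaction of T1 updates X's account record.
locks.acquire("T1", 1, "account:X")        # higher-level lock on the record
locks.acquire("T1", 0, "index-block-3")    # block-level locks taken while it runs
locks.acquire("T1", 0, "data-block-42")

# End of the subtransaction: the block locks go, the record lock remains.
locks.release_level("T1", 0)

t2_block = locks.acquire("T2", 0, "data-block-42")   # True: the block is free again
t2_record = locks.acquire("T2", 1, "account:X")      # False: the record is still locked
```

Under single-level locking, by contrast, both block locks would remain in the table until T1 as a whole completed.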
Adoption of such truly multi-level systems does require some effort; the transaction designer must not only design the higher-level locks but also high-level "undo" transactions for all of the high-level transactions. But such effort can eliminate the greater effort that might otherwise have to be invested in various ad hoc approaches to increasing concurrency. The additional design effort required for undo-operation and lock design therefore has not been the main deterrent to adoption of multi-level operation. The main deterrent has heretofore been the difficulty of obtaining a database kernel that provides a general method of recovery for such systems.
Database systems require some mechanism for recovering from transaction aborts caused by user intervention or processor "crashes." The database kernel typically provides the basic part of the recovery mechanism by maintaining in the system's non-volatile memory enough information about past operations to enable the completed operations of the interrupted transactions to be "undone" so that the results of only completed transactions remain. In the case of single-level transaction structures or those that employ nested transactions but only single-level locks, recovery can be accomplished by "undoing" all lowest-level operations that belong either directly to an incomplete lowest-level transaction or indirectly to an incomplete higher-level transaction. This is possible because the general approach of such transaction structures is to hold all locks until the ends of the highest-level transactions to which they directly or indirectly belong; an "undo" operation for a given operation can return a block to its prior state because the retained locks have prevented operations in other transactions from making other changes in the block. (Actually, some systems create exceptions to this "strict two-phase" locking rule to avoid its concurrency penalties in special situations, but, being exceptions, such departures from the basic rule complicate the recovery system and tend to reduce flexibility.)
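Single-level recovery of this kind can be sketched with a log of block "before-images." The sketch below is a simplified illustration, not a description of any actual kernel: the in-memory list `undo_log` stands in for a log kept in non-volatile memory, and the names `write_block`, `recover`, and `committed` are hypothetical. It also assumes, as the passage above does, that locks are held until the end of each transaction, so that no other transaction has touched a block between a logged write and its undo.

```python
blocks = {"A": 500, "B": 200}      # block id -> contents (here simply a balance)
undo_log = []                       # (txn, block id, before-image), oldest first
committed = set()                   # transactions known to have completed

def write_block(txn, block_id, value):
    """Record the block's prior contents, then overwrite the block."""
    undo_log.append((txn, block_id, blocks[block_id]))
    blocks[block_id] = value

def recover():
    """Undo, newest first, every logged write of a transaction that never completed."""
    for txn, block_id, before in reversed(undo_log):
        if txn not in committed:
            blocks[block_id] = before

# T1 completes; T2 is interrupted after its first write.
write_block("T1", "A", 450)
write_block("T1", "B", 250)
committed.add("T1")
write_block("T2", "A", 400)        # "crash" occurs before T2 writes block B

recover()                           # blocks is now {"A": 450, "B": 250}
```

After recovery only the results of the completed transaction T1 remain.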
In a truly multi-level system, however, it is not possible simply to undo all of the lowest-level operations, because replacement of the lowest-level locks by higher-level locks may have permitted other transactions to make further changes on the blocks involved after the lowest-level locks were removed. In principle, this is not a problem, because high-level undo operations can be employed; a high-level undo operation can change X's account balance to its previous value even though it does not return the relevant information to the block that it previously occupied, and this is all that is required. In practice, however, general-purpose mechanisms for implementing the high-level undo operations have been difficult to implement and inconvenient to use. The Weikum et al. approach, for instance, requires a separate operation log for each abstraction level, restricts write operations to the disk during a lowest-level transaction, and requires extraordinary measures to redo incomplete lowest-level transactions. Therefore, although many designers now implement nested transactions, multi-level recovery has not yet been widely adopted.
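The difference between undoing lowest-level operations and applying a high-level undo operation can be illustrated by the following sketch. It is an assumption-laden toy model, not the Weikum et al. mechanism or any other published one: the database is a dictionary holding an index and two blocks, and the functions `debit`, `move_record`, and `balance` are hypothetical. A transaction T1 debits X's account; after T1's block lock has been released, a concurrent transaction T2 moves X's record to another block; T1 is then aborted.

```python
import copy

def make_db():
    # The index maps an account name to the block currently holding its record.
    return {"index": {"X": "b1"}, "blocks": {"b1": {"X": 100}, "b2": {}}}

def debit(db, account, amount):
    """Operation of T1; returns the changed block's id and its before-image."""
    block_id = db["index"][account]
    before = copy.deepcopy(db["blocks"][block_id])
    db["blocks"][block_id][account] -= amount
    return block_id, before

def move_record(db, account, dest):
    """Concurrent transaction T2: relocate the record without changing its substance."""
    src = db["index"][account]
    db["blocks"][dest][account] = db["blocks"][src].pop(account)
    db["index"][account] = dest

def balance(db, account):
    return db["blocks"][db["index"][account]][account]

# Block-level undo attempted after T2 has moved the record.
db = make_db()
block_id, before = debit(db, "X", 30)
move_record(db, "X", "b2")
db["blocks"][block_id] = before          # restore block b1's before-image
physical_balance = balance(db, "X")      # 70: the debit has not been undone
stale_copy = db["blocks"]["b1"]          # {"X": 100}: a stale duplicate reappears

# High-level undo: the inverse operation, which finds the record through the index.
db = make_db()
debit(db, "X", 30)
move_record(db, "X", "b2")
debit(db, "X", -30)                      # hypothetical inverse: credit 30 back
logical_balance = balance(db, "X")       # 100
```

The block-level undo both fails to restore the balance and disturbs a block that T2 has since changed, whereas the high-level undo restores X's balance regardless of where the record now resides.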