The present invention is directed to resource-management systems and in particular to the locking strategies that they employ.
A resource-management system is typically implemented in a computer, including its various types of storage apparatus, programmed with appropriate software. One type of resource-management system, namely, a database-management system ("DBMS") can be thought of as operating in two phases, the first of which may be called a design phase and the second of which may be called an execution phase.
FIG. 1 represents the design phase. In this phase, the DBMS 10 provides a database designer with ways of defining the "structure" of the data, i.e., the manner in which its storage is organized, and of defining transactions in which the data thus stored will be accessed by end users.
The first function, performed in what we can call a "metadata manager" 12, typically responds to structure-defining (data-definition) commands entered by the designer in a high-level database language such as SQL. A relational DBMS, for instance, may accept a data-definition instruction of the following form:
CREATE TABLE INCOME PA1 NAME CHAR (20), PA1 INC DECIMAL (10,2) PA1 SSN CHAR (9); PA1 CREATE INDEX ON INCOME (INC); PA1 START TRANSACTION PA1 SELECT INC PA1 FROM INCOME PA1 WHERE SSN=123456789" PA1 COMMIT WORK;
Such a statement may establish that the database will include a relation, or table, called "income," in which each tuple, or record, includes three attributes, or fields, including a "NAME" field in the form of twenty characters, an "INC" (income) field of ten decimal digits with a decimal point two places from the right, and an "SSN" (social security number) field in the form of nine characters.
The database designer may use similar statements to define other tables that the database will include. He may further define one or more indexes, whose maintenance he knows will be valuable in the use of the database. For instance, he may employ a command such as:
to create an index of the records ordered by the values of the respective records' income fields.
The result of such commands is to cause entries reflecting them in a database catalog 14 that the DBMS creates and maintains for use in its access operations. Note that none of the foregoing activities provides actual database contents; attributes have been named, but no values of those attributes have necessarily been entered.
We will assume for present purposes that it is the end users who supply the database contents by various manipulating transactions. However, it is the database designer who defines the types of transaction routines that the end users invoke for this purpose. (Actually, a separate application programmer may perform this task. From here on, we will refer collectively to the various personnel, other than the end users, who operate on the database as the "definer.") That is, the definer generates transaction routines, which the DBMS's run-time system will perform in response to requests from the end user. To produce these routines, the definer employs a group of DBMS modules to which we will refer as a "query compiler" 16.
The definer may write the transaction definition in a general-purpose language such as PL/1, but he will typically embed in the PL/1 program instructions written in the high-level database language, such as SQL, to which the DBMS responds. This can be done in PL/1, for instance, by preceding, say, SQL commands with "EXEC SQL," which is a signal that the overall PL/1 source code will have to be submitted to a precompiler before PL/1 compilation so as to strip out the SQL statements and replace them with appropriate subroutine calls. The stripped-out SQL statements would be presented to the query compiler, which would compile them into instructions that call upon the services of various operative modules within the DBMS's run-time supervisor.
A transaction-defining statement in the high-level database language may be in the form:
When the routine specified by such a statement runs, it searches the database for a record in which the SSN (Social Security Number) field is 123 45 6789 and fetches the value in the income field of that record.
To compile this command, the query compiler 16 consults the database catalog, which contains the implementer's definitions of the data organization, such as that "INCOME" is a relation and INC and SSN are two of its attributes. It also determines the best manner in which to search for the indicated information, and it employs the index information in order to do this. If, for instance, the definer has required that an index ordered by social-security number be provided, then the query compiler produces a transaction routine that accesses the record by way of that index. In such a situation, the SSN attribute is the "key" by which the system finds the desired record (or records).
It should be emphasized at this point that, although we depict the query compiler as operating in the design phase rather than in the execution phase, many of what will be described below as features of the query compiler are only implicit in most DBMS implementations until the execution phase is in progress. In particular, the typical query compiler places in the transaction routine 17 calls to operation subroutines chosen from a subroutine set resident only during the execution phase and commonly called by most transaction routines. Since the contents of these subroutines are all part of the translation from transaction definition to implementation instructions, we consider them features of the query compiler, and the remaining discussion will not distinguish between the functions that the query compiler performs directly during the design phase and those that its design-phase actions perform only indirectly, by incorporating calls to existing subroutines resident during the execution phase.
Among the capabilities of many DBMSs is that of maintaining what is known as "serializability." In writing transaction definitions, the database definer is defining what will happen when an end user submits a request to the system. One resulting transaction routine, for instance, might be invoked by a user at an automatic teller machine to transfer money from one account to another. Another might be invoked by a bank executive from a management-information-system terminal to request the total of all account balances for a given branch of the bank. In each case, the end user submits his transaction request to a run-time supervisor 18 (FIG. 2), which calls the specified transaction routine. The transaction routine obtains records from a storage medium 19, such as a magnetic disk or cached copies of its contents, through the operation of a buffer manager 20. The actual central-processor time involved in each of these transactions may be very small, but the time required between the beginning and end of the transaction may be considerable, since time is usually required to await data from storage facilities, inputs from the human user, etc. In the operation of most large databases, it is therefore important that the central processor be freed to perform operations of other transactions between the individual operations of any single transaction. But this interleaving can cause problems if steps are not taken to prevent them.
For example, the ATM user's transfer of money from one account to another may actually be implemented in a number of operations, which may be interspersed with inputs from the user. The transfer involves the separate steps of removing money from one account and adding it to another. Similarly, the bank executive's transaction of obtaining account totals may include a number of separate operations, each of which involves reading an account balance and adding it to a running total. If these operations are interleaved, the account-totaling operation may copy the balance from the ATM user's first account before the transfer and from the second account after the transfer, and this would indicate to the executive that the bank total is greater than it actually is.
This would not result, of course, if the two transactions occurred serially. One of the functions of most DBMSs is therefore to perform transactions in such a way that concurrently performed sets of them are serializable, i.e., that their results are the same as that of a series of nonconcurrent transactions, without requiring actual serial transaction ordering. To this end, the query compiler usually causes the transaction routine to include certain operations that invoke the services of a DBMS module known as a "lock manager" 21, which is active in the execution phase depicted in FIG. 2 (and actually in the design phase, too, for purposes not relevant here) and maintains a lock table 22, whose contents indicate which "resources" are currently involved in transactions in such a manner that certain operations on them by other transactions must be postponed until the previous transactions have been completed. That is, if a transaction performed by such a routine includes an access to a certain resource, it will also request that the lock manager post a lock in the lock table identifying the designated resource as one to which access is restricted. (We use the more-general term resources instead of records for reasons that will become apparent. Until those reasons do become apparent, however, little harm results from reading "records" for "resources.")
When an end user invokes a transaction routine during this execution phase, that routine will request that the lock manager post a lock on the required resource, and the lock manager will return to the routine an indication of whether such a lock is permitted. If not--because another transaction has already locked that resource--the transaction that has requested the lock will be at least temporarily prevented from proceeding. Otherwise, (except in certain "instant lock" cases that will be described below) the lock manager will enter a lock in the lock table and thereby restrict other transactions' access to the locked resource.
Of particular interest in the present context is what is known as "range locking." The serializability of some types of transactions is not assured by simply locking the several records to which they require access. An example of such a transaction is one that includes a scan operation, which accesses all records in which a certain attribute's values are within a specified range.
For instance, one transaction may be to find the total income within an income range. A concurrent transaction may be to insert records for a group of new taxpayers who first have reportable income within a given period. It is desirable that the computed total reflect either all or none of the new taxpayers, but individual record locking does not insure this result. If the insertion transaction inserts one record in a part of the range that the total transaction has already searched and another in a part that it has not, then neither transaction will encounter the other's locks, and the total-computing transaction's result will be erroneous. This is because of the "phantom" records that were inserted into a range after that range's original occupants were locked.
To remedy this problem, database management systems must lock entities other than just records. For example, a DBMS may lock a whole file when a scan operation is performed in it. But such an expedient can produce a considerable reduction in the system's achievable concurrency, i.e., in the degree to which operations of separate transactions can be interleaved. Alternatively, therefore, DBMSs sometimes employ range locking, which treats not only individual key values but also key-value ranges as resources to be listed in the lock table. Therefore, when a transaction routine includes an operation directed to all records within a certain target key range, it causes the lock manager to post locks directed not only to the individual key values to which the transaction will obtain access but also to a set of lockable ranges that covers the target range. And, if another operation is directed to a specific key value, that transaction not only requests a lock on that key value but also has the lock manager check for locks on any lockable range into which that key value falls. This latter function is performed by additionally identifying any such range resource and requesting a lock on it.
The routine for inserting records of new taxpayers, for instance, would identify all ranges into which the records that it intends to insert fall, and it would check the lock manager to determine whether there are locks on these ranges. Since the total-computing transaction will have acquired locks on ranges into which the new taxpayer records would otherwise have been inserted, the record-insertion transaction is postponed until the total-computation transaction has been completed, and serializability is thereby maintained.
We digress at this point to note that locking a key value or range thereof is not the same as locking the record or records that the key value or range designates. In requesting a lock at the beginning of a search by key value, the operation passes to the lock manager a "resource ID," which is typically a hashed or otherwise encoded version of the key value. It is this ID against which the lock manager posts a lock and for which it searches to determine whether the lock can be granted. Now, the record identified by the name-attribute value John Doe may be the same as that identified by the social-security-number-attribute value 123 45 6789. But the key-value locking performed by an operation that uses John Doe as its key value and uses (typically a hashed or otherwise encoded version of) that value as the resource identifier when it acquires a lock on that name does not by itself restrict access to that record by an operation that requests a lock on the social security number. Typically, of course, the operation will also perform further locking, such as record-ID locking, which will cause access to that record from other paths to be restricted. Indeed, one of the keys may in essence be the record ID. But we are not concerned with that here. Here we are concerned only with locking on the basis of a given key or range thereof, so we will refer to key-value or range locking, which may or may not be accompanied by or equivalent to record locking.
Although it is better than locking the whole file, locking ranges of key values can itself reduce concurrency significantly. To lessen this concurrency reduction, DBMSs employ more than one mode of lock, each lock mode differing from the others in restrictiveness. This restrictiveness variation is useful even if the DBMS does not use range locking. For example, a lock acquired by a transaction as a result of an operation that only reads records does not need to prevent other transactions from reading those same records, but a lock resulting from a write operation does. In recognition of this fact, a simple, two-mode locking system may employ lock-table entries that include an indication not only of whether a lock is in place but also of whether the lock is a share-mode ("S") lock, requested for read operations, or an exclusive-mode ("X") lock, requested by record-modifying operations. A transaction requesting an S lock will be deterred only by X locks on the target resource, not by S locks, while transactions that request X locks will be deterred by locks of both types.
Database systems that employ range locking often obtain additional concurrency by means of "multi-granularity locking," or MGL, which conventionally employs five lock modes, as FIGS. 3, 4, and 5 illustrate. FIG. 3 is a lock-mode table, which indicates the types of locks acquired by transactions in accordance with one application of the MGL scheme for the different types of database-access operations of which they may be comprised. In addition to the simple "covering" lock modes S and X, there are three "intention" lock modes, designated IS, IX, and SIX, whose purpose is to indicate, with respect to a key range, that a further, covering lock will be acquired on a key value that falls within that range. (Actually, the SIX mode, as will be seen below, is both a covering lock and an intention lock.)
FIG. 4 is an exemplary lock table that will be employed to explain intention locking. In the example, we will assume that one of the attributes in a relation defined for a given database is labeled "INC," for "income." That is, if the relation is thought of as a table, each row of the table represents a taxpayer, and one of the columns represents those taxpayers' incomes. Let us further assume that the database designer has required (not necessarily in principle but nearly unavoidably in practice) that an index be maintained of incomes and that he has further specified that the DBMS is to treat certain predetermined ranges of incomes as lockable resources. The reason for this may be that a purpose of the database is to support studies of characteristics of various predetermined income ranges.
Now, suppose that a user calls for performance of a transaction routine for totalling all incomes within the $40,000-$60,000 income range and that the database designer has specified two predetermined ranges of $30,000 to $50,000 and $50,000 to $70,000. The DBMS's run-time supervisor responds to this request by running the requested transaction routine and assigning this instance of that routine's operation the transaction name T.sub.1. Since transaction T.sub.1 will need to read all records within the two ranges but not to update any of them, it requests S locks on the two ranges, as the first row of the FIG. 4 table indicates. The S lock is a covering lock, since it implicitly locks each key value in the range. But the transaction does not explicitly place a separate lock on each key value--i.e., each income level--that it finds, since the query compiler will have so designed any transaction routine that similarly searches for records by income as to cause the lock manager to check the income range into which the indicated income falls.
The "read scan" entry in the FIG. 3 mode table reflects this lock-mode selection. It shows that, upon a scan-type read operation, i.e., one which requests all records within a given range, a lock of the S type is acquired on the range or ranges involved but not on the individual key values.
As FIG. 5 indicates, lock mode S is compatible with only two lock modes, IS and S. The latter compatibility is exemplified by a concurrent transaction ("T.sub.2 "). Transaction T.sub.2 requires access to the S-locked ranges, but only to read them, so it requests S-mode locks of those ranges. The lock manager, which implements the compatibility table of FIG. 5, informs T.sub.2 that the requested lock mode is compatible with the existing (S-mode) locks on those ranges in its lock table. It therefore posts the lock, as the FIG. 4 lock table indicates in the T.sub.2 row.
To observe compatibility of the S mode with the IS mode, let us consider a transaction ("T.sub.3 ") whose purpose is to fetch, say, the identity of the taxpayer whose income is $39,547. This involves a "singleton read" operation, i.e., one that targets an individual key value, not a range, although the key value may fall within a range. As FIG. 3 indicates, the query compiler requires such an operation's transaction to request not only an S lock on the actual key value to which this operation is directed but also an IS lock on any range in which that key values falls. Accordingly, as FIG. 1 shows, T.sub.3 acquires an S lock on income value $39,547 and an IS lock on income range $30,000 to $50,000. The lock manager grants this lock, even though locks have already been acquired on that income range, because the lock manager's compatibility matrix (FIG. 5) indicates compatibility between the IS and S locks.
The IS lock is not a covering lock: the operation that caused it is not directed to the range against which the IS lock is posted. Instead, it is an intention lock: it indicates the intention to acquire a lock on a different resource, namely, a key value, that the range includes (or, for some resources, otherwise overlaps). Its purpose is to prevent another transaction from acquiring a covering lock on the range that would be inconsistent with the first transaction's access to the included (or overlapped) resource.
The difference between the IS and S locks becomes apparent when one considers a new transaction T.sub.4, whose purpose is to correct the income value for the taxpayer whose income is $41,290. This is a "singleton update," and FIG. 3 indicates that the involved range and key value must be locked in modes IX and X, respectively. FIG. 4 shows that no previous transaction has obtained a lock on the key value of $41,290. But that value falls within the $30,000 to $50,000 range, on which previous transactions have already acquired locks, and the mode, S, of two of those locks is incompatible with the IX-mode lock that transaction T.sub.4 requests on that range. Transaction T.sub.4 therefore cannot proceed until the S-lock-holding transactions T.sub.1 and T.sub.2 have ended, either by aborting or by running to completion, i.e., in database parlance, by "committing."
Now, let us suppose that transactions T.sub.1 and T.sub.2 commit. A committing transaction notifies the lock manager, which responds by removing the transaction's entries. FIG. 4 uses check marks to represent such removals for transactions T.sub.1 and T.sub.2. The only lock remaining on the $30,000 to $50,000 range after transactions T.sub.1 and T.sub.2 commit is therefore the IS-mode lock acquired by transaction T.sub.3. Remember that transaction T.sub.3 requested only an IS-mode lock on the range, since it did not need to read all of the range but only needed to indicate that it would be reading something within the range. Thus, as FIG. 3 indicates, the IS mode is compatible with the IX mode requested by transaction T.sub.4 on the $30,000 to $50,000 range. After transactions T.sub.1 and T.sub.2 have committed, therefore, transaction T.sub.4 can acquire the locks that it requires before it will proceed.
It can be appreciated that range locking of this type eliminates the phantom problem yet affords considerable concurrency. As the last example indicates, however, the use of such definer-specified ranges is somewhat inflexible. Specifically, the target income range mentioned above did not fit the predetermined ranges very closely, and locking enough predetermined ranges to cover the target range resulted in less concurrency than would have been possible if there had been a closer fit. Additionally, the approach to MGL just described requires considerable locking overhead, since it requires that two resources, namely, a key range and an individual key value, be separately locked for most operations.
A system that deals with these problems is the ARIES/KVL system described in Mohan, "ARIES/KVL: A Key-Value Locking Method for Concurrency Control of Multiaction Transactions Operating on B-Tree Indexes," Proc. Very Large Databases Conference, Brisbane, Australia (August 1990). This system dynamically re-defines key-value ranges in accordance with the current population of key values. Specifically, the system maintains a key-value-ordered index, and the possible lockable ranges are the ranges between each pair of successive key values that currently exist in the index. That is, if the existing key values are k.sub.1, k.sub.2, . . . , k.sub.i, . . . such that k.sub.i &lt;k.sub.i+1, then the ranges are the disjoint semi-open intervals (k.sub.i, k.sub.i+1 ], and each such range is identified by the upper bounding key value. Equivalently, of course, the range could be identified instead by the lower attribute value, in which case the range would be open at the top rather than at the bottom. Since locking granularity is finer if the ranges are disjoint, it is preferable for one end of the interval to be open. A single lock-request resource identifier in this system identifies both a key value and the range that the key value bounds. This enables a transaction to lock with a single lock request both a range and the key value that bounds it.
FIG. 6 lists the lock modes required in the ARIES/KVL system for the various operations that a transaction might include. Unlike FIG. 3, FIG. 6 includes only one column for the first four operations listed, because an ARIES/KVL operation does not separately lock key values and key ranges. Of particular interest in the present context, however, is that FIG. 6 does include a second column for insert and delete operations. This does not indicate that key values and ranges are locked separately for these operations. Instead, it represents a separate lock on what will be described below as the "next" key value/range. This is necessitated by the fact that inserting and deleting records in a KVL system divides or extends existing key ranges.
Suppose that a record having a key value of k.sub.i is to be deleted. Clearly, one transaction's deletion of a record that has been written, read, or inserted by a second uncommitted transaction potentially compromises the serializability of those transactions. Therefore, an operation that deletes k.sub.i should not be permitted if another transaction has any lock on k.sub.i regardless of the mode of that lock. A deleting transaction accordingly tests for any such lock by requesting an X-mode lock on k.sub.i. FIG. 6 includes the "instant" notation to indicate that the transaction need only request a compatibility indication and not actually acquire the lock, for a reason that will presently become apparent.
As FIG. 6 shows, the deleting transaction requests a lock not only on the targeted key value/range but also on the "next" key value/range, i.e., on key value/range k.sub.i+1, which includes the range previously represented by the deleted key value k.sub.i. Since that range has now been modified, no access to it should be permitted, so the transaction acquires an X-mode lock on range k.sub.i+1.
Insertions, too, require next-range locking. Let us suppose that an inserting transaction is to insert a new key value k.sub.i ' between k.sub.i and k.sub.i+1. This will reduce the width of the range previously represented by key value k.sub.i+1, so range k.sub.i+1 must be checked for locks. For this purpose, ARIES/KVL uses an instant lock in the IX mode if the inserting transaction has not previously locked that key/range. Clearly, if that range had been, say, scanned by another uncommitted transaction, as indicated by an S, X, or SIX lock, that range should not be modified by inserting a new record into it. Testing by means of an IX-mode lock prevents this. However, there is no reason why the k.sub.i ' record cannot be inserted by one transaction just because another uncommitted transaction has previously inserted the k.sub.i+1 record, as indicated by a previously existing IX-mode lock. Since the requested IX-mode lock is compatible with an IX-mode lock, such an insert "in front of" another insert can occur. (In front of and behind are defined by the direction in which key-valued ranges extend: a range extends in front of the key value that represents it.) The inserting transaction requests only an instant lock in this mode because there is no reason why one transaction's insertion of k.sub.i ' should prevent another transaction's access to k.sub.i+1.
This locking of the next range, although simple in principle, can be time-consuming in practice because of the need to identify the next key value, by which the next range is identified. To understand this requires consideration of the manner in which databases are indexed. The most popular type of index is the B-tree, which comprises a hierarchy of nodes, or sets of contiguously stored index (as opposed to data) records. The highest-level, or root node in a typical B-tree contains N-1 key values, N&gt;1, sometimes called separators in this context, that divide the entire key space into N subspaces, with each of which is associated a lower-level node to which the root node contains a pointer. That is, the separators in one level's nodes represent the key-value limits of nodes in the next level down. To find a database record containing a given key value, a search routine starts at the root node and determines, by comparing the searched-for key value with the root node's separators, the subspace into which the searched-for key value falls. The search routine then follows the pointer to the associated lower-level node, which typically itself contains a set of separators that further divide the subspace, in which case it also contains pointers to still-lower-level nodes. This process continues until the search reaches a node in the lowest level of the hierarchy. Such a node is referred to as a leaf node. If the search key is a secondary key, each leaf-node record typically consists of a value of that key and the record ID of a database record that contains that key value. In the case of a primary index, the leaf-node records are typically the database records themselves.
The database and its indexes are ordinarily stored on a disk or other non-volatile memory, whose contents the host computer's operating system accesses in integral numbers of pages. The size of memory block, or bucket, by which the DBMS accesses the database is typically a single page or a number of pages small enough not to tax the bandwidth of the disk-to-buffer channel unduly, and a leaf node comprises a single bucket in a conventional B-tree. If the entire index other than the leaf nodes could be cached in main memory, a random access of a database record could be performed with a single disk access. For most large-sized databases, however, this is not practical, and the lowest two tree levels must usually remain on the disk. Accordingly, one I/O operation must be performed to reach the penultimate-level node, and a second I/O operation must be performed to reach the leaf node. (Of course, if the search key is a secondary key, a still further I/O operation must be performed to reach the database record itself.)
The B-tree organization is the most popular indexing approach, but another approach, employing a "hashed" file, yields better performance in random accesses of single records. In accordance with this approach, some function of the search key value (the hash function) is used to compute the address of the bucket in which the record resides that contains the searched-for key value. Random access to a single record thus ordinarily requires only a single I/O operation. (In practice, buckets sometimes overflow, and a second I/O operation then must be performed to obtain records that reside in overflow buckets, but this typically happens in only a small percentage of accesses.)
Although hashing yields superior performance for random accesses of single records, hash-type indexes are somewhat less popular than B-tree indexes because their performance for range searching is abysmal. Since the B-tree separators assign unique nodal key-value ranges to the leaf nodes, leaf nodes in B-trees are ordered by search-key value, but the same is not true of the hash-file buckets. In a hash file, therefore, a range search will ordinarily require accessing all buckets, while only those whose key-value ranges overlap the target range need to be searched in a B-tree.
Not surprisingly, a number of attempts have been made to achieve the fast single-record accesses of hashing methods while also obtaining the range-search performance that simple B-tree organizations provide. Some have achieved this result at the cost of a heavy dependence of file-utilization efficiency on the key-value distribution in the database. One particular method, however, described in Lomet, "A Simple-Bounded Disorder File Organization with Good Performance," ACM Transactions on Database Systems, vol. 13, no. 4, December 1988, pp. 525-51, yields a single-record-access performance similar to that of straight hashing methods while affording a range-search capability that, at least for reasonably wide ranges, is similar to that of a conventional B-tree.
Like some other approaches, the bounded-disorder method employs an organization that is the same as that of a conventional B-tree at the higher levels, but the leaf nodes are much larger. The leaf nodes are large enough that all of the tree levels except the leaf level can conveniently be cached in main memory. Each leaf node is too large to make it practical routinely to read the entire node for single-record accesses, but it is essentially a hash file, divided into buckets, to which access is directed from the previous tree level by a hashing function.
The search of a bounded-disorder index accordingly proceeds in a manner essentially identical to that of a conventional B-tree search until it reaches the penultimate node level. At that point, the pointer identifies a leaf node, but the leaf node is too large to be read routinely into main memory for single-record accesses. Instead, the search routine employs a hash function of the searched-for key value to locate a particular bucket within the leaf node, and it is only this bucket that is read into main memory to obtain the searched-for record.
The result of this organization is that all of the leaf nodes are ordered in key value with respect to each other, but each leaf node consists of a plurality of buckets, within which records are distributed in accordance with the hashing function so that there is no order among the buckets within a leaf node. Preferably, however, the records within a single bucket are stored in key order.
Clearly, since the entire index except for its leaf-node level can be cached in main memory, a single-record file access can ordinarily be performed in a single I/O operation. (Again, a further I/O operation may be required for a bucket that has overflowed.) Since the leaf nodes are ordered, however, ranges of key values are localized to a subset of the leaf nodes, with the result that range-searching performance approaches that of a conventional B-tree, at least for ranges that span several nodes.
A bounded-disorder file organization thus affords significant advantages, at least for a single-user database system. But one may wish to apply the bounded-disorder principle to multi-user database systems, so the question arises of how to lock ranges where such an organization is employed. A little reflection reveals that a bounded-disorder index lends itself to the type of range locking in which the lockable ranges are in a sense independent of the key-value population: the lockable ranges can be defined by the key-value limits on the leaf nodes. Accordingly, when a single record is accessed, the range on which an intention lock needs to be placed is readily identified by the index terms encountered in traversing the tree, since the index terms encountered in the penultimate-level node are the key-value limits of the leaf node and thus of the lockable range.
But the granularity of that approach is too coarse to be acceptable for many applications, so one would prefer key-value locking. Unfortunately, conventional key-value locking--in which the lockable ranges are bounded by successive key values existing in the database--has not heretofore been readily implementable with a bounded-disorder index. The reason for this is that successive key values are not in general in the same bucket, so all of the buckets in a node would need to be accessed in order to identify the "next" key-valued range implicated in a delete or insert operation.
In a scan operation, it would be acceptable to access all of the buckets in a node, since nearly as many pages would have to be accessed for a similar operation in a conventional B-tree file if the scan operation's target range is very great, as it usually is. Indeed, the bounded-disorder index may provide performance advantages: because all buckets in a node ordinarily are stored contiguously, a scan in such an index would likely encounter fewer separate sets of contiguous pages, so the I/O operations could be completed more quickly. In an insert or delete operation, however, conventional key-value ranging results in significant disadvantages for a bounded-disorder organization. A conventional B-tree usually permits the implicated range to be identified without a further I/O operation, because the next key value is usually in the same leaf node as that in which the delete or insert operation is to be performed. But a bounded-disorder index requires that all of the buckets of a large node be inspected before the implicated range can be identified. (It should be observed, however, that even B-trees require a second I/O operation to identify the implicated key-valued range if the range bridges a page boundary.)