Indexes are data structures that provide relatively fast access to a set of data based on a key values. FIG. 1 illustrates an exemplary table 100 with a corresponding B-Tree index 102. The table 100 has a name column 104 and a gender column 106. The value in the name column 104 is used as the key of the B-Tree index 102. The B-Tree index 102 includes branch nodes and leaf nodes.
Branch nodes contain pointers to other nodes and data that indicates the range of values associated with the nodes to which they point. For example, node 108 contains pointers 110 and 112. Node 108 also stores the letter "M" to indicate that names that begin with the letters "A" through "L" are stored in the nodes attached to pointer 110 while the names that begin with the letters "M" through "Z" are stored in the nodes attached to pointer 112.
The leaf nodes of B-Tree 102 store key values and pointers to the rows of table 100 that correspond to the key values. For example, leaf node 114 contains three entries. The first entry stores the key value "KARL" and a pointer to the row in table 100 that contains the value "KARL" in the name column. The second entry of leaf node 114 stores the key value "KRIS" and a pointer to the row in table 100 that has the key value "KRIS". The third entry of leaf node 114 stores the key value "LANE" and a pointer to the row in table 100 that contains the key value "LANE".
As new data items are inserted into the base data, new entries that correspond to the new data items are added to the index. For example, if a record where the data for column 1 is "ANGIE" and the data for column 2 is "F" were added to table 100, a corresponding index entry would be added to leaf node 116 of B-Tree 102. The new index entry would include the key value "ANGIE" and a pointer to the new row added to table 100.
FIG. 2 illustrates a system that includes two nodes 204 and 214 and a disk 200. Nodes 204 and 214 generally represent processing units that have access to the one or more disks that contain the database in which table 100 is stored. Nodes 204 and 214 may be, for example, networked workstations or clusters of processors and memory components within a multi-processing machine.
Before an entry may be added to an index, the portion of the index into which the entry is to be added must be loaded into the dynamic memory of the node that is inserting the entry. For example, assume that a transaction 210 executing in node 204 specifies the insertion of a row containing the data "ANGIE, F" into table 100. Assume also that disk block 202 stores leaf node 116 of a B-Tree index 102. To insert the appropriate index entry into index 102, disk block 202 is loaded into buffer cache 206 of node 204. In illustration, the loaded copy of the block is shown as 202'. The copy 202' of disk block 202 that is stored in buffer cache 206 is updated with the appropriate index entry for "ANGIE". At a later time, the updated copy 202' of disk block 202 is stored back to disk 200.
Typically, the closer key values are to each other in the order used by the index, the more likely the index entries for the key values will be stored in the same portion of the index. For example, index entries for "KEN", "KENT" and "KENNETH" would all be stored in leaf node 114. Consequently, there is a high likelihood that index entries for data items with consecutive key values will be stored in the same portion of an index structure.
Under many conditions, data is entered into a database in such a way that consecutive entries have consecutive key values. For example, records may be keyed into a database system in alphabetic or numeric order. Even records that do not initially have an order with respect to each other may be assigned key values based on the order in which they arrive. For example, one way to assign a unique identifier to each piece of e-mail in an e-mail system is to assign each e-mail a strictly increasing number based on the order in which the e-mail is received.
When consecutively inserted data items have consecutive key values, the new index entries for the new data items are inserted into the same portion of the associated index. For the purposes of explanation, the portion of an index into which new entries will be inserted is referred to as the "target portion" of the index. For example, while rows that contain names that begin with the letters "A" through "C" are being added to table 100, leaf node 116 will be the target portion of index 102. During the insertion process, the rate at which the target portion is accessed will be relatively high, while the rate at which other portions of the index is accessed will be relatively low.
When only one node (e.g. node 204) is being used to insert data into table 100, the fact that one portion of index 102 is heavily accessed will typically not have adverse effects on the efficiency of the insertion process. For example, while rows with names beginning with the letters "A" through "C" are being added to table 100, block 202 will remain loaded in buffer cache 206. However, when two or more nodes are used to insert data into table 100, the fact that one portion of index 102 is heavily accessed by both nodes may lead to significant problems.
Specifically, each node must update the most recent version of block 202 to insert an index entry into leaf node 116. Therefore, if the version 202' of block 202 that is located in buffer cache 206 has been updated by node 204, the updated version 202' of block 202 must be written to disk 200 and loaded into buffer cache 216 before node 214 may insert an entry into leaf node 116.
The updated version 202' of block 202 that resides in buffer cache 216 would then have to be written to disk and loaded into buffer cache 206 before node 204 could insert a subsequent entry into leaf node 116. The transfer of data from the buffer cache of one node to the buffer cache of another node is referred to as a "ping". Pings involve a significant amount of overhead, including multiple I/O operations and lock-related communications.
Based on the foregoing, it is clearly desirable to provide a technique for reducing the number of pings that occur when more than one node is used to insert ordered data into an indexed body of data. It is further desirable to reduce the rate at which any particular portion of an index is accessed when more than one node is used to insert data.