Conventional storage structures, such as the well-known B-tree, are used to quickly locate specific units of data (such as data records in a database or files in a directory) stored on a secondary storage device. A B-tree provides a means of clustering pointers to units of data, so that the units can be quickly located.
FIG. 1 is a block diagram of a conventional B-tree 100 storing a database index. The B-tree 100 has a root node 101, internal nodes 102, 103, 104, and leaf nodes 105, 106, 107, 108, 109, 110, 111, 112, 113. The root node 101 is the node at the top of the tree 100, the leaf nodes 105, 106, 107, 108, 109, 110, 111, 112 and 113 are those at the bottom of the tree, and the internal nodes 102, 103 and 104 are all of the nodes in between the root node and the leaf nodes.
The root node 101 contains one or more key values and two or more pointers to internal nodes 102, 103 and 104. Each internal node 102, 103 and 104 contains one or more key values and two or more pointers to lower level internal nodes or leaf nodes 105, 106, 107, 108, 109, 110, 111, 112 and 113. Each leaf node 105, 106, 107, 108, 109, 110, 111, 112 and 113 contains key values and pointers to units of data indexed by the key values. For example, the leaf node 107 contains a key value "40" and a pointer 118 to a data record 119 that corresponds to the key value "40."
A node is considered the parent node of each node to which it points in the next lower level, and each node so pointed to is considered a child node of that parent. Leaf nodes have no child nodes. For example, the internal node 102 is a child node of the root node 101 and a parent node of the leaf nodes 105, 106 and 107.
In order to clarify how searching is performed in a B-tree, it is helpful to consider an example. Suppose the B-tree of FIG. 1 is an index to a database file and the key values in each node correspond to a key value field in a data record of the database file. To locate a data record having a key value of "40", a searching routine follows pointer 114 from the root node 101 to the internal node 102. Next, the searching routine follows pointer 117 from the internal node 102 to leaf node 107. The searching routine then searches through key values in leaf node 107 until the key value "40" is found. Finally, the searching routine follows pointer 118 from leaf node 107 to data record 119.
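The search just described can be illustrated with a minimal sketch. The sketch assumes a simplified in-memory node layout; the Node class, the search function, the key values other than "40", and the record strings are hypothetical and are not taken from FIG. 1. It also assumes that a key value equal to a separating key value is found in the child to the right of that separating key value.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Node:
    """Hypothetical B-tree node: internal nodes use children, leaf nodes use records."""
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty for a leaf node
    records: List[Any] = field(default_factory=list)      # leaf only, parallel to keys

    @property
    def is_leaf(self) -> bool:
        return not self.children


def search(root: Node, key: int) -> Optional[Any]:
    """Follow pointers from the root to a leaf, then scan the leaf for the key."""
    node = root
    while not node.is_leaf:
        # bisect_right counts the separating keys <= key, which is the
        # index of the child pointer to follow under the assumed convention.
        node = node.children[bisect_right(node.keys, key)]
    for k, record in zip(node.keys, node.records):
        if k == key:
            return record
    return None  # key value not present in the index


# Hypothetical three-level tree (root, two internal nodes, five leaf nodes).
leaf_a = Node(keys=[10, 20], records=["record for key 10", "record for key 20"])
leaf_b = Node(keys=[30, 35], records=["record for key 30", "record for key 35"])
leaf_c = Node(keys=[40, 45], records=["record for key 40", "record for key 45"])
leaf_d = Node(keys=[50, 60], records=["record for key 50", "record for key 60"])
leaf_e = Node(keys=[70, 80], records=["record for key 70", "record for key 80"])
internal_left = Node(keys=[30, 40], children=[leaf_a, leaf_b, leaf_c])
internal_right = Node(keys=[70], children=[leaf_d, leaf_e])
root = Node(keys=[50], children=[internal_left, internal_right])

print(search(root, 40))  # record for key 40
print(search(root, 41))  # None
```

In this sketch, locating key value "40" visits one node per level (the root, one internal node, and one leaf node), which corresponds to following one pointer per level as in the example above.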
For a comprehensive discussion of B-trees and B-tree maintenance algorithms, see Cormen, Introduction to Algorithms (The MIT Press 1991), pp. 381-399.
Because a B-tree is stored in secondary storage and data is typically transferred from secondary storage to main memory one page at a time, a common optimization is to have each node in the B-tree occupy an entire page. This way, only one secondary storage access is required to read all of the key values in a node.
When a node is allowed to occupy an entire page, key values and pointers are added to the node until the node and page are full, i.e., there is no space available in the page. To add a key value to a node that is full, the node is divided into two nodes (each node containing one half of the key values and pointers), an additional page is allocated to the B-tree, and one of the two nodes is stored on the additional page. The other node remains stored on the original page. A key value and a pointer to the node stored on the additional page are added to the parent node of the node that was split. When the parent node becomes full, the parent node is also split using the same technique. Splitting can propagate all the way to the root node, creating a new level in the B-tree whenever the root is split. Thus, attempting to add a single key value to a node that is full causes a page split, leaving the B-tree with two pages that are each approximately half empty.
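The page split described above can be sketched for a single leaf-level page as follows. This is an illustration under stated assumptions rather than a definitive implementation: the function insert_key, the capacity of four key values per page, and the choice of the first key value of the new page as the key value passed to the parent node are all hypothetical, and the pointers stored alongside the key values are omitted for brevity.

```python
from bisect import insort
from typing import List, Optional, Tuple


def insert_key(
    page: List[int], key: int, capacity: int
) -> Tuple[List[int], Optional[List[int]], Optional[int]]:
    """Insert key into a sorted page, splitting the page if it is full.

    Returns (original_page, new_page, parent_key). new_page and parent_key
    are None when no split was needed.
    """
    if len(page) < capacity:
        insort(page, key)  # room available: insert in place, no split
        return page, None, None

    # Page is full: divide the key values in half between the original
    # page and a newly allocated page.
    mid = len(page) // 2
    original, new = page[:mid], page[mid:]

    # Place the incoming key value in whichever half covers its range.
    insort(original if key < new[0] else new, key)

    # Assumed convention: the first key value of the new page is the key
    # value added to the parent node together with a pointer to the new page.
    return original, new, new[0]


print(insert_key([10, 20], 15, capacity=4))
# ([10, 15, 20], None, None)
print(insert_key([10, 20, 30, 40], 25, capacity=4))
# ([10, 20, 25], [30, 40], 30)
```

In the second call, a page holding four key values becomes two pages holding three and two key values respectively, so adding one key value has consumed an entire additional page. If the parent node were also full, the same routine would be applied to the parent node, which is how a split propagates toward the root node.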
Normal activity in a B-tree includes the addition and the deletion of key values. As explained briefly above, key value additions can cause additional pages to be added to the B-tree. The additional pages do not always contain the maximum number of key values possible, thus wasting valuable storage space. Additionally, key value deletions can leave pages with a less than optimal number of key values stored on them.
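The wasted space can be quantified with a short calculation. The function page_utilization, the capacity of four key values per page, and the page contents below are hypothetical values chosen only for illustration.

```python
from typing import List


def page_utilization(pages: List[List[int]], capacity: int) -> float:
    """Fraction of the available key-value slots that are actually occupied."""
    used = sum(len(page) for page in pages)
    return used / (len(pages) * capacity)


# Two pages as they might look immediately after a page split.
after_split = [[10, 20, 25], [30, 40]]
print(page_utilization(after_split, capacity=4))  # 0.625

# The same pages after key values 20 and 25 are deleted.
after_deletes = [[10], [30, 40]]
print(page_utilization(after_deletes, capacity=4))  # 0.375
```

Under these assumed values, the two pages together hold only three key values after the deletions, which would fit on a single page.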