Data systems build index structures to provide efficient access to the data pages in a data file. Popular index structures include the B-tree, a form of tree data structure, and the skip list. The B-tree generalizes the binary search tree in that a node can have more than two children. Unlike self-balancing binary search trees, the B-tree is well suited to systems that read and write large blocks of data, and it is commonly used in databases and filesystems.
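To make the "more than two children" property concrete, the following is a minimal sketch of a lookup in a multi-way search tree. The names `Node` and `search` are hypothetical, chosen for illustration only, and the sketch omits insertion, splitting, and page layout, so it is not a full B-tree implementation.

```python
from bisect import bisect_left


class Node:
    """Hypothetical multi-way node: sorted keys and, if internal, len(keys) + 1 children."""

    def __init__(self, keys, children=None):
        self.keys = keys                 # sorted list of keys stored in this node
        self.children = children or []   # empty list means the node is a leaf


def search(node, key):
    """Return True if key is stored in the tree rooted at node."""
    while True:
        # Binary search inside the node: cheap once the node is in memory.
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:
            return False
        # Descend into the child covering the gap where key would sit.
        node = node.children[i]


# A root with two keys and three children, unlike a binary node's one key and two children.
root = Node([10, 20], [Node([1, 5]), Node([12, 15]), Node([25, 30])])
```

With this tree, `search(root, 15)` visits the root and then the middle child, so the lookup touches two nodes.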
B-trees have substantial advantages over alternative implementations when the time to access a node far exceeds the time to access data within a node. In such cases, the cost of accessing a node can be amortized over multiple operations within that node. This usually occurs when the nodes reside in secondary data storage such as disk drives. Increasing the number of child nodes within each internal node decreases the height of the tree, which reduces the number of expensive node accesses. In addition, the tree needs rebalancing less often.
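The relationship between fanout and height can be illustrated with a rough calculation. The sketch below assumes an idealized tree in which every node is completely full, so each additional level multiplies the number of reachable entries by the fanout. The function name `levels_needed` is hypothetical, and real B-trees, whose nodes are only partly full, will be somewhat taller.

```python
def levels_needed(num_keys: int, fanout: int) -> int:
    """Smallest number of levels h such that fanout**h >= num_keys (idealized full tree)."""
    if num_keys < 1 or fanout < 2:
        raise ValueError("num_keys must be >= 1 and fanout must be >= 2")
    levels = 0
    capacity = 1
    # Integer arithmetic avoids floating-point rounding in log computations.
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


# Under this model, one million keys need 20 levels at fanout 2
# but only 3 levels at fanout 100.
binary_levels = levels_needed(1_000_000, 2)
wide_levels = levels_needed(1_000_000, 100)
```

If each level costs one node access on secondary storage, the wider tree in this model replaces about twenty accesses per lookup with three.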
A B-tree over all of the data can be inefficient, however, when the query workload is unpredictable or when only a selected portion of the data, such as “hot data,” is accessed, especially when there are a large number of random reads and writes. Examples can be seen with newer disk technologies such as solid-state disks (SSDs), where update mechanisms are often more complex and more expensive than in other disk technologies. Attempts to improve B-tree performance on these technologies have included partitioned B-trees and database cracking, both of which build index structures adaptively. These approaches, however, postpone the cost of index construction to subsequent queries in the system.