In computing, a B+ tree is a tree-based data structure that allows for the efficient searching and retrieval of key-value data (i.e., data organized in the form of key-value pairs [K, V]). For the purposes of the present disclosure, the content of a key K is referred to as the “key value” of K (or the “value of the key”). This should not be confused with value V in key-value pair [K, V], which is referred to herein as the “data entry” associated with key K.
Generally speaking, a B+ tree comprises two types of nodes: internal nodes and leaf nodes. Internal nodes appear between the root and bottom levels of a B+ tree and are considered navigational nodes because they guide tree traversal. In particular, each internal node stores up to b−1 keys and up to b pointers to lower-level (i.e., child) nodes, where b is the branching factor of the tree. A pointer is said to be “between” two of the internal node's keys if it references the root of a subtree in which every key has a key value within the key subinterval defined by those two keys. Stated more formally, for each internal node N having m keys (and thus m+1 children), all keys in the subtree rooted by the first child of N have a key value less than N's first key, all keys in the subtree rooted by the i-th child of N (where 2 <= i <= m) have a key value greater than or equal to the (i−1)th key of N and less than the i-th key of N, and all keys in the subtree rooted by the last, i.e., (m+1)th, child of N have a key value greater than or equal to the m-th key of N.
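By way of illustration only, the child-selection rule described above can be sketched in Python as follows. The function name child_index and the representation of a node's keys as a sorted list are assumptions made for this sketch and are not part of the disclosure.

```python
from bisect import bisect_right

def child_index(keys, search_key):
    """Return the 0-based index of the child pointer to follow.

    `keys` is the sorted key list of one internal node (hypothetical layout).
    """
    # bisect_right counts how many keys are <= search_key. That count is the
    # 0-based index of the child whose subinterval contains search_key:
    #   index 0      -> key values less than keys[0]
    #   index i      -> keys[i-1] <= key value < keys[i]
    #   index len(k) -> key values greater than or equal to the last key
    return bisect_right(keys, search_key)

node_keys = [10, 20, 30]  # m = 3 keys, hence 4 child pointers
for probe in (5, 10, 29, 30):
    print(probe, "-> child", child_index(node_keys, probe))
```

A search key equal to one of the node's keys is routed to the child on the right of that key, which matches the “greater than or equal to” bound in the formal statement.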
In contrast to internal nodes, leaf nodes appear only at the bottom level of a B+ tree and are considered storage nodes because they store the actual key-value data within the tree (up to b−1 keys/data entries per leaf node). To facilitate sequential access to these data entries, the leaf nodes can be configured to point to each other in the form of a linked list.
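As a minimal sketch of why the leaf-level linked list facilitates sequential access, the following Python fragment walks a chain of leaves to answer a range query. The Leaf class, its field names, and the range_scan function are hypothetical and chosen only for this example; a real implementation would first descend from the root to locate the starting leaf.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Leaf:
    keys: list = field(default_factory=list)      # sorted key values
    entries: list = field(default_factory=list)   # data entry for each key
    next: Optional["Leaf"] = None                 # right sibling in the leaf chain

def range_scan(start_leaf, lo, hi):
    """Yield (key, data entry) pairs with lo <= key < hi."""
    leaf = start_leaf
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.entries):
            if k >= hi:
                return          # keys are sorted, so nothing further can match
            if k >= lo:
                yield k, v
        leaf = leaf.next        # follow the sibling pointer, no re-descent needed

# Three chained leaves holding keys 5..40.
third = Leaf([30, 40], ["e", "f"])
second = Leaf([15, 20], ["c", "d"], third)
first = Leaf([5, 10], ["a", "b"], second)
result = list(range_scan(first, 10, 35))
print(result)
```

Once the first matching leaf is found, the scan never revisits an internal node; it simply follows sibling pointers until it passes the upper bound.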
The root node of a B+ tree is a special case and can act as either an internal node or as a leaf node. The root node will be a leaf node in scenarios where the B+ tree does not contain sufficient data entries to overflow a single node. Moreover, B+ trees are “balanced” in the sense that all leaf nodes are the same distance from the root node, and each non-root node is guaranteed to be at least half full with pointers or data entries. These properties are enforced by the way in which nodes are split and merged when key-value pairs are inserted into, and deleted from, the B+ tree respectively.
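The following Python sketch illustrates, under simplifying assumptions, how splitting an overflowing leaf keeps both resulting leaves at least half full. The function split_leaf and its return shape are hypothetical, the sketch ignores updating the parent and the sibling pointers, and the exact minimum-occupancy rule can differ between implementations.

```python
def split_leaf(keys, entries):
    """Split an overflowing leaf into two leaves.

    Returns (left, right, separator), where left and right are
    (keys, entries) tuples and separator is the key to insert into the parent.
    """
    mid = (len(keys) + 1) // 2          # left leaf keeps the extra item if the count is odd
    left = (keys[:mid], entries[:mid])
    right = (keys[mid:], entries[mid:])
    separator = right[0][0]             # smallest key in the right leaf
    return left, right, separator

# With branching factor b = 4 a leaf holds at most b - 1 = 3 keys, so a
# fourth insertion overflows it and triggers a split.
left, right, separator = split_leaf([5, 10, 15, 20], ["a", "b", "c", "d"])
print(left, right, separator)
```

Because the separator equals the smallest key of the right leaf, every key in the right leaf is greater than or equal to the separator and every key in the left leaf is less than it, which is consistent with the ordering rule for internal nodes described earlier.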
One issue with conventional B+ tree implementations is that, at the time of creating/instantiating a tree data structure, they generally allocate a fixed amount of memory space for each key in the tree instance based on the largest possible “key space” (i.e., range of key values) supported by an associated application. By way of example, consider an application that uses B+ trees for tracking writes to virtual disk snapshots. In this example, the keys in each B+ tree correspond to addresses in a virtual disk, and the data entries in each B+ tree correspond to addresses of logical disks/volumes on physical media where data for a given virtual disk address has been written. If the maximum possible size of a virtual disk snapshot is 256 terabytes (TB), then a conventional B+ tree implementation may allocate, for each node of a B+ tree created via this application, a fixed amount of 64 bits per key (since 64 bits is sufficient to address a 256 TB key space).
The problem with this approach is that, in many cases, the actual size of a given virtual disk snapshot will fall far below the theoretical maximum of 256 TB. For instance, assume that a virtual disk snapshot is created that is 128 gigabytes (GB) in size (which is likely to be closer to the average than 256 TB). In this case, each key in the B+ tree created for this virtual disk snapshot will still be allocated 64 bits, even though considerably fewer bits are required to represent a key space of 128 GB. This, in turn, results in wasted space on disk and/or in RAM or cache. The wasted space can become significant if a large number of trees are created and maintained concurrently, or if the application adopts ever higher theoretical limits on key space (e.g., on the order of petabytes, exabytes, etc.) over time.
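The arithmetic behind this observation can be sketched as follows. The sketch assumes, purely for illustration, that keys are byte-granular addresses starting at zero; if keys were block addresses, even fewer bits would be needed. The helper name bits_needed is hypothetical.

```python
def bits_needed(key_space_size):
    """Minimum number of bits that can represent every key in [0, key_space_size)."""
    # The largest key is key_space_size - 1; int.bit_length gives its width.
    return max(1, (key_space_size - 1).bit_length())

GB = 2 ** 30
TB = 2 ** 40
FIXED_KEY_BITS = 64  # fixed per-key allocation from the example above

max_bits = bits_needed(256 * TB)      # theoretical maximum key space
actual_bits = bits_needed(128 * GB)   # key space of the 128 GB snapshot
unused_bits = FIXED_KEY_BITS - actual_bits

print(max_bits, actual_bits, unused_bits)  # 48 37 27
```

Under these assumptions, 27 of the 64 bits allocated to each key of the 128 GB snapshot carry no information, and that per-key overhead is repeated in every node of every tree.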