Software applications (“applications”) use various data structures for accessing data associated with an application. For example, applications use a data structure such as a hash table or a hash map for storing key-value pairs. The hash table (or the hash map) is a data structure that is used to implement an associative array, which is a data structure that maps keys to values. A hash table uses a hash function to compute an index into an array of buckets or entries, from which the correct value can be found. The hash table facilitates a “point lookup,” e.g., retrieving a value for a requested key. However, the hash table does not facilitate retrieving the keys (and therefore, their corresponding values) in a specific order. The software application may have to include additional logic for arranging the keys in a particular order after the keys are retrieved from the hash table. Rearranging the keys can consume significant computing resources, especially as the number of key-value pairs stored in the hash table increases or if the keys are frequently requested in a particular order, and can, therefore, degrade the performance of the application. Accordingly, scaling a hash map to store all of the key-value pairs of the application can adversely affect the performance of the application.
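The contrast between a point lookup and ordered retrieval can be illustrated with a minimal sketch, assuming Python's built-in `dict` as the hash map. The helper names `build_table`, `point_lookup`, and `ordered_items` are hypothetical and chosen only for illustration. A lookup hashes the key directly, while any request for keys in order requires the application to sort the keys itself on every such request.

```python
from typing import Dict, List, Optional, Tuple


def build_table(pairs: List[Tuple[str, int]]) -> Dict[str, int]:
    """Store key-value pairs in a hash map (a Python dict)."""
    table: Dict[str, int] = {}
    for key, value in pairs:
        table[key] = value
    return table


def point_lookup(table: Dict[str, int], key: str) -> Optional[int]:
    """Hash the key and return its value, or None if absent (expected O(1))."""
    return table.get(key)


def ordered_items(table: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return pairs in key order.

    A dict preserves insertion order, not key order, so the application must
    sort the keys itself: O(n log n) work on every ordered request.
    """
    return sorted(table.items())


table = build_table([("pear", 3), ("apple", 1), ("fig", 2)])
print(point_lookup(table, "fig"))  # 2
print(ordered_items(table))        # [('apple', 1), ('fig', 2), ('pear', 3)]
```

Because the sort is repeated for each ordered request, its cost grows with both the number of stored pairs and how often ordered access is requested.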
Some applications use a tree data structure (“tree”), e.g., a “B-tree,” instead of the hash map to provide sequential access to data. The B-tree inserts keys into, and/or retrieves keys from, the tree in a defined sequence. While the B-tree facilitates sequential access to the data, the cost (e.g., processing time, memory access, or other computing resources) involved in inserting a key into the tree increases significantly as the number of keys increases. Inserting a new key into the B-tree can cause the tree to rearrange its nodes, e.g., by moving existing keys to different nodes, to make room for the new key. As the tree size increases, e.g., as the number of keys stored in the tree increases, retrieving a key consumes more time because software applications may have to traverse a significant number of nodes, perform a number of comparisons to determine the right path to traverse, etc. Accordingly, scaling a B-tree to store all of the key-value pairs of the application can adversely affect the performance of the application.
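The node rearrangement and path traversal described above can be seen in a minimal in-memory B-tree sketch. It assumes the textbook insertion scheme that splits full nodes on the way down, a small minimum degree `t = 2` (at most three keys per node) chosen only to force frequent splits, and integer keys without values. The class and method names are hypothetical, and real B-trees typically store nodes as disk pages. Splitting moves half of a node's keys into a new sibling and promotes the median key into the parent, while a search performs comparisons at each level to choose the child to follow.

```python
from bisect import bisect_left, bisect_right, insort
from typing import List


class BTreeNode:
    def __init__(self, leaf: bool) -> None:
        self.leaf = leaf
        self.keys: List[int] = []
        self.children: List["BTreeNode"] = []


class BTree:
    """Toy B-tree of minimum degree t: each node holds at most 2t - 1 keys."""

    def __init__(self, t: int = 2) -> None:
        self.t = t
        self.root = BTreeNode(leaf=True)

    def search(self, key: int) -> bool:
        node = self.root
        while True:
            i = bisect_left(node.keys, key)  # comparisons select the path
            if i < len(node.keys) and node.keys[i] == key:
                return True
            if node.leaf:
                return False
            node = node.children[i]  # descend one level

    def insert(self, key: int) -> None:
        if len(self.root.keys) == 2 * self.t - 1:
            # Splitting a full root adds a new level to the tree.
            new_root = BTreeNode(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        self._insert_non_full(self.root, key)

    def _split_child(self, parent: BTreeNode, i: int) -> None:
        # Move the upper half of a full child into a new sibling node and
        # promote the median key into the parent.
        t = self.t
        child = parent.children[i]
        sibling = BTreeNode(leaf=child.leaf)
        median = child.keys[t - 1]
        sibling.keys = child.keys[t:]
        child.keys = child.keys[: t - 1]
        if not child.leaf:
            sibling.children = child.children[t:]
            child.children = child.children[:t]
        parent.keys.insert(i, median)
        parent.children.insert(i + 1, sibling)

    def _insert_non_full(self, node: BTreeNode, key: int) -> None:
        while not node.leaf:
            i = bisect_right(node.keys, key)
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i)  # rearrange before descending
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        insort(node.keys, key)  # shift existing keys to keep the leaf sorted

    def in_order(self) -> List[int]:
        out: List[int] = []

        def walk(node: BTreeNode) -> None:
            if node.leaf:
                out.extend(node.keys)
                return
            for i, k in enumerate(node.keys):
                walk(node.children[i])
                out.append(k)
            walk(node.children[-1])

        walk(self.root)
        return out


tree = BTree(t=2)
for k in [10, 20, 5, 6, 12, 30, 7, 17, 3, 1, 25, 40, 15]:
    tree.insert(k)
print(tree.in_order())  # [1, 3, 5, 6, 7, 10, 12, 15, 17, 20, 25, 30, 40]
print(tree.search(17), tree.search(99))  # True False
```

The in-order walk returns keys in sorted order without a separate sort. The price is paid at insertion time, when full nodes are split and keys are moved between nodes.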
A log-structured merge tree (“LSM tree”) is a data structure that is employed in applications that require indexed access to data with high throughput. LSM trees typically have an in-memory portion and an on-disk portion. The in-memory portion typically does not perform efficiently as the amount of available memory increases. Moreover, using LSM trees on newer data storage devices, e.g., solid-state drives (SSDs), places additional demands on memory, e.g., to reduce storage “write amplification.”
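The split between an in-memory portion and an on-disk portion can be illustrated with a toy sketch. It assumes a Python `dict` as the in-memory “memtable” and in-memory sorted lists standing in for immutable on-disk runs. The class `TinyLSM`, its methods, and the `memtable_limit` flush threshold are hypothetical. Writes go to the memtable and are flushed as one sorted run once the threshold is reached. Reads check the memtable first, then runs from newest to oldest. Compaction merges runs, which rewrites data that was already written once and is therefore one source of write amplification.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

Run = List[Tuple[str, str]]  # sorted, immutable run (stand-in for an on-disk file)


class TinyLSM:
    """Toy LSM tree: a mutable in-memory memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit: int = 4) -> None:
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable: Dict[str, str] = {}
        self.runs: List[Run] = []  # oldest first, newest last

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self) -> None:
        # Batch the memtable into a single sorted run and start a new memtable.
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):  # newer runs shadow older ones
            i = bisect_left(run, (key,))  # binary search by key
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def scan(self) -> List[Tuple[str, str]]:
        # Ordered retrieval: merge runs and memtable, newest value wins.
        merged: Dict[str, str] = {}
        for run in self.runs:
            merged.update(run)
        merged.update(self.memtable)
        return sorted(merged.items())

    def compact(self) -> None:
        # Merge all runs into one. Rewriting already-written entries is a
        # source of write amplification.
        merged: Dict[str, str] = {}
        for run in self.runs:  # oldest first, so later values overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


lsm = TinyLSM(memtable_limit=2)
lsm.put("b", "1")
lsm.put("a", "2")  # memtable reaches the limit and is flushed as a run
lsm.put("b", "3")
print(lsm.get("b"))  # 3 (the memtable shadows the flushed run)
print(lsm.scan())    # [('a', '2'), ('b', '3')]
```

In this sketch, a larger `memtable_limit` holds more data in memory and yields fewer, larger flushes, which is one way memory can be traded for fewer writes to the storage device.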