Key-value stores are a powerful tool for storing and retrieving large amounts of data for activities such as data analysis. One difficulty in creating these key-value stores is the need for parallelism: the large amount of data that must be stored makes a key-value store on a single node impractical for most workloads. Thus, distributed key-value stores have been proposed in which a partitioned key-value store (often referred to as a partitioned data store) is stored on a number of parallel nodes.
Multidimensional Data Hashing Indexing Middleware (MDHIM) is an example of a framework for partitioned data stores. In a typical MDHIM implementation, one or more MDHIM clients run on each of the compute nodes and communicate with a plurality of MDHIM servers also running on the same or different compute nodes in a parallel file system. Each MDHIM server stores a partition of the key-value store. To read or write key-value pairs within a particular sub-range of the key-value store, a client contacts the MDHIM server that stores that sub-range.
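By way of illustration only, the routing of a key to the server responsible for its sub-range can be sketched as follows. This is a minimal sketch and not the MDHIM API: the class name `RangePartitionedStore`, its methods, and the choice of split points are hypothetical, and each server's local store is modeled as an in-memory dictionary rather than a process reached over the network.

```python
from bisect import bisect_right


class RangePartitionedStore:
    """Hypothetical model of a key-value store partitioned by ordered key ranges."""

    def __init__(self, split_points):
        # Sorted boundaries between sub-ranges; N boundaries yield N + 1 partitions.
        self.split_points = sorted(split_points)
        # Each dict stands in for the local key-value store held by one server.
        self.servers = [dict() for _ in range(len(self.split_points) + 1)]

    def server_for(self, key):
        # Index of the partition whose sub-range contains the key.
        # A key equal to a boundary belongs to the partition above that boundary.
        return bisect_right(self.split_points, key)

    def put(self, key, value):
        # Only the server that owns the key's sub-range is contacted.
        self.servers[self.server_for(key)][key] = value

    def get(self, key):
        # Returns None when the key is absent from its owning partition.
        return self.servers[self.server_for(key)].get(key)


store = RangePartitionedStore([100, 200])  # three partitions: <100, 100..199, >=200
store.put(42, "a")
store.put(150, "b")
store.put(200, "c")
```

In this sketch a binary search over the ordered boundaries identifies the owning partition, so a read or write touches exactly one local store.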
One challenge in a partitioned key-value store is the amount of key-value data that must be transferred, stored and processed. Thus, MDHIM employs low-latency Message Passing Interface (MPI) communications across the user-space of high performance computing (HPC) compute nodes to create a single virtual key-value store across a set of local key-value stores using ordered key-ranges.
While MDHIM has significantly improved the performance of partitioned data stores in an HPC environment, a need remains for a partitioned data store that employs improved techniques for key look-ups by range-knowledgeable clients.