The present invention lies in the field of data storage and data mining. In particular, the invention relates to the handling of range queries submitted to a database of information encoded as a set of data items.
Distributed data stores, for example graph databases, have been intensively investigated in the past few years. Leading technical solutions include a variety of data partitioning approaches and/or data caching solutions that are tuned to particular representations. Neither solution is ideal. Data partitioning and/or clustering is intrinsically complex and can be NP-complete in general. When a graph representation is used for an ontology, it is difficult to maintain data balance and minimal data replication across multiple storage units while at the same time ensuring that no knowledge is lost during the distribution process. As a result, references between sub-graphs occur frequently, and a majority of graph partitioning and clustering approaches fail to significantly reduce communication between data nodes. Similarly, the widely used caching algorithms are not designed for graph representations. Such problems are not restricted to graph data, and can also arise with other types of distributed data storage.
There are several approaches to storing and retrieving data in a distributed data store such as a key/value store. One approach is to use an ordered key/value store, which enables range queries to be executed over the keys. This approach improves the performance of scan operations, since the system does not need to read and filter data from the entire set of servers. Another approach is to store data in a regular (i.e. not ordered) key/value store; since the order of the keys does not need to be maintained, read operations can be optimized by co-locating data that is read together. This second approach optimizes traversal operations on distributed data stores. However, because it is difficult to maintain an ordered data set while moving data around (co-locating) to optimize future read operations, the second approach does not generally preserve the order of keys, and scan operations are therefore expensive.
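By way of illustration only, the difference in scan cost between the two approaches may be sketched as follows. The sketch is a simplified, in-memory model and not an implementation of any particular product: the class names `OrderedStore` and `HashStore`, the boundary keys, and the use of a CRC32 hash for placement are all assumptions introduced for the example, and keys are assumed to be unique. Co-location of related data is not modelled; the unordered store simply places keys by hash.

```python
import bisect
import zlib
from typing import Dict, List, Tuple

Item = Tuple[str, str]


class OrderedStore:
    """Hypothetical ordered store: each server owns one contiguous, sorted key range."""

    def __init__(self, boundaries: List[str]) -> None:
        # boundaries[i] is the smallest key owned by server i + 1 (assumed layout).
        self.boundaries = sorted(boundaries)
        self.servers: List[List[Item]] = [[] for _ in range(len(self.boundaries) + 1)]

    def put(self, key: str, value: str) -> None:
        server = self.servers[bisect.bisect_right(self.boundaries, key)]
        bisect.insort(server, (key, value))  # keep each server sorted by key

    def range_query(self, lo: str, hi: str) -> Tuple[List[Item], int]:
        """Return items with lo <= key < hi and the number of servers contacted."""
        if lo >= hi:
            return [], 0
        first = bisect.bisect_right(self.boundaries, lo)
        last = bisect.bisect_left(self.boundaries, hi)
        result: List[Item] = []
        for server in self.servers[first:last + 1]:
            # (lo,) sorts before every (lo, value), so these bounds select lo <= key < hi.
            start = bisect.bisect_left(server, (lo,))
            end = bisect.bisect_left(server, (hi,))
            result.extend(server[start:end])
        return result, last - first + 1


class HashStore:
    """Hypothetical unordered store: keys are placed on servers by hash."""

    def __init__(self, num_servers: int) -> None:
        self.servers: List[Dict[str, str]] = [{} for _ in range(num_servers)]

    def put(self, key: str, value: str) -> None:
        index = zlib.crc32(key.encode("utf-8")) % len(self.servers)
        self.servers[index][key] = value

    def range_query(self, lo: str, hi: str) -> Tuple[List[Item], int]:
        """Key order is not preserved, so every server must be read and filtered."""
        result: List[Item] = []
        for server in self.servers:
            result.extend((k, v) for k, v in server.items() if lo <= k < hi)
        return sorted(result), len(self.servers)
```

In this model both stores return the same items for a given range, but the ordered store contacts only the servers whose key ranges overlap the query, whereas the unordered store contacts every server regardless of how narrow the range is.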