Efforts to reliably store and access large amounts of data have increasingly turned to distributed technologies, such as Distributed Hash Tables (“DHTs”) and Amazon.com's Dynamo™ key-value store, to scale across commodity hardware while meeting the high-availability requirements of demanding service level agreements (“SLAs”).
In such systems, each data object is typically stored on multiple physical machines, also known as storage nodes, to provide redundancy against failure and, often, to improve performance. This model requires a way to determine which of the many storage nodes are responsible for storing a particular data object. Commonly, this problem is solved by assigning each data object a key and then using a partitioning algorithm, such as consistent hashing, to choose a set of storage nodes.
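By way of illustration only, key-based partitioning of this kind might be sketched as follows. The sketch assumes a simple consistent-hashing ring with virtual nodes; the class name `HashRing`, the method `nodes_for`, the SHA-256 hash, and the parameters `vnodes` and `replicas` are hypothetical choices made for this example and are not drawn from any particular known system.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (a 64-bit integer)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Illustrative consistent-hashing ring (hypothetical, not a real library API)."""

    def __init__(self, nodes, vnodes=64):
        # Place several virtual points per storage node on the ring so that
        # keys spread more evenly across the physical nodes.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]
        self._node_count = len(set(nodes))

    def nodes_for(self, key, replicas=3):
        """Return up to `replicas` distinct nodes, walking clockwise from the key."""
        replicas = min(replicas, self._node_count)
        i = bisect.bisect_right(self._hashes, _hash(key))
        chosen = []
        while len(chosen) < replicas:
            node = self._points[i % len(self._points)][1]
            if node not in chosen:
                chosen.append(node)
            i += 1
        return chosen


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
owners = ring.nodes_for("object-123")  # the set of nodes holding this object
```

The same key always maps to the same set of nodes, which is what allows any client to locate a data object by key without consulting a central directory.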
The DHT model supports storing and retrieving data objects by key, which is sufficient for many problems, but not for spatial problems, such as finding all location-tagged data objects within a specific geographic bounding region. More generally, DHTs do not naturally support range searches that restrict the search space by one or more of a data object's attribute dimensions.
Known systems are not configured to distribute data objects to nodes with regard to the data objects' dimensional attributes in a multidimensional space, such as a two-dimensional geographic area. Using known systems to store a plurality of data objects with dimensional attributes would result in many data objects whose dimensional attributes are spatially close together in the multidimensional space being stored at different nodes. A range search of such a multidimensional space based on dimensional attributes would be relatively slow and inefficient because many nodes would need to be queried to search the respective data objects.
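The scattering effect described above can be illustrated with a minimal sketch. For brevity, the sketch assumes a simplified hash-modulo partitioner in place of consistent hashing; the cluster size `NUM_NODES`, the functions `node_for`, `build_store`, and `range_search`, and the sample coordinates are all hypothetical and chosen only for this example.

```python
import hashlib

NUM_NODES = 16  # hypothetical cluster size


def node_for(key: str) -> int:
    """Assign a key to a node by hashing the key alone (location is ignored)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


def build_store(objects):
    """Distribute (key, lat, lon) objects across nodes by key hash."""
    store = {n: [] for n in range(NUM_NODES)}
    for key, lat, lon in objects:
        store[node_for(key)].append((key, lat, lon))
    return store


def range_search(store, min_lat, max_lat, min_lon, max_lon):
    """Scan every node, since any node may hold objects inside the box."""
    hits, nodes_with_hits = [], set()
    for node, items in store.items():
        for key, lat, lon in items:
            if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
                hits.append(key)
                nodes_with_hits.add(node)
    return hits, nodes_with_hits


# 200 objects packed into a small area, plus 50 objects far away.
nearby = [
    (f"sf-{i}-{j}", 37.70 + i * 0.001, -122.50 + j * 0.001)
    for i in range(20)
    for j in range(10)
]
far_away = [(f"ny-{k}", 40.70 + k * 0.001, -74.00) for k in range(50)]
store = build_store(nearby + far_away)

hits, nodes_with_hits = range_search(store, 37.70, 37.80, -122.52, -122.35)
print(f"{len(hits)} hits spread over {len(nodes_with_hits)} of {NUM_NODES} nodes")
```

Because the node assignment depends only on the hash of the key, objects that are neighbors in the geographic space have no tendency to share a node, and the search cannot rule out any node in advance.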
It would be desirable to provide a system and method to implement multidimensional range searches on top of a distributed data store. It would be further desirable that such system or method be configured to function effectively when dealing with non-uniformly distributed dimensional data, such as is typical of geographic location data. In order to support a distributed solution in which each storage node has its own data structure, memory, and/or disk, it would be desirable that data objects corresponding to dimensional attributes that are spatially close together according to the dimensions of a prospective range search be partitioned onto the same or relatively few storage nodes. For example, if a system is queried for location-tagged data within a bounding area representing the city of San Francisco, it would be desirable that all data tagged within the city of San Francisco be stored in only a small number of storage nodes because all such nodes must be queried independently to compute a final result. Otherwise, a large number of nodes must be queried and efficient computation cannot be guaranteed.