Storing and retrieving data is a critical operation for many software applications. For example, software applications such as web services and scientific computations often need to store data and retrieve it later. Such data is often structured, conforming to a rigid schema that specifies, for example, the names and types of the attributes common to the data. Data may also be semi-structured, in that the data does not conform to a rigid schema but nonetheless contains tags or other markers that separate attribute values. Furthermore, data may be unstructured, in that the data lacks attributes entirely.
The advent of distributed computing environments such as cloud computing systems has opened new possibilities for the rapid and scalable deployment of data storage and retrieval systems. In general, a distributed computing environment deploys a set of hosted resource servers that can be combined or strung together to perform coordinated tasks. For example, one group of resource servers, known as front-end servers, can be configured to accept and service requests from web clients. A further group of resource servers can be configured to serve as a data store that provides data storage and retrieval services to the front-end servers. Other types of resource servers are also possible.
A user or customer can request, from a central server or management system, the instantiation of a virtual machine or set of virtual machines from those resources to perform intended tasks or applications. The user can lease or subscribe to the set of instantiated virtual machines for the intended application. For example, a user may wish to set up and instantiate a virtual server from the distributed computing environment to create a storefront for products or services on a temporary basis.
In addition to distributed architectures, distributed applications may be deployed natively across one or more datacenters. Instead of using the hosted resource servers provided by an operator of a distributed architecture, a user may choose to deploy their software natively on dedicated hardware.
Regardless of whether a distributed architecture is used to deploy a distributed application or whether the distributed application is deployed natively across one or more datacenters, many current applications require quick storage, indexing and retrieval of structured and semi-structured data. These services are typically provided by one or more servers known as the backing store.
In the past, traditional relational databases have been used predominantly as the backing store for data-intensive applications. Relational databases typically support very general mechanisms for querying the data store. The term “query” refers to the process of retrieving all objects whose attribute values match a specified set of values. While relational databases enable users to retrieve objects by querying for any of their attributes, this generality comes at the cost of large overheads, and relational databases have difficulty scaling up.
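To make the term “query” concrete, the following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a relational backing store. The `users` table, its columns, the sample rows, and the `query` helper are all hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical schema: the column names and types form the rigid schema.
COLUMNS = ("username", "city", "age")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (username TEXT PRIMARY KEY, city TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("alice", "Ithaca", 30), ("bob", "Boston", 30), ("carol", "Ithaca", 41)],
)


def query(conn, **attrs):
    """Return all rows whose attribute values match the specified values."""
    for name in attrs:
        # Column names are interpolated into the SQL text, so accept only
        # names that belong to the known schema.
        if name not in COLUMNS:
            raise ValueError(f"unknown attribute: {name}")
    where = " AND ".join(f"{name} = ?" for name in attrs)
    sql = f"SELECT username, city, age FROM users WHERE {where} ORDER BY username"
    return conn.execute(sql, tuple(attrs.values())).fetchall()


# Any attribute, or any combination of attributes, can drive the query.
by_city = query(conn, city="Ithaca")
by_city_and_age = query(conn, city="Ithaca", age=30)
```

Here `by_city` holds the rows for `alice` and `carol`, and `by_city_and_age` holds only the row for `alice`. Neither query mentions the primary key, which is the generality described above.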
Key-value stores provide an alternative to relational databases for the storage and retrieval of data. A key-value store (also known as an associative array or object store) comprises a set of keys and a set of values where each key corresponds to one or more values. The term “lookup” refers to the process of finding the one or more values associated with a key. Key-value stores provide a very efficient lookup operation, but such efficiency typically comes at the cost of reducing the interface to lookup operations. Specifically, whereas traditional databases enable querying objects by any attribute value, key-value stores typically enable clients to look up the data solely by the single key under which it was inserted into the store. This restriction to a single key helps improve performance and scalability significantly, but fails to support applications that need to recall objects by attribute values other than the primary key. Furthermore, queries based on non-primary attributes are typically forced to enumerate all objects of a given type. Current key-value stores therefore do not support an efficient search function.
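The contrast between a lookup and a search on a non-primary attribute can be sketched as follows. This is an illustrative in-memory model built on a Python dictionary, not the interface of any particular key-value store; the `KeyValueStore` class, its method names, and the sample objects are hypothetical.

```python
class KeyValueStore:
    """Toy key-value store: each key maps to one value (an attribute dict)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Objects are inserted under a single primary key.
        self._data[key] = value

    def lookup(self, key):
        # Efficient: a single hash-table probe, regardless of store size.
        return self._data.get(key)

    def scan(self, **attrs):
        # Inefficient: with no index on non-primary attributes, every
        # stored object must be enumerated and checked against the values.
        return [
            (key, value)
            for key, value in self._data.items()
            if all(value.get(name) == wanted for name, wanted in attrs.items())
        ]


store = KeyValueStore()
store.put("alice", {"city": "Ithaca", "age": 30})
store.put("bob", {"city": "Boston", "age": 30})
store.put("carol", {"city": "Ithaca", "age": 41})

found = store.lookup("alice")                           # by primary key
matches = [key for key, _ in store.scan(city="Ithaca")]  # by non-primary attribute
```

In this sketch, `lookup` touches one entry, while `scan` visits all three stored objects to find the two that match. The cost of `scan` grows linearly with the number of stored objects, which is the enumeration cost described above.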
Thus, there is a need for a distributed key-value store that maps data objects in a way that supports an efficient search function, specifically a search on any combination of attributes (primary and non-primary) of a data object. The present invention satisfies this demand.