Within the field of computing, many scenarios involve a query to be applied to a data set stored by one or more data stores. For example, a user or a data-driven process may request a particular subset of data by submitting to the data store a query specified in a query language, such as the Structured Query Language (SQL). The data store may receive the query, process it using a query processing engine (e.g., a software pipeline comprising components that perform various parsing operations on the query, such as associating names in the query with the named objects of the database and identifying the operations specified by various operators), apply the operations specified by the parsed query to the stored data, and return the query result specified by the query. The query result may comprise a set of records specified by the query, a set of attributes of such records, or a result calculated from the data (e.g., a count of records matching certain query criteria). The result may also comprise a report of an action taken with respect to the stored data, such as the creation or modification of a table or the insertion, update, or deletion of records in a table.
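The kinds of query results described above can be illustrated with a minimal sketch, assuming a single in-memory SQLite database (via Python's standard `sqlite3` module) stands in for one data store. SQLite's own engine performs the parsing and name resolution internally; the `orders` table and its columns are hypothetical and exist only for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for a single data store.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema and rows, used only for illustration.
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
cur.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 7.5)],
)

# 1. A set of records matching the query criteria.
records = cur.execute(
    "SELECT id, customer, total FROM orders WHERE customer = ? ORDER BY id",
    ("alice",),
).fetchall()

# 2. A set of attributes (one column) of those records.
totals = [row[0] for row in cur.execute(
    "SELECT total FROM orders WHERE customer = ? ORDER BY id", ("alice",)
)]

# 3. A result calculated from the data: a count of matching records.
(count,) = cur.execute(
    "SELECT COUNT(*) FROM orders WHERE customer = ?", ("alice",)
).fetchone()

# 4. A report of an action taken: how many rows an UPDATE modified.
cur.execute("UPDATE orders SET total = total * 2 WHERE customer = ?", ("bob",))
updated = cur.rowcount
conn.commit()

print(records, totals, count, updated)
```

Here `records` holds full rows, `totals` holds a single attribute, `count` is a computed value, and `updated` reports the effect of a modification rather than returning data.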
In many such scenarios, the database may be distributed over several, and potentially many, data stores. For example, in a distributed database, different portions of the stored data may be stored in one or more data stores in a server farm. When a query targeting the data set is received, the machine receiving the query may identify which data stores are likely to contain the targeted data, and may send the query to one or more of those data stores. Each such data store may apply the query to the data stored therein and send back a query result. If the query was applied by two or more data stores, the query results may be combined to generate an aggregated query result. In some scenarios, one machine may coordinate both the distribution of the query to the involved data stores and the aggregation of the query results. Techniques such as the MapReduce framework have been devised to perform such distribution and aggregation efficiently.
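The distribute-and-aggregate pattern described above (often called scatter-gather) can be sketched as follows. This is a simplified illustration, not MapReduce itself. It assumes hypothetical data stores that are range-partitioned by an integer key and held as in-process dictionaries; the names `SHARDS`, `route`, `apply_query`, and `scatter_gather` are all hypothetical. The coordinator routes the query only to stores whose key range overlaps the query, lets each store compute a partial count, and sums the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data stores, each holding the records whose key falls in
# its half-open range [lo, hi). In practice these would be separate servers.
SHARDS = {
    "store-a": {"range": (0, 100),
                "rows": [{"key": 5, "color": "red"}, {"key": 42, "color": "blue"},
                         {"key": 77, "color": "red"}]},
    "store-b": {"range": (100, 200),
                "rows": [{"key": 120, "color": "red"}, {"key": 150, "color": "red"},
                         {"key": 199, "color": "blue"}]},
    "store-c": {"range": (200, 300),
                "rows": [{"key": 210, "color": "red"}, {"key": 250, "color": "blue"}]},
}


def route(lo, hi):
    """Identify the stores whose key range overlaps the queried range [lo, hi)."""
    return [name for name, shard in SHARDS.items()
            if shard["range"][0] < hi and lo < shard["range"][1]]


def apply_query(name, lo, hi, color):
    """Apply the query on one store and return its partial result (a count)."""
    return sum(1 for row in SHARDS[name]["rows"]
               if lo <= row["key"] < hi and row["color"] == color)


def scatter_gather(lo, hi, color):
    """Coordinator: send the query to relevant stores, then aggregate results."""
    targets = route(lo, hi)
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda name: apply_query(name, lo, hi, color), targets))
    # Combine the per-store results into one aggregated query result.
    return sum(partials), targets


print(scatter_gather(50, 160, "red"))  # (3, ['store-a', 'store-b'])
```

Because `store-c` cannot hold keys below 200, routing skips it entirely. This mirrors the step of identifying which data stores are likely to contain the targeted data before distributing the query.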
The data engines utilized by such data stores may be quite sophisticated, and may be capable of applying many complicated computational processes to the stored data, such as database transactions, journaling, the execution of stored procedures, and the acceptance and execution of agents. The query language itself may permit complex queries, including nested queries, computationally intensive similarity comparisons of strings and other data types, and modifications to the structure of the database. Additionally, the query processing engine of a data store may be able to answer complicated queries efficiently, and may even improve a query through techniques such as query optimization. As a result of these and other processes, the evaluation of a query by a data store may consume a large amount of computational resources.