Database are useful for storing and categorizing data in such a manner that the stored data can be accessed, analyzed and/or otherwise utilized through one or more queries performed on the databases by a database application.
A database application allows a user to interact with a database server by submitting server commands to the database that cause the database server to perform an operation or a set of operations on data within the databases and/or on the databases, themselves. Database commands, which include database queries, are input in a text-based form at the database application using any database language known in the art and which the database server supports. One exemplary database language supported by many database servers is the Structured Query Language (SQL). Additional non-limiting examples of database languages include Datalog, Data Mining Extensions (DMX), Language Integrated Query (LING), and XML Query (XQuery).
Like many database languages, SQL is a declarative language. Commands written in a declarative language describe a desired result or what the program must accomplish (i.e., the logic of a computation) and allow the computing system to determine how to accomplish the set of operations necessary to arrive at the commanded outcome. This is in contrast to imperative programming languages, which provide the computing system with a defined set of ordered instructions to process—the how for achieving a desired result.
While commands written in an imperative language provide precise computing instructions, they lack the flexibility and conciseness often associated with commands written in declarative languages. Further, declarative languages are particularly useful when it comes to accessing information from a database. Typically, a user knows what information they are searching for in a database or what the desired outcome is, but they may not know a precise order of operations necessary to achieve the final result or even the most efficient order of operations for doing so. By writing a database command in a declarative language, the user need only provide the desired result and allow the computing system to determine the required steps and order of operations necessary to return the appropriate result.
As a natural consequence of processing commands written in a declarative language, the database server must determine which actions to perform to accomplish the desired result and then perform those actions. The act of preparing for performance of those actions is generally referred to as “compiling” the database command, while performing the actions is generally referred to as “executing” the database command.
As part of compiling the database command, a database server creates or identifies a “query plan” for executing the database command. A query plan is a data structure that represents a set of operations or a series of steps or actions which, when executed by the database server, accomplish the desired result indicated by the database command. There is often not a single query plan associated with a given database command, but rather, there are often a variety of alternative query plans identified that, when executed, produce the same, correct result for the given database command. In some instances, a program called a “query planner” may be used at the compiling step to determine available query plans.
Although each query plan may arrive at a same, correct result, not all query plans are created equal. Some query plans may be more desirable than other query plans. There is often a wide range of performance related factors that affect the desirability of executing each query plan. Performance factors include, for example, database configurations, hardware configurations, compilation and execution environment parameters, database user profiles, and/or the number of concurrent database users. Changes in one or more of the foregoing factors typically cause a change in execution efficiency of query plans, and methods for altering and/or optimizing performance related factors are known in the art.
In addition to performance-related factors affecting the desirability of executing one query plan over another, computational costs associated with each query plan may also be considered. Computational costs include such things as the processing time and efficiency of executing a query plan, and it is advantageous for a database server to execute a database command using a query plan that performs relatively quickly or efficiently compared to other alternative query plans. Such a query plan may be referred to as an optimal query plan or a least computationally expensive query plan.
A database server typically relies on a query planner to predict the computational cost of each query plan and further relies on a module known as a “query optimizer” to identify the optimal query plan. As discussed below, the query optimizer may, in some embodiments, be the same module as the query planner. Regardless, the query planner typically predicts the costs of each query plan, and the query optimizer receives and/or analyzes the computational cost for each query plan and selects an optimal query plan to be executed based on the computational costs.
For a given query plan, the predicted cost analysis includes, among other elements, estimating the cost of performing each step in the sequence of steps specified in the given query plans. The cost estimates for each step are often based on database statistics, which can be time-specific snapshots of one or more database components and which can introduce inherent uncertainties in the query optimizer's cost analysis.
Basing computational costs on such factors may result in errors in cost estimations. This is particularly problematic if query planners rely on database statistics that have become stale or out-of-date when predicting computational costs. For example, tables or indexes subject to substantial data change operations (e.g., insert, update, delete, or merge) often undergo changes in their data distribution and subsequently render any associated statistic stale or out-of-date, as the respective statistic no longer reflects the current data distribution of the given data component. In such instances, the predicted performance cost of a query plan incorporating out-of-date statistics is likely to be significantly different than its actual performance cost when executed, and because the query optimizer relies on the predicted computational costs and not the actual computational costs when selecting an optimal query plan, the query optimizer is likely to err in identifying a query plan that has optimal performance for a database command.
Because stale or out-of-date statistics can cause the inefficient execution of database commands, traditional solutions revolve around updating the statistics. However, updating statistics is an expensive process that requires an allocation of both time and computational resources. Additionally, when statistics are being updated, any associated data components cannot be queried. This adds additional time to processing requests and generally decreases the efficiency of the database server, its operations, and processing of database commands. Thus, existing systems have a difficult time balancing the frequency in which statistics are updated and the losses in efficiency associated with executing non-optimal query plans that are chosen based on cost calculations that incorporate stale statistics.
Accordingly, there is an ongoing need for improved systems for performing queries and for executing query plans on databases.
The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.