Prior art query optimizers typically rely upon a cost model to choose the best query execution plan for any given query. While prior art query optimizers do a remarkably good job of estimating both the cost and the cardinality of most queries, many assumptions underlie this mathematical model. For example, cost estimates are heavily dependent upon the optimizer's estimates for the number of rows (cardinalities), which will result at each step of the query plan. For complex queries involving many predicates and/or operations, these estimates rely upon statistics on the database and modeling assumptions that may or may not be true for a given database. In other words, estimates rely on accurate statistics of database characteristics, beginning with the number of rows for each table or index.
In most optimizers, alternative query execution plans are generated for a given query and the optimizer selects the optimal query plan, which may be the most cost-effective execution plan or one of the more cost-effective execution plans. The optimizer identifies an execution plan (query plan, join plan, or strategy) that reduces the estimated response time of a given query. The response time is the amount of time it takes to complete the execution of the query on the given target system.
One technique of query optimization use is a cost model to estimate the response time of a given query plan and to search the space of query plans to return a plan with a low cost. In the cost-based optimization model, different methods for doing a unit of work is compared and the most efficient method is selected (the plan with the lowest cost). Because the number of alternatives can be quite large, the query optimizer module uses statistics and/or sampling techniques to reduce the search space in optimizing queries.
However, prior art optimizers fail to consider the impact of run-time statistics (real-time statistics) may have on query optimization. For example, many commercial database products, including those offered by Teradata Corporation, the assignee of the present invention, offer automated system management features that control resource usage through the use of rules. Such rules are typically defined for a workload consisting of SQL requests that have been identified based on user defined criteria. The rule definitions are often defined in the context of cost information used by the query optimizer. Examples of such rules include a maximum row count or a maximum processing time for an execution step. Users may also define the action to take when a given rule is violated. Possible actions include reporting an exception, moving the query to another workload, or aborting the query.
Resource allocation rules are normally enforced prior to query execution, although in some systems they can also be enforced at run-time after the execution of a given step. Enforcing rules prior to execution has the advantage of proactively preventing inefficient queries from consuming excessive system resources. However, the major drawback to this approach is that it relies on optimizer estimates for resource usage which, in some cases, may be inaccurate. For example, resources that have been underestimated can result in the execution of an inefficient query that negatively impacts system performance. Conversely, resources that have been overestimated may cause the rejection of an efficient query whose execution would not have negatively impacted the system. Actual field experience has shown that this problem occurs in “real world” applications causing system performance problems and the improper rejection of legitimate tasks.
Delaying the enforcement of rules until query execution time has the added benefit of the system relying on actual resource usage statistics rather than mere estimates. A query execution subsystem may record the actual amount of resources consumed by a completed step (e.g., processing time, spool space, etc.) and make this information available to other subsystems. Because rules typically represent a negative event to avoid, the major drawback of this approach is that it is more reactive than proactive, in that a negative event has already occurred prior to rule enforcement. For example, if a rule is defined to avoid queries whose spool consumption is greater than 1 gigabyte, the rule is not enforced until after 1 gigabyte (or more) has already been allocated from the system's available pool.
Therefore, there is a need in the art for query optimizations that occur during runtime using a rule enforcement mechanism that is both accurate and timely. The present invention satisfies this need.