A database is a collection of stored data that is logically related and that is accessible by one or more users or applications. A popular type of database is the relational database management system (RDBMS), which includes relational tables, also referred to as relations, made up of rows and columns (also referred to as tuples and attributes). Each row represents an occurrence of an entity defined by a table, with an entity being a person, place, thing, or other object about which the table contains information.
One of the goals of a database management system is to optimize the performance of queries for access and manipulation of data stored in the database. Given a target environment, an optimal query plan is selected, with the optimal query plan being the one with the lowest cost, e.g., response time, CPU processing, I/O processing, network processing, as determined by an optimizer. The response time is the amount of time it takes to complete the execution of a query on a given system. In this context, a “workload” is a set of requests, which may include queries or utilities, such as loads, that have some common characteristics, such as application, source of request, type of query, priority, response time goals, etc.
Contemporary research involves improving the “robustness” of a query optimizer, where robustness is characterized by query execution plans whose performance are predictable and consistent. From the standpoint of system management, the predictability of query plan performance is very important. When the actual cost of a query plan deviates significantly from the optimizer's estimate, it becomes more difficult to tune and manage the performance of a large system.
Although query robustness is a desirable goal, there is often a tradeoff between query robustness and the goal of achieving the best possible query performance. Optimizers frequently have a choice between a conservative query execution plan that will perform reasonably well in most situations, and an aggressive plan that performs very well if the cost estimates and runtime assumptions are accurate. However, if the cost estimates and runtime assumptions are not accurate, the performance of processing an aggressive plan may be significantly degraded. The decision of whether to pursue an aggressive strategy is strongly influenced by a “confidence” parameter that the optimizer assigns to its estimates which, in turn, are based on various factors, such as the presence of statistics. Such confidence parameters are typically included in the plan information made available to system management tools which may alter rule enforcement logic in view of the confidence parameters. For example, an optimizer's upper bound costs may be used in place of lower bound costs when the associated confidence level is below a particular level.
Although it is useful to factor in the optimizer's assigned confidence levels when regulating workload performance, contemporary approaches are inherently passive and reactive with regards to controlling the desired level of optimizer aggressiveness. As noted above, there is often a tradeoff between predictability and performance, and setting the appropriate balance between the two competing goals must take into account the specific requirements of the individual application or workload. For example, certain workloads or operating environments comprising complex ad hoc queries may often have aggressive plans applied thereto even if they occasionally perform much worse than anticipated. Other environments, such as those comprising fairly short running transactions, may require system performance that is known and consistently obtainable even if the performance level is not optimal. However, no mechanisms are currently available that provide for proactively controlling an optimizer's pursuit of robustness for individual workloads or queries.