Query optimization involves the translation of a database query into an efficient program or query plan to be executed on data stored in a database. The database query is typically stated in a query language, such as Structured Query Language (SQL), Common Query Language (CQL), or Multidimensional Expressions (MDX), among others, which is converted into one or more possible query plans. A query plan specifies a set of steps that are used to access or modify the data associated with the query. Details, such as how to access a given data relation, in which order to join data relations, sort orders, and so on, may form part of a query plan.
For a given query, a large number of query plans may be generated by varying different constituents of the query plan, such as access paths, join methods, join predicates, and sort orders. A typical data warehouse query may produce several hundreds of millions of possible execution plans. The cost of a query plan can be modeled in terms of various parameters, including, for example, the number of disk accesses and the response time required to fetch data. A query optimizer may evaluate the costs of all possible query plans for a given query and determine the optimal, i.e., most efficient plan for executing the query.
Generally, a query optimizer cannot be accessed directly by a user. Thus, a user has little visibility into the optimization process. Nevertheless, a user can identify sub-optimal performance in any number of forms, including an optimizer process crash, an optimizer producing a query plan that produces erroneous results, an inefficient query plan, or a sub-par plan compared to system operation before a system modification. A lack of access to a query optimizer limits the ability of a user to perform what-if analyses where plans are generated under varying circumstances.
Assessing the performance of a query optimizer involves an understanding of multiple parameters. Typically, a query optimizer user is not willing to provide a query optimizer vendor visibility into the user's system, particularly the user's data. Even if access to data is possible, capturing the right data is difficult. In addition, a user is typically not willing to allow any type of evaluation that interferes with ongoing business processes. Thus, while there is a need to understand query optimizer performance in a deployed system, there are limited opportunities to do so. Accordingly, it would be desirable to provide techniques to assess the performance of a deployed query optimizer.