The present invention is related to the field of computer systems. More particularly, the present invention is directed to a method and system of collecting execution statistics of query statements.
Query statements are often used to interrogate and access a database. These query statements are usually expressed using specialized query languages such as Structured Query Language (SQL). A query statement may include the identity of the database object(s) being accessed to execute a query statement (e.g., one or more named database tables). If the query statement accesses two or more database objects, then the query statement may also include the link between the objects (e.g., a join condition or a common column). In addition, the query statement may define a selection criteria, which is often referred to as a matching condition, filter, or predicate. The query statement may further define which fields in the database object are to be displayed or printed in the result.
To execute a query statement, a database system may have to perform operations involving the retrieval or manipulation of data from various database structures, such as tables and indexes. Often, there exists many alternate ways to execute the query statement. For example, a single query statement can be executed in different ways by varying the order in which tables are joined, the implementation of the join operation between two tables, and how data is retrieved from each table to execute the statement. The retrieval operation can be implemented by scanning all data in a table, or by using an index to access a fraction of the table. The join operation can be implemented using a hash-based or a sort-based algorithm. The implementation and order of operations taken to execute the query statement can drastically change the efficiency or speed of execution for the statement. The implementation and order of operations that are used to execute a query statement is referred to as an execution plan.
An optimizer may be used by the database system to choose what is believed to be the most efficient execution plan for the query statement. Selection of the execution plan may be based on costs, i.e., the amount of a given resource or set of resources needed to process the execution plan. Statistics may be used to estimate the costs associated with the execution plan by quantifying the data distribution and/or storage characteristics of data in database structures (e.g., tables, columns, indexes, partitions, etc.). The optimizer may also use statistics formulas to calculate the selectivity of predicates. Selectivity refers to the proportion or fraction of a database object corresponding to a query predicate. The selectivity of query predicates may be taken into account when estimating the cost of a particular access method or when determining the optimal join order.
Although query optimization has greatly improved in recent years, the true efficiency of an execution plan cannot be determined until it has actually been executed. Tools that can collect execution statistics at the query statement level are available to users for verifying whether the execution plan performed as expected. However, when an execution plan does not perform as predicted (e.g., the execution plan consumes more resources than anticipated), knowledge of execution statistics at the query statement level does not allow users to diagnose the source(s) of inefficiency (e.g., which part of the execution plan consumed the most resources) since the collected statistics are directed to the query statement as a whole. Hence, it would not be possible for users to pinpoint the bottleneck(s) in the execution plan.
Some tools are able to collect execution statistics at the operation level. However, the collection of execution statistics at the operation level in those tools is limited to a few operations such as scans and sorts. In addition, the collection mechanism is operation specific and requires modification of the operations themselves. Therefore, the collection mechanism is not easily implemented and cannot be used for other operations. Furthermore, the only generic execution statistic collected at the operation level in those tools is the number of rows produced by the operation. Such information would often be insufficient to determine the cause(s) of the bottleneck(s) in the execution plan (e.g., why a part of the execution plan consumed significantly more resources than other parts of the execution plan).
Moreover, the ability to diagnose the sources and causes of inefficiencies may allow users to fine tune databases to achieve optimal performance. For example, users may be able to tailor a database for a specific query statement. Most of the tools that collect execution statistics dump the collected execution statistics into trace files, which makes it difficult for users to exploit the information for performance tuning analysis as users cannot easily process information in the trace files in conjunction with other database information.
The present invention provides a method and system of collecting execution statistics of query statements. In one embodiment, an execution plan is generated for a query statement. The execution plan includes one or more operations. One of the one or more operations is selected and executed. A plurality of execution statistics of the selected operation is collected.
Further details of aspects, objects, and advantages of the invention are described below in the detailed description, drawings, and claims. Both the foregoing general description and the following detailed description are exemplary and explanatory, and are not intended to be limiting as to the scope of the invention.