SQL statements can perform poorly because the query optimizer fails to select the best execution plan and selects instead a sub-optimal one. These poorly performing SQL statements can have a catastrophic impact on the performance of an application. Not only can they greatly impact the response time of the system, but they can also affect other performance characteristics of the database system by consuming too many resources such as CPU, I/O, temporary disk space, and memory. As a result, the entire application throughput can be impacted up to the point where the application is no longer functioning to an acceptable level. In fact, a single poorly performing SQL statement has the potential to choke a system and bring it down to its knees.
Usually the selection of a sub-optimal execution plan is based on inaccurate estimates, or cardinalities, of intermediate query results. Based on these estimated cardinalities, a cost model is applied by the optimizer to select an execution plan that is cost efficient for a statement. Unfortunately, because of the estimation errors, the optimizer ends up picking a sub-optimal execution plan.
Several factors can cause appreciable errors in the optimizer estimates and lead the optimizer to generate a sub-optimal plan. Some of these factors are: (1) Use of internal default selectivity estimates when statistics are missing (e.g., unanalyzed tables, temporary tables), or when predicates are complex. (2) Presence of data correlation in two or more columns of a table. (3) Skewed or sparse join relationship between two or more tables. (4) Existence of data correlation between columns of two or more tables. (5) Not accounting for the caching of index and data blocks in cost calculations. 6) Not accounting for concurrent execution of a query or several queries together in resource cost calculations.
When the optimizer fails to find the optimal plan, the plan can be manually tuned. To manually tune a sub-optimal execution plan, an application developer needs to determine a better execution plan for that statement among a virtually unlimited number of alternatives. Then the developer needs to force the optimizer to generate the better execution plan. This is accomplished today by using conventional optimizer hints, which are optimizer directives that are added by the developer to the text of the SQL statement.
Performing the manual SQL tuning process is very painful and difficult, because it has to be done by a developer who has a high level of expertise in query optimization. Second, it is a time consuming process because the text of the SQL statement itself has to be changed to include the optimizer directives. Furthermore, each statement is unique and needs to be dealt with individually, and moreover, the number of statements can be very large. Finally, the manual SQL tuning activity is a continuous task because the SQL workload changes, e.g., when new application modules are deployed. Also, changes in data and access structures (e.g., drop or create index) are very likely to cause changes in the execution plans, forcing the application developer to start over again.
A method of automatically tuning database query language statements allows the optimizer to identify one or more mistakes that are made when generating an execution plan, and allow the optimizer to correct the mistake itself. For example, one embodiment of the method to automatically tune an execution plan for a statement includes receiving, at the optimizer, a statement and performance information related to the statement. Then, the optimizer determines whether one or more performance statistics of the statement are available or unavailable in the performance information. The method then collects each unavailable statistic, so that these statistics are available to the optimizer during generation of the execution plan.