SQL tuning is a critical aspect of database performance tuning. It is an inherently complex activity requiring a high level of expertise in several domains: query optimization, to improve the execution plan selected by the query optimizer; access design, to identify missing access structures; and SQL design, to restructure and simplify the text of a badly written SQL statement. Furthermore, SQL tuning is a time-consuming task due to the large volume and evolving nature of the SQL workload and its underlying data.
Typically, the database administrator (DBA) or an application developer performs the tuning process. However, it is often a very challenging task. First, it requires a high level of expertise in several complex areas: query optimization, access design, and SQL design. Second, it is a time-consuming process because each statement is unique and needs to be tuned individually. Third, it requires an intimate knowledge of the database (e.g., view definitions, indexes, table sizes) as well as the application (e.g., process flow, system load). Finally, SQL tuning is a continuous task because the SQL workload and the database are always changing.
For example, a compiler relies on data and system statistics to function properly. When selecting the best access path to retrieve a table's data, it uses the number of blocks and the number of rows to cost a full scan of the table. These statistics may be missing or stale. In addition to base statistics, the compiler can also use statistics on intermediate results. For example, the compiler can estimate the number of rows that remain after applying table filters when deciding which join algorithm to pick. These statistics are derived from base statistics using various methods, e.g., probabilistic models, so they may also be missing or stale.
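The dependence on base statistics can be illustrated with a minimal sketch. It is not taken from any particular database product: the cost formula (one unit per block), the fixed filter selectivity, the row-count threshold for choosing a join algorithm, and all names (TableStats, full_scan_cost, estimated_filtered_rows, pick_join_algorithm) are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Base statistics a compiler might keep for one table (hypothetical)."""
    num_rows: int
    num_blocks: int


def full_scan_cost(stats: TableStats) -> int:
    # Assumption: a full scan costs one unit per block read.
    return stats.num_blocks


def estimated_filtered_rows(stats: TableStats, selectivity: float) -> float:
    # Intermediate-result statistic derived from a base statistic.
    return stats.num_rows * selectivity


def pick_join_algorithm(outer_rows: float, nested_loop_limit: int = 100) -> str:
    # Assumption: nested loops is chosen only for a small outer input.
    return "nested loops" if outer_rows <= nested_loop_limit else "hash join"


# The table has grown since statistics were last gathered.
stale = TableStats(num_rows=1_000, num_blocks=10)
fresh = TableStats(num_rows=1_000_000, num_blocks=10_000)
SELECTIVITY = 0.01  # assumed fraction of rows that pass the table filter

for label, stats in (("stale", stale), ("fresh", fresh)):
    rows = estimated_filtered_rows(stats, SELECTIVITY)
    print(label, full_scan_cost(stats), rows, pick_join_algorithm(rows))
```

Under these assumptions, the stale statistics understate both the full scan cost and the filtered row count, so the sketch picks nested loops where up-to-date statistics would lead it to a hash join.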
Examples of factors that lead the optimizer to generate a sub-optimal plan include missing or stale base statistics, wrong estimates of intermediate result sizes, and incorrect settings for environmental parameters. Missing statistics cause the optimizer to apply guesses. For example, if there is no histogram, the optimizer assumes a uniform data distribution even when the column used in a predicate is skewed. Estimates of intermediate result sizes can be wrong when, for example, a predicate (filter or join) is too complex for standard statistical methods to derive the number of rows (e.g., the columns are compared through a complex expression like (a*b)/c=10). The environment parameters used during the optimization process can also be set inadequately. For example, the user may set a parameter to tell the query optimizer that it intends to fetch the complete result set produced by the query while it actually fetches only a few rows. In this case, the query optimizer will favor plans that return the complete result fast, while a better plan would be the one that returns the first few rows (e.g., 10) fast.
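The effect of a missing histogram on a skewed column can be sketched as follows. The column contents, the uniform estimate (number of rows divided by number of distinct values), and the use of an exact per-value frequency count as a stand-in for a histogram are illustrative assumptions, as are the function names uniform_estimate and histogram_estimate.

```python
from collections import Counter

# Hypothetical skewed column: almost every row holds the same value.
values = ["done"] * 9_800 + ["pending"] * 150 + ["failed"] * 50


def uniform_estimate(num_rows: int, num_distinct: int) -> float:
    # Without a histogram, assume every distinct value is equally frequent.
    return num_rows / num_distinct


def histogram_estimate(histogram: Counter, value: str) -> int:
    # With a frequency histogram, look up the recorded count for the value.
    return histogram.get(value, 0)


num_rows = len(values)
histogram = Counter(values)      # stand-in for a frequency histogram
num_distinct = len(histogram)

print("uniform guess for status = 'failed':",
      uniform_estimate(num_rows, num_distinct))
print("histogram estimate for status = 'failed':",
      histogram_estimate(histogram, "failed"))
```

In this sketch the uniform guess is about a third of the table for every value, while the histogram shows that the predicate on 'failed' matches only 50 rows, a difference large enough to change the choice of access path or join algorithm.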
However, when a user performs a manual tuning process, tuning information can be collected from several different sources. The user often does not know how to integrate these different types of information, and typically does not know which factors to use to correct the plan. To help the DBA and the application developer overcome these challenges, several software companies have developed diagnostic tools that help identify SQL performance issues and suggest actions to fix them. However, these tools are not integrated with the query optimizer, the system component that is most responsible for SQL performance. Indeed, these tools interpret the optimization information outside of the database to perform the tuning, so their tuning results are less robust and limited in scope. Moreover, they cannot directly tackle the internal challenges faced by the query optimizer in producing an optimal execution plan. Finally, the recommended actions often require modification of the SQL text in the application source code, making the recommendations hard for the DBA to implement.
For example, the LEO (LEarning Optimizer) research project at IBM attempts to correct errors in the cardinality estimates made by the query optimizer. The corrections are based on actual cardinality values gathered during query execution. They are computed as adjustments to the optimizer's estimates and stored in dictionary tables. When a SQL statement is submitted to the query optimizer, the optimizer first checks whether any adjustments are available from a previous execution of a related query and, if so, applies them. However, LEO does not compensate for stale or missing statistics on base objects (e.g., the number of distinct values of a column). LEO also does not automatically choose the appropriate optimization mode.
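The general idea of feedback-based correction can be sketched as below. This is not LEO's actual implementation: keying adjustments by predicate text, storing them in an in-memory dictionary rather than dictionary tables, and computing the adjustment as the ratio of actual to estimated rows are all simplifying assumptions, and the names record_feedback and adjusted_estimate are hypothetical.

```python
# Hypothetical adjustment store: predicate text -> correction factor.
adjustments = {}


def record_feedback(predicate: str, estimated_rows: float, actual_rows: float) -> None:
    """After execution, remember how far off the estimate was."""
    if estimated_rows > 0:
        adjustments[predicate] = actual_rows / estimated_rows


def adjusted_estimate(predicate: str, estimated_rows: float) -> float:
    """At optimization time, apply a stored adjustment if one exists."""
    return estimated_rows * adjustments.get(predicate, 1.0)


predicate = "status = 'failed'"

# First optimization: no feedback yet, so the estimate is used unchanged.
first = adjusted_estimate(predicate, 4_000)

# Execution shows the estimate was eight times too high.
record_feedback(predicate, estimated_rows=4_000, actual_rows=500)

# A later, related query benefits from the stored adjustment.
second = adjusted_estimate(predicate, 4_000)

print(first, second)
```

A scheme of this kind only corrects estimates for predicates that have already been executed, which is consistent with the limitation that missing or stale base statistics remain uncompensated.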
Also, a number of commercial products assist a DBA in some aspects of tuning inefficient SQL statements. None, however, provides a complete tuning solution, partly because they are not integrated with the database server. For example, Quest Software's SQLab Vision provides a mechanism for identifying high-load SQL based on several measures of resource utilization. It can also rewrite SQL statements into semantically equivalent, but potentially more efficient, alternative forms, and it suggests the creation of indexes to offer more efficient access paths to the data. Since the product resides outside of the RDBMS, the actual benefit of these recommendations to a SQL statement is unknown until they are implemented and executed.
LeccoTech's SQLExpert is a toolkit that scans new applications for problematic SQL statements as well as high load SQL statements in the system. It generates alternative execution plans for a SQL statement by rewriting it into all possible semantically equivalent forms. There are three problems with this approach. First, it cannot identify all the ways of rewriting a SQL statement (which is normally the domain of a query optimizer). Second, equivalent forms of a SQL statement do not guarantee that the query optimizer will find an efficient execution plan if the bad plan is a result of errors in the optimizer's internal estimates like cardinality of intermediate results. Third, all the alternative plans will have to be executed to actually determine which, if any, is superior to the default execution plan.
Microsoft SQL Server offers an Index Wizard to provide recommendations to the DBA on the indexes that can potentially improve the query execution plans.