A database typically includes one or more database tables for storing data values. To retrieve and manipulate the data in the database, a user, application program or a remote system typically issues a database query to a Database Management System (DBMS). A database query refers to a set of statements, commands or clauses that are used to retrieving data stored in the databases. For instance, a query may specify a particular location (e.g., a table, row or column) in a database to retrieve data from, or how to join columns from multiple tables.
The query is compiled before execution. During compilation, the DBMS typically parses the query into relational operators that specify how to implement the query. The relational operators may be ordered in a “query plan” or “access plan.” Since there is typically a large number of alternative ways to execute a given query, with widely varying performance, the DBMS' query optimizer may evaluate some of the different possible plans and return what it considers the best query plan.
Since query optimizers are imperfect, database users and administrators sometimes need to manually examine and tune the query plans to achieve better performance. A set of tools (e.g. IBM Visual Explain) are typically provided to display the query access plan. Some databases provide the function “EXPLAIN PLAN” to allow the user to preview and evaluate the query plan generated by the DBMS. A “Plan table” that returns the cost and time for executing the query may also be provided.
However, the EXPLAIN PLAN function is not very helpful in performance tuning. For instance, some databases may return a Plan table with many unknown results (e.g., execution time), making it difficult to judge which query statement yielded the most efficient query plan. In addition, different “join” sequences may result in different query plans that lead to different execution times, which are often unknown.
In addition, manual performance tuning can be extremely time-consuming and difficult. Complex Structured Query Language (SQL) queries may require different tuning methods to be performed. In the likely event of a data volume change, the entire tedious tuning process will have to be repeated to evaluate the change in performance. Moreover, different databases have different properties, and the programmer has to be familiar with the specific properties of the underlying database in order to perform the performance tuning.
Therefore, there is a need for an improved database query optimization framework that addresses the above-mentioned challenges.