1. Field of the Invention
This invention relates in general to database management systems performed by computers, and in particular, to using data in materialized query tables as a source for query optimization statistics.
2. Description of the Related Art
The use of a Relational Database Management System (RDBMS) is well known in the art. Relational databases are organized into tables that are comprised of rows and columns of data, wherein the rows are tuples and the columns are attributes. The tables are typically stored on direct access storage devices (DASD), such as magnetic or optical disk drives, for semi-permanent storage.
The use of a Structured Query Language (SQL) interface to access data in the RDBMS is also well known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Standards Organization (ISO). The SQL interface allows users to formulate relational operations on the tables either interactively, in batch files, or embedded in host languages, such as C and COBOL.
The definitions for SQL provide that a RDBMS should respond to a particular query with a particular set of data given a specified database content, but the method that the RDBMS uses to actually find the required information in the tables is left up to the RDBMS. Typically, there will be more than one method that can be used by the RDBMS to access the required data. Each of these methods is a query execution plan (QEP).
The QEP is a compiled run-time structure used for executing the SQL statement. The QEP is the path the RDBMS uses to get to the data that SQL statements request. For example, an SQL statement might search an entire table space, or it might use an index. The QEP is the key to determining how well an SQL statement performs. The data associated with the QEP is stored in the catalog, or optionally in a plan table.
The RDBMS software uses various data, including statistics in a RDBMS catalog, during bind time to determine which QEP to use. A utility, called RUNSTATS, updates the RDBMS catalog with statistics on table spaces, indexes, tables, and columns. Additionally, when an SQL statement is processed during a bind phase, a QEP is determined for the statement.
Typically, when there are multiple QEPs to choose from, a QEP is selected based on a detailed analysis of the execution costs of each alternative QEP. The RDBMS will select an optimal QEP in order to minimize the computing time or resources used and, therefore, the cost of doing the query.
Certain QEP operations reduce the number of records seen by subsequent operations by applying predicates, and one of the most important tasks of a cost-based optimizer is the estimation of the number of rows, or cardinality, of intermediate results after predicates are applied. Each conjunct, or predicate, of a search condition (e.g., a WHERE or HAVING clause) is assigned a selectivity, which effectively represents the probability that the predicate is true for a given row. A given selectivity estimate is typically derived from statistics about the database, such as the number of distinct values of a referenced column.
A query optimizer relies heavily on accurate statistics on the tables, rows and columns referenced in a query in determining the optimal QEP for a query. In practice, statistics can be missing, of insufficient type, of insufficient detail, or inaccurate, for a variety of reasons. The difference in a QEP chosen with relevant and accurate statistics can perform orders of magnitudes better than a QEP chosen without such statistics. It is therefore essential that a query optimizer exploit all possible sources of statistics.