Information stored in a relational database may be accessed by using a query that specifies the information sought. To that end, Structured Query Language (SQL) is a standardized language used to define queries as a combination of one or more statements. Relational Database Management System (RDBMS) software often includes an SQL interface and a query optimizer for translating SQL statements into an efficient Query Execution Plan (QEP). A QEP defines the methods and sequences used for accessing tables, the placement of sorts, where predicates are applied, and so on. That is, a QEP specifies a plan for accessing the information sought.
Given the size and complexity of many relational databases, there may be many feasible alternative QEPs, even for a simple query. Accordingly, it is the role of the query optimizer to determine the best of the alternatives by modeling the execution characteristics of each one and choosing a single QEP that most closely satisfies some optimization goal. For example, the query optimizer may choose to minimize some estimated cost metric, such as resource consumption or elapsed time. A common factor considered in the computation of many types of cost estimates is a cardinality estimate. A cardinality estimate is an approximation of the number of rows in a table that will have to be searched for a particular QEP or a particular stage of a QEP. Basic cardinality estimation assumes that predicates are independent and that values in a table are uniformly distributed.
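By way of illustration only, basic cardinality estimation under the two assumptions named above may be sketched as follows. The function names, the table size, and the distinct-value counts are hypothetical and are not taken from any particular RDBMS; exact fractions are used only to keep the arithmetic free of rounding.

```python
from fractions import Fraction


def equality_selectivity(distinct_values: int) -> Fraction:
    """Uniformity assumption: every distinct value is equally frequent,
    so an equality predicate keeps 1/distinct_values of the rows."""
    return Fraction(1, distinct_values)


def estimate_cardinality(table_rows: int, selectivities) -> Fraction:
    """Independence assumption: the combined filtering effect is the
    product of the individual selectivities."""
    combined = Fraction(1)
    for selectivity in selectivities:
        combined *= selectivity
    return table_rows * combined


# Hypothetical table: 1,000,000 rows, one column with 50 distinct
# values and another with 10 distinct values, each tested for equality.
estimate = estimate_cardinality(
    1_000_000,
    [equality_selectivity(50), equality_selectivity(10)],
)
print(estimate)  # 2000
```

The estimate is only as reliable as the two assumptions: skewed value frequencies violate uniformity, and related columns violate independence.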
U.S. Pat. No. 4,956,774 issued September 1990 to Shibamiya et al. discloses a method of determining and maintaining frequency statistics, thereby permitting the assumption of uniformity to be dropped. However, the possibility of statistical correlation between predicates was not addressed.
U.S. Pat. No. 5,469,568 issued November 1995 to Schiefer et al. discloses a method for computing cardinalities of joins (i.e. multi-table predicates) only when the join predicates are completely redundant, but does not address local (i.e. single-table) predicates or predicates with a correlation somewhere between completely redundant and completely independent. The application of multiple predicates may reduce the output stream cardinality. However, if predicates are statistically correlated, the combined filtering effect of the predicates is not simply the product of the individual filtering effects of the respective predicates. Assuming that such predicates are independent (i.e. assuming no correlation) will result in an underestimate of the cardinality resulting from the application of multiple predicates.
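The underestimate described above may be illustrated with a small synthetic table. The following sketch is not drawn from any of the cited patents; the table layout, the column meanings (a make and a model, where each model belongs to exactly one make), and the helper names are assumptions chosen only to produce a fully correlated pair of predicates.

```python
from fractions import Fraction

# Hypothetical table of (make, model) rows: 100 models, 100 rows per
# model, and each model belongs to exactly one of 10 makes.
rows = [(model // 10, model) for model in range(100) for _ in range(100)]


def selectivity(table, predicate) -> Fraction:
    """Fraction of rows in the table that satisfy the predicate."""
    return Fraction(sum(1 for row in table if predicate(row)), len(table))


def is_make_3(row) -> bool:
    return row[0] == 3


def is_model_35(row) -> bool:
    return row[1] == 35


# Estimate under the independence assumption: product of selectivities.
independent_estimate = (
    len(rows) * selectivity(rows, is_make_3) * selectivity(rows, is_model_35)
)

# Actual cardinality when both predicates are applied together.
actual = sum(1 for row in rows if is_make_3(row) and is_model_35(row))

print(independent_estimate, actual)  # 10 100
```

Because model 35 implies make 3 in this table, the predicate on the make filters nothing beyond what the predicate on the model already filters, so the independence assumption underestimates the result by a factor of ten.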
U.S. Pat. No. 6,738,755 issued May 2004 to Freytag et al. discloses a method for incrementally estimating the cardinality of a derived relation when statistically correlated predicates are fully applied. However, Freytag et al. did not disclose a method of estimating the cardinality resulting from the application of one or more partially applied predicates.
The problems of statistical correlation between predicates also apply to partially applied predicates, which may be applied against range-partitioned tables. However, partially applied predicates introduce new challenges that are not accounted for in the methods disclosed by Freytag et al. A first challenge is that multiple partially applied predicates may be statistically correlated with one another; a second challenge is that partially applied predicates may be statistically correlated with fully applied predicates. Previous methods of handling correlation between predicates do not provide an accurate cardinality estimate when one or more predicates are partially applied in a range-partitioned table.