This disclosure relates generally to query execution planning in a database and, more specifically, to techniques for estimating selectivity.
Before query execution, most database management systems employ an optimizer engine to determine the most efficient method of accessing the data a query requests. The optimizer generates the best execution plan; in a cost-based optimizer, this is the candidate plan with the lowest estimated cost. The estimator is a component of the optimizer that estimates the overall cost of each candidate execution plan, so that the optimizer can choose the plan with the lowest estimated cost. One technique the estimator uses toward this objective is selectivity estimation, which calculates the percentage of rows in a row set that the query request will select.
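The definition of selectivity above can be made concrete with a minimal sketch. It assumes rows held in memory as Python dictionaries and computes the exact selectivity by scanning them. A real estimator would approximate this value from stored statistics (such as histograms or samples) rather than by scanning. The names `selectivity`, `estimated_output_rows`, and the `orders` table are hypothetical illustrations, not part of this disclosure.

```python
from typing import Callable, Dict, List

# Hypothetical row representation: column name -> value.
Row = Dict[str, object]


def selectivity(rows: List[Row], predicate: Callable[[Row], bool]) -> float:
    """Return the fraction of rows (0.0 to 1.0) that satisfy the predicate."""
    if not rows:
        return 0.0
    matched = sum(1 for row in rows if predicate(row))
    return matched / len(rows)


def estimated_output_rows(table_rows: int, sel: float) -> float:
    """Estimated result size: total row count scaled by selectivity."""
    return table_rows * sel


# Hypothetical example table.
orders = [
    {"region": "EU", "amount": 120},
    {"region": "US", "amount": 80},
    {"region": "EU", "amount": 40},
    {"region": "APAC", "amount": 200},
]

sel = selectivity(orders, lambda r: r["region"] == "EU")
print(f"{sel:.0%}")  # prints 50%
```

A cost model can then use the scaled row count from `estimated_output_rows` to compare candidate plans, which is why errors in selectivity propagate directly into plan costs.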
Selectivity estimation is particularly important for multidimensional queries (queries whose predicates involve multiple attributes, or data columns), because accurate estimates become increasingly difficult to achieve as the query becomes more complex. Inaccurate estimation may cause the optimizer to select a very costly plan, making the database management system inefficient. Because of this computational complexity and poor accuracy, multidimensional selectivity estimation may still not be used in many products.
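Why multidimensional estimates are hard can be illustrated with a common simplification, which is an assumption here and not something this disclosure describes: estimating a conjunctive predicate by multiplying per-column selectivities as if the columns were independent. The sketch below uses a hypothetical table in which `city` and `country` are correlated; all function and table names are illustrative only.

```python
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical table in which "city" and "country" are strongly correlated.
rows: List[Row] = (
    [{"city": "Paris", "country": "France"}] * 40
    + [{"city": "Lyon", "country": "France"}] * 10
    + [{"city": "Berlin", "country": "Germany"}] * 50
)


def column_selectivity(data: List[Row], column: str, value: str) -> float:
    """Fraction of rows where the given column equals the given value."""
    return sum(1 for r in data if r[column] == value) / len(data)


def independent_estimate(data: List[Row], predicates: Dict[str, str]) -> float:
    """Assumed simplification: multiply per-column selectivities as if independent."""
    estimate = 1.0
    for column, value in predicates.items():
        estimate *= column_selectivity(data, column, value)
    return estimate


def true_selectivity(data: List[Row], predicates: Dict[str, str]) -> float:
    """Exact fraction of rows that satisfy every predicate at once."""
    matched = sum(
        1 for r in data if all(r[c] == v for c, v in predicates.items())
    )
    return matched / len(data)


# Correlated predicates: the product (0.4 * 0.5) underestimates the true 0.4.
q1 = {"city": "Paris", "country": "France"}
print(independent_estimate(rows, q1), true_selectivity(rows, q1))

# Contradictory predicates: the product is still 0.4 * 0.5, but no row matches.
q2 = {"city": "Paris", "country": "Germany"}
print(independent_estimate(rows, q2), true_selectivity(rows, q2))
```

In both queries the per-column statistics are exact, yet the combined estimate is wrong by a factor of two in one case and predicts rows where none exist in the other. Capturing the joint distribution avoids this, but at a much higher computational cost, which matches the trade-off described above.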