When processing a query, database management systems use statistical information, in the form of column histograms that describe column data distributions, to generate a good query plan for execution. While the query specifies what data is to be accessed, the query plan specifies how that data is to be accessed. The process of generating the query plan is referred to as optimization.
A histogram is a collection of non-overlapping intervals of the column values, together with a summary of the data distribution within each interval. Histograms generally support good selectivity estimates for equality predicates (for example, Country=‘Germany’) and range predicates (for example, Age<21), particularly when the data is distributed roughly uniformly within each interval.
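To make the idea concrete, here is a minimal sketch, assuming a simple equi-width histogram that stores a row count and a distinct-value count per interval. The names `Bucket`, `build_equi_width`, `eq_selectivity`, and `lt_selectivity` are hypothetical and not taken from any particular database system. Equality estimates assume rows are spread evenly over a bucket's distinct values, and range estimates assume values are uniform within the bucket that the predicate boundary cuts through.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    low: int       # inclusive lower bound of the interval
    high: int      # exclusive upper bound of the interval
    count: int     # number of rows whose value lies in [low, high)
    distinct: int  # number of distinct values in [low, high)


def build_equi_width(values: List[int], lo: int, hi: int, n_buckets: int) -> List[Bucket]:
    """Split [lo, hi) into n_buckets equal-width, non-overlapping intervals."""
    if (hi - lo) % n_buckets != 0:
        raise ValueError("range must divide evenly into buckets")
    width = (hi - lo) // n_buckets
    buckets = []
    for i in range(n_buckets):
        b_lo, b_hi = lo + i * width, lo + (i + 1) * width
        in_bucket = [v for v in values if b_lo <= v < b_hi]
        buckets.append(Bucket(b_lo, b_hi, len(in_bucket), len(set(in_bucket))))
    return buckets


def total_rows(buckets: List[Bucket]) -> int:
    return sum(b.count for b in buckets)


def eq_selectivity(buckets: List[Bucket], value: int) -> float:
    """Estimate the fraction of rows with column = value."""
    total = total_rows(buckets)
    for b in buckets:
        if b.low <= value < b.high:
            if b.count == 0 or total == 0:
                return 0.0
            # Assume rows are spread evenly over the bucket's distinct values.
            return (b.count / b.distinct) / total
    return 0.0


def lt_selectivity(buckets: List[Bucket], value: int) -> float:
    """Estimate the fraction of rows with column < value."""
    total = total_rows(buckets)
    if total == 0:
        return 0.0
    rows = 0.0
    for b in buckets:
        if b.high <= value:
            rows += b.count  # the whole bucket satisfies the predicate
        elif b.low < value:
            # Partial bucket: assume values are uniform within [low, high).
            rows += b.count * (value - b.low) / (b.high - b.low)
    return rows / total


if __name__ == "__main__":
    ages = list(range(100))  # one row for each age 0..99
    hist = build_equi_width(ages, 0, 100, 10)
    print(eq_selectivity(hist, 5))   # 0.01
    print(lt_selectivity(hist, 21))  # 0.21
```

With uniform data the estimates match the truth exactly. With skewed data the uniformity assumption breaks down: if the ten rows in the [20, 30) bucket all held the value 29, the true selectivity of Age<21 would be 0.20, yet this histogram, which keeps only per-bucket counts, would still report 0.21.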
On the other hand, histograms may not provide good selectivity estimates for more complex predicates. For example, histograms can be insufficient for estimating the selectivity of equality and range predicates when the columns involved in the predicates are not independent (that is, their values are correlated) or when the data is not uniformly distributed within an interval. These limitations may cause the database optimizer to generate a poor query plan, resulting in slow execution.
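The correlation problem can be illustrated with a small sketch. It assumes, as a common simplification, that an optimizer holding only single-column statistics estimates a conjunction of predicates by multiplying the per-column selectivities (the independence assumption). Exact per-column frequency counts stand in for histograms here, and the functions and the Country/City data are hypothetical examples chosen only to show the effect.

```python
from collections import Counter
from typing import List, Tuple

Row = Tuple[str, str]           # (Country, City)
Predicate = Tuple[int, str]     # (column index, value) meaning column = value


def single_column_selectivity(rows: List[Row], col: int, value: str) -> float:
    """Fraction of rows where rows[col] == value, using one column only."""
    counts = Counter(r[col] for r in rows)
    return counts[value] / len(rows)


def independent_estimate(rows: List[Row], preds: List[Predicate]) -> float:
    """Combine per-column selectivities as if the columns were independent."""
    sel = 1.0
    for col, value in preds:
        sel *= single_column_selectivity(rows, col, value)
    return sel


def actual_selectivity(rows: List[Row], preds: List[Predicate]) -> float:
    """True fraction of rows satisfying every predicate."""
    matches = sum(1 for r in rows if all(r[c] == v for c, v in preds))
    return matches / len(rows)


if __name__ == "__main__":
    # City is fully determined by Country here, so the columns are correlated.
    rows = [("Germany", "Berlin")] * 30 + [("Germany", "Munich")] * 20 + [("France", "Paris")] * 50
    berlin_de = [(0, "Germany"), (1, "Berlin")]
    berlin_fr = [(0, "France"), (1, "Berlin")]
    print(independent_estimate(rows, berlin_de), actual_selectivity(rows, berlin_de))  # ~0.15 vs 0.3
    print(independent_estimate(rows, berlin_fr), actual_selectivity(rows, berlin_fr))  # ~0.15 vs 0.0
```

In this data the independence assumption underestimates Country=‘Germany’ AND City=‘Berlin’ by half and predicts matches for Country=‘France’ AND City=‘Berlin’ where none exist; errors of this kind propagate into join-order and access-path choices.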