One type of database is a Relational DataBase Management System (RDBMS) that uses a Structured Query Language (SQL) interface. A database typically has many tables, and each table typically has multiple records and multiple columns. A table in a database may be accessed using an index, which is an ordered set of references (e.g., pointers) to the records in the table.
A query may be described as a request for information from a database based on specific conditions. A query typically includes one or more predicates. A predicate may be described as an element of a search condition that expresses or implies a comparison operation (e.g., A=3).
The database typically collects and uses statistics. With complete and relevant statistics in place, an optimizer of a database may generate a better query execution plan for executing the query. Most Frequent Values (MFVs) is one statistic that may be used by the database. MFVs identify which values occur in a table often and with an approximate estimation of their frequency. This information, along with dispersion information (e.g., number of unique values), and other statistics, help the database perform cost model estimations.
For example, assume that a table has one billion rows, and one column (col_1) has ten unique values, and one of the unique values occurs 99% of the time.
If most frequent value information were missing from the statistics, a predicate on col_1=“value that occurs 99% of the time” would be assumed to have uniform distribution and, thus, the cost model is likely to underestimate the number of rows returned, which may impact the execution plan of the query with this predicate.
Similarly, a predicate on col_1=“one of the values that occurs less than 1% of the time” may result in the cost model overestimating the number of rows returned, which may also impact the execution plan of the query with this predicate.