Query optimization is important in relational database systems that handle complex queries against large volumes of data. Unlike a query in earlier navigational databases, a query on a relational database specifies what data is to be retrieved but not how to retrieve it. Optimization matters less in transaction-oriented databases, where only a few rows are accessed, either because the application constrains the query tightly or because the query reaches the data through a highly selective index. In decision support and data mining applications, where the space of possible execution plans is large and the penalty for selecting a poor plan is high, optimizing a query to reduce overall resource utilization can improve performance by orders of magnitude.
When a relational database system receives a query from a user, it generates an execution plan for the query. An optimizer programmed to determine the most efficient execution plan can use known statistics regarding the data stored in the database, e.g., metadata, to compare candidate plans. Resource-conserving plans can be identified with greater statistical confidence if the distribution of unique values among a large number of records can be determined.
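As a rough illustration of how statistics about unique values might feed a plan comparison, the following Python sketch estimates the cost of two candidate plans for an equality predicate on one column. It is a simplified model under stated assumptions, not the optimizer of any particular database system: the names ColumnStats, full_scan_cost, index_lookup_cost, and choose_plan are hypothetical, the uniform-distribution assumption and the random_access_penalty value are illustrative choices, and costs are expressed in arbitrary units.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Hypothetical per-column statistics kept as metadata."""
    row_count: int        # total number of rows in the table
    distinct_values: int  # number of unique values in the column


def estimated_matching_rows(stats: ColumnStats) -> float:
    # Assumption: values are uniformly distributed, so an equality
    # predicate matches row_count / distinct_values rows on average.
    return stats.row_count / max(1, stats.distinct_values)


def full_scan_cost(stats: ColumnStats) -> float:
    # A full scan reads every row once (one cost unit per row).
    return float(stats.row_count)


def index_lookup_cost(stats: ColumnStats, random_access_penalty: float = 4.0) -> float:
    # Assumption: each matching row fetched through the index costs more
    # than a sequentially scanned row; the penalty factor is illustrative.
    return estimated_matching_rows(stats) * random_access_penalty


def choose_plan(stats: ColumnStats) -> str:
    # Compare the candidate plans and return the name of the cheapest one.
    costs = {
        "full_scan": full_scan_cost(stats),
        "index_lookup": index_lookup_cost(stats),
    }
    return min(costs, key=costs.get)


# A column with many unique values is highly selective.
selective = ColumnStats(row_count=1_000_000, distinct_values=500_000)
# A column with only two unique values matches about half the table.
unselective = ColumnStats(row_count=1_000_000, distinct_values=2)

print(choose_plan(selective))
print(choose_plan(unselective))
```

Under this model the number of unique values alone determines the choice, which is why a more accurate picture of the value distribution lets the optimizer pick a plan with more confidence. A real optimizer would typically also weigh skew, I/O characteristics, and join order.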