Query optimization is important in relational database systems that process complex queries against large volumes of data. Unlike earlier navigational databases, a query on a relational database specifies what data is to be retrieved from the database but not how to retrieve it. Optimization matters less in transaction-oriented applications, where a query typically accesses only a few rows by way of a highly selective index. In decision support and data mining applications, where the space of possible execution plans is large and the penalty for selecting a bad plan is high, optimizing a query to reduce overall resource utilization can improve performance by orders of magnitude.
When the relational database system stores subsets of table data on individual processing modules, the execution plan for the query includes instructions to each virtual processor module specifying how that module should contribute. An optimizer programmed to determine the most efficient execution plan can use known statistics regarding the data, e.g., metadata, to compare different plans. Resource-conserving plans can be identified with greater statistical confidence if the distribution of data across the multiple processing modules is determined more accurately.
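By way of illustration only, the following minimal sketch shows how an optimizer might compare candidate plans using per-module row estimates. The cost model (elapsed time bounded by the most heavily loaded module), the plan names, and the row counts are all hypothetical assumptions introduced for this example and do not describe any particular database system.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Plan:
    """A candidate execution plan (hypothetical structure for illustration)."""
    name: str
    # Estimated number of rows each processing module must handle.
    rows_per_module: Tuple[int, ...]


def estimated_cost(plan: Plan) -> int:
    # Assumed cost model: modules work in parallel, so the busiest
    # module bounds the elapsed time of the whole plan.
    return max(plan.rows_per_module)


def skew(plan: Plan) -> float:
    # Ratio of the busiest module's load to the average load;
    # 1.0 means the data is spread perfectly evenly.
    mean = sum(plan.rows_per_module) / len(plan.rows_per_module)
    return max(plan.rows_per_module) / mean


def choose_plan(plans: Sequence[Plan]) -> Plan:
    # Select the plan with the lowest estimated cost.
    return min(plans, key=estimated_cost)


# Made-up estimates for two candidate plans on four modules.
plans = [
    Plan("redistribute_on_join_key", (900, 50, 30, 20)),
    Plan("duplicate_small_table", (300, 300, 300, 300)),
]

best = choose_plan(plans)
print(best.name, estimated_cost(best), round(skew(best), 2))
```

In this sketch the first plan handles fewer rows in total (1,000 versus 1,200) but concentrates 900 of them on a single module, so the evenly distributed second plan is selected. If the per-module estimates were inaccurate, the comparison could favor the wrong plan, which is why a more accurate picture of the data distribution increases confidence in the choice.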