When an information retrieval system receives a query, that query is typically optimized so that it can be executed efficiently. Optimization involves choosing an order in which to evaluate the parts of the query and a method for evaluating each part. It is very hard, however, to predict whether one way of evaluating the query is better than another. In traditional database technology, this problem is addressed by computing cost estimates, each of which approximates the resources needed for one particular way of evaluating a query. The optimization engine then chooses an evaluation order and methods with a low estimated cost.
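To make the idea of cost-based selection concrete, the following is a minimal sketch, not the method of any particular system. It assumes three hypothetical relations R, S and T with made-up cardinalities and join selectivities, considers only left-deep join orders, and uses a deliberately simple cost model in which the cost of a plan is the sum of the sizes of its intermediate results. The names CARDINALITY, SELECTIVITY, estimated_size, plan_cost and choose_plan are all hypothetical.

```python
from itertools import permutations

# Hypothetical statistics: estimated number of tuples in each relation.
CARDINALITY = {"R": 1000, "S": 100, "T": 10}

# Hypothetical join selectivities: the fraction of the cross product of two
# relations that is expected to survive the join predicate between them.
SELECTIVITY = {
    frozenset({"R", "S"}): 0.01,
    frozenset({"S", "T"}): 0.1,
    frozenset({"R", "T"}): 1.0,  # no predicate between R and T: a cross product
}


def estimated_size(relations):
    """Estimate the size of joining `relations`, assuming independent predicates."""
    size = 1.0
    for r in relations:
        size *= CARDINALITY[r]
    rels = list(relations)
    for i in range(len(rels)):
        for j in range(i + 1, len(rels)):
            size *= SELECTIVITY[frozenset({rels[i], rels[j]})]
    return size


def plan_cost(order):
    """Assumed cost model: the sum of the sizes of all intermediate join results."""
    return sum(estimated_size(order[:k]) for k in range(2, len(order) + 1))


def choose_plan(relations):
    """Enumerate every left-deep join order and return one with the lowest estimated cost."""
    return min(permutations(relations), key=plan_cost)


best = choose_plan(("R", "S", "T"))
print(best, plan_cost(best))
```

With these statistics the sketch prefers an order that joins the two small relations S and T first, because that keeps the intermediate result small. Exhaustive enumeration is used here only for clarity; it grows factorially with the number of relations, so practical optimizers typically prune the search space, for example with dynamic programming.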
These cost estimates are typically based on the characteristics of the query evaluation engine and on statistics about the data being queried. Cost estimates are often inaccurate because they rely on inaccurate estimates of the size of intermediate results, and knowing the size of a result is essential when choosing an evaluation method. Furthermore, cost estimates depend heavily on the characteristics of the particular system on which the query runs.
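One common way in which size estimates go wrong can be illustrated with a small sketch. It assumes, as many estimators do, that predicates are statistically independent, so the selectivity of a conjunction is taken to be the product of per-column selectivities. The table below is hypothetical and its two columns are perfectly correlated, which violates that assumption; the names ROWS, selectivity, estimated_count and actual_count are likewise hypothetical.

```python
# Hypothetical table whose columns are perfectly correlated:
# every row with city == "Paris" also has country == "FR".
ROWS = [{"city": "Paris", "country": "FR"}] * 100 + [
    {"city": "Oslo", "country": "NO"}
] * 900


def selectivity(rows, column, value):
    """Fraction of rows whose `column` equals `value` (a single-column statistic)."""
    return sum(1 for r in rows if r[column] == value) / len(rows)


def estimated_count(rows, predicates):
    """Estimate the size of a conjunctive selection, assuming independent predicates."""
    estimate = len(rows)
    for column, value in predicates:
        estimate *= selectivity(rows, column, value)
    return estimate


def actual_count(rows, predicates):
    """Exact result size, obtained by actually evaluating the selection."""
    return sum(1 for r in rows if all(r[c] == v for c, v in predicates))


query = [("city", "Paris"), ("country", "FR")]
print(estimated_count(ROWS, query), actual_count(ROWS, query))
```

Here the independence assumption predicts about 10 rows, while the selection actually returns 100. If such an underestimate feeds into later steps, an optimizer might choose an evaluation method suited to small inputs, and the error can compound across further joins.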