Various types of database management systems may contain large quantities of data to which relatively small amounts are frequently added. One example is seen in data warehouse applications used for online analytical processing, data mining and similar purposes, which may retain voluminous quantities of transactional data in a single table. Query performance may nonetheless be an important factor for such systems.
Queries over large tables may take considerable time, and accordingly the efficiency of the queries is an important consideration. A database component known as a query optimizer may be responsible for producing optimized query execution plans. To perform its role, the query optimizer may rely on various statistical measures describing aspects of the data stored in the database. For example, the query optimizer may rely on statistics indicative of data distribution, data volume and so forth to determine an optimized plan for performing the query. Inaccurate query optimization statistics may cause the query optimizer to produce inefficient plans.
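By way of illustration only, the following sketch shows how an optimizer might consult statistics of this kind when choosing between two candidate plans. The `ColumnStats` structure, the function names, the plan labels and the 10% threshold are all hypothetical and are not drawn from any particular database system.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnStats:
    """Hypothetical statistics an optimizer might keep for one column."""
    row_count: int              # data volume
    bucket_bounds: List[float]  # upper bound of each histogram bucket, ascending
    bucket_rows: List[int]      # number of rows falling in each bucket


def estimate_rows_at_most(stats: ColumnStats, value: float) -> int:
    """Estimate how many rows satisfy `column <= value` from the histogram.

    Buckets whose upper bound is <= value are counted in full. A partially
    covered bucket is ignored, so the estimate may be low.
    """
    full_buckets = bisect_right(stats.bucket_bounds, value)
    return sum(stats.bucket_rows[:full_buckets])


def choose_plan(stats: ColumnStats, value: float,
                index_threshold: float = 0.1) -> str:
    """Pick an index scan only when the predicate looks selective enough.

    The threshold is an assumed tuning value, not a recommended setting.
    """
    if stats.row_count == 0:
        return "full_scan"
    selectivity = estimate_rows_at_most(stats, value) / stats.row_count
    return "index_scan" if selectivity <= index_threshold else "full_scan"
```

In this sketch, statistics that misstate the data distribution would change the estimated selectivity and could therefore lead to a different, possibly less suitable, plan being chosen.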
However, recalculating statistics for the entire table each time data is added may be prohibitively expensive in terms of time and computing resources. There may also be various risks associated with a full recalculation, such as the potential, however small, that query behavior will be unexpectedly and detrimentally affected. Moreover, if the amount of data added to the system is relatively small, the query optimization statistics might not change in a manner that would alter the behavior of the query optimizer. It may therefore be advantageous to avoid full recalculations of the optimization statistics.
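As a minimal sketch of one way a full recalculation might be avoided, the following example folds a small batch of newly added values into existing summary statistics without rescanning the table. It is offered only as an illustration under stated assumptions: the `SummaryStats` structure and the `merge_new_rows` function are hypothetical, and only simple measures (row count, minimum, maximum and mean) are considered.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SummaryStats:
    """Hypothetical per-column summary statistics."""
    row_count: int
    minimum: float
    maximum: float
    mean: float


def merge_new_rows(existing: SummaryStats,
                   new_values: Iterable[float]) -> SummaryStats:
    """Combine existing statistics with newly added values.

    Only the new values are read; the rows already summarized by
    `existing` are not scanned again.
    """
    values = list(new_values)
    if not values:
        return existing
    if existing.row_count == 0:
        # Nothing summarized yet, so the batch alone defines the statistics.
        return SummaryStats(len(values), min(values), max(values),
                            sum(values) / len(values))
    total = existing.row_count + len(values)
    # Weighted combination of the old mean and the sum of the new values.
    mean = (existing.mean * existing.row_count + sum(values)) / total
    return SummaryStats(total,
                        min(existing.minimum, min(values)),
                        max(existing.maximum, max(values)),
                        mean)
```

When the batch is small relative to the existing row count, the merged mean in this sketch moves only slightly, which is consistent with the observation that small additions might not materially change the statistics.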