Query optimizers in relational database management systems rely on statistics to choose an efficient execution plan. Users are responsible for identifying the columns and indexes on which to collect statistics and for periodically recollecting those statistics to keep them current. Over time, statistics become stale as the underlying data is updated. Recollecting statistics usually requires scanning and sorting all of the indexed or column data and is therefore resource intensive, especially for large tables. As a result, users want to recollect only when necessary, namely when the data demographics have changed significantly. Unfortunately, it is often difficult for users to determine manually when a recollection is needed. This is particularly true of periodic batch load operations, which can occur as frequently as once per day.
Each batch load operation (or a series of them) can significantly alter data demographics and hence may require a statistics recollection after it completes. Because most users cannot determine the impact of a load on demographics, they either recollect after every load operation, in which case some recollections are probably unnecessary, or skip recollections altogether, which results in stale statistics.