Enterprises are increasingly capturing, storing, and mining a plethora of information related to communications with their customers. Often this information is stored and indexed within databases. Once the information is indexed, queries are developed on an as-needed basis to mine the information from the database for a variety of organizational goals: such as planning, analytics, reporting, etc.
In fact, many enterprises today have extremely large databases with frequent and voluminous transactions. For example, consider a large retail establishment that has updates for millions of transactions a day to its central database from a plurality of regional databases.
These large databases also process a variety of enterprise operations, which may be time critical to the enterprise, such as reports, analytics, and/or even complex database mining queries. Large databases with many transactions typically require optimization mechanisms to ensure that operations are efficiently processed in a timely fashion and that resources of the database (memory, storage, processors) are properly loaded and balanced. To do this, optimization services may inspect operations and a current state of the database and then develop plans to process the operations. The optimization service typically relies on gathered (sometimes referred to as collected) statistics about the database and its information when devising plans.
The gathered statistics are produced by statistics gathering services. These services determine a total number of rows, histograms for the values of columns, etc. The statistics gathering services may take a considerable amount of time to process and consume considerable resources to produce a full set of statistics for the database.
Moreover, plans developed by optimizing services will be skewed if the statistics of the database are not properly updated to reflect a current state of the database. However, to acquire current statistics the database may be have to be unduly loaded by the processing of the statistics gathering services, and the plans of the optimizing service delayed until the processing of the statistics gathering services complete.
Of particular concern are statistics on date columns. Columns with other data types may not become out-of-date as quickly as date columns. For instance, the amount of sales of one item may vary somewhat from one day to the next but probably not enough to significantly affect the statistics—these statistics may only need to be re-gathered once a week or even less often. However, for a date column, especially for the transaction date column, goes from zero rows with that value into the millions after the data for that day is loaded. The statistics on these date columns needs to re-collected at least once a day after the load, if not more often.
It becomes a catch 22 scenario for database administrators of large transactional database environments. To have optimal plans up-to-date statistics for the database are need and yet up-to-date statistics require gathering the statistics after loading new data to the database and delaying the optimizer, which defeats the very benefits of the plans produced by the optimizer.
Previous techniques have the optimizer extrapolate date statistics if the date statistics were gathered at earlier time. This requires changing the optimizer that is part of a large database system and waiting until that version of the optimizer is released. This does not provide relief for users of the current database release. Moreover, when the release is available, the optimizer will have its method of extrapolating data statistics. While this method may be suitable for a wide variety of date columns, there may be cases where a different extrapolation method is more appropriate. In those cases, a user would have to re-gather statistics to obtain up-to-date statistics since they could not change the extrapolation method.
Thus, it can be seen that a more easily configurable and more efficient mechanisms for generating and using database date statistics are needed.