Database statistics are often used for database query optimization. As the state-of-the art continues to advance, database systems continue to support more and more statistics gathering techniques so as to improve the ability of query optimization processes to produce better performing query execution plans. For example, in a past timeframe, some database systems began to support use of histograms to capture statistics pertaining to non-uniform data distribution. In a later release, non-sampling based statistics gathering methods were added to implementations of methods that compute the number of distinct values. In yet another release, incremental statistics gathering capabilities were added to facilitate best practices pertaining to statistics gathering over very large tables.
In some circumstances, a database system is configured to use any or all of the foregoing statistics gathering in an autonomous manner. For example, new statistics gathering processes might be automatically triggered based on how often a database object is accessed, or might be triggered based on a determination that a previously optimized plan is no longer performing optimally, or new statistics gathering might be triggered based on other performance characteristics that can be determined by the database system.
In other circumstances, software application developers and database administrators might explicitly encode then-current practices in the form of hard-coded application code or scripts. When an upgraded release of the database system is deployed, it might include upgraded statistics gathering features. If the application had explicitly encoded the older statistics gathering features into the application code base or scripts, the upgraded statistics gathering features might not be used. Having to revisit the application code and update its implementation to use the upgraded statistics gathering features is costly, which in turn makes it difficult adopt the new statistics gathering technologies.
Revising the application code to use new upgraded statistics gathering features is especially difficult when the old statistics gathering practices or statistics gathering techniques are hard-coded into the application code base. For example, in many cases, updating the application to use the new upgraded statistics gathering features requires (1) knowing/identifying that the application implements deprecated features, (2) being able to locate where the deprecated features or older best practices are encoded, and (3) having the ability to change the code. One possible approach is to hire experts to identify the source of the bad practice(s) and recommend changes to implement new statistics gathering techniques however that approach introduces several high barriers. Somebody has to recognize that there is an older statistics gathering technique and that there is a better statistics gathering solution or statistics gathering practice, then find the right experts, then be able to afford the experts, and then to be able to implement the practices into the application code base. The barriers are very high.
Worse, in many cases, users might not be aware that a problem was caused by use of older statistics gathering techniques and/or reliance on inferior statistics. What is needed are techniques that observe statistics gathering processes in the database system, automatically diagnose problems that might arise from use of the older statistics gathering methods, and then generate a report of findings and recommendations.
Some of the approaches described in this background section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.