Databases store information for countless applications, including commercial, industrial, technical, scientific and educational applications. As reliance on information increases, both the volume of information stored in most databases and the number of users wishing to access that information increase as well. Moreover, as the volume of information in a database and the number of users wishing to access the database increase, so does the amount of computing resources required to manage such a database.
Database management systems (DBMSs), which are the computer programs used to access the information stored in databases, therefore often require tremendous resources to handle the heavy workloads placed on them. As such, significant effort has been devoted to increasing the performance of database management systems with respect to processing searches, or queries, against databases.
Advances in both computer hardware and software have increased the capacities of conventional database management systems. For example, in the hardware realm, increases in microprocessor performance, coupled with improved memory management systems, have increased the number of queries that a particular microprocessor can process in a given unit of time. Furthermore, the use of multiple microprocessors and/or multiple networked computers has further increased the capacities of many database management systems.
From a software standpoint, the use of relational databases, which organize information into formally-defined tables consisting of rows and columns, and which are typically accessed using a standardized language such as Structured Query Language (SQL), has substantially improved processing efficiency, as well as substantially simplified the creation, organization, and extension of information within a database. Furthermore, significant development efforts have been directed toward query “optimization”, whereby the execution of particular searches, or queries, is optimized in an automated manner to minimize the amount of resources required to execute each query.
Through the incorporation of various hardware and software improvements, many high performance database management systems are able to handle hundreds or even thousands of queries each second, even on databases containing millions or billions of records. However, further increases in information volume and workload are inevitable, so continued advancements in database management systems are still required.
As one example, many database management systems collect statistics regarding query execution, typically on a table-by-table basis. These statistics are then used by a query optimizer when optimizing future queries. Even though these statistics collection processes may run in the background, they consume valuable system resources. Furthermore, when the data stored in a table changes, the statistics collected for that table tend to become stale and less useful for the purposes of query optimization. Conventional techniques for statistics collection usually define a “staleness” threshold such that if the data in a table is more than 15% stale, for example, new statistics are collected for that table.
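The threshold check described above can be sketched roughly as follows. This is a minimal illustration rather than any particular system's implementation: the `TableStats` class and its field names are hypothetical, and it assumes that "staleness" means the fraction of rows inserted, deleted, or updated since statistics were last collected.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical per-table bookkeeping for statistics collection."""
    row_count_at_collection: int  # rows present when statistics were last built
    changed_rows: int = 0         # inserts + deletions + updates since then

    def record_changes(self, n: int) -> None:
        """Note that n rows were inserted, deleted, or updated."""
        self.changed_rows += n

    def staleness(self) -> float:
        """Assumed definition: changed rows as a fraction of the table size."""
        if self.row_count_at_collection == 0:
            # An empty table with any change is treated as fully stale.
            return 1.0 if self.changed_rows else 0.0
        return self.changed_rows / self.row_count_at_collection

    def needs_refresh(self, threshold: float = 0.15) -> bool:
        """True once staleness exceeds the threshold (15% in the example)."""
        return self.staleness() > threshold
```

Under these assumptions, a table of 1,000 rows would not be flagged after 100 changed rows (10% stale) but would be flagged after 200 (20% stale).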
One problem that arises in highly volatile tables, however, is that data may become stale (due to inserts, deletions, and updates) very frequently, resulting in statistics being rebuilt for such tables hundreds of times per hour. The overhead associated with statistics collection thus increases and reduces system performance. Increasing the staleness threshold to decrease the frequency of rebuilds adversely impacts tables that are not as volatile. Some database designs require administrators to manually mark specific tables as “volatile” to disable statistics collection on those tables. In such instances, however, optimization suffers because statistics are missing or overly stale absent further manual intervention by an administrator.
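The trade-off described above can be illustrated with a small simulation. It is only a sketch under simplifying assumptions that do not come from any real system: the function name `simulate_rebuilds` is hypothetical, the table size and change rate are constant, staleness is checked once per second, and a rebuild resets staleness to zero instantly.

```python
def simulate_rebuilds(row_count: int, changes_per_second: int,
                      threshold: float, seconds: int = 3600):
    """Return (rebuild_count, peak_staleness) over the simulated period."""
    rebuilds = 0
    changed = 0
    peak = 0.0
    for _ in range(seconds):
        changed += changes_per_second
        staleness = changed / row_count
        peak = max(peak, staleness)
        if staleness > threshold:
            # Threshold exceeded: statistics are rebuilt and the counter resets.
            rebuilds += 1
            changed = 0
    return rebuilds, peak


# Hypothetical workloads on 10,000-row tables, simulated for one hour.
volatile_low = simulate_rebuilds(10_000, 200, threshold=0.15)
stable_low = simulate_rebuilds(10_000, 1, threshold=0.15)
volatile_high = simulate_rebuilds(10_000, 200, threshold=0.60)
stable_high = simulate_rebuilds(10_000, 1, threshold=0.60)
```

With these made-up numbers, the volatile table triggers 450 rebuilds per hour at a 15% threshold while the stable table triggers 2. Raising the threshold to 60% cuts the volatile table to 116 rebuilds, but the stable table is then never refreshed during the hour and its statistics drift to 36% stale, which is the adverse effect on less volatile tables noted above.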
Thus, there is a need in current database systems for a more flexible approach to statistics collection that utilizes system resources more efficiently.