A database management system may receive a query, search a database based on the query, and provide data resulting from the search. A query, however, specifies only the desired data and not the manner in which the database management system should search the database for that data. The database management system may therefore need to select from among multiple strategies for searching the database in response to a received query. For example, in response to a query specifying multiple joins, the database management system must determine an order in which to execute the joins.
A database management system often includes an optimizer to select a most appropriate (e.g., fastest) search strategy for responding to a given query in a given situation. The optimizer selects the strategy based at least in part on optimizer statistics associated with the database. The optimizer statistics are determined based on data stored in the database.
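As an illustration of how statistics can drive strategy selection, the following is a minimal sketch of a toy cost-based choice among left-deep join orders. It assumes hypothetical table names, row counts, and join selectivities (`ROW_COUNTS`, `SELECTIVITY`), treats predicates as independent, and uses the sum of intermediate result sizes as its cost. Real optimizers use far richer cost models and search methods.

```python
from itertools import permutations
from math import prod

# Hypothetical optimizer statistics: estimated row counts per table.
ROW_COUNTS = {"orders": 1_000_000, "customers": 10_000, "nations": 25}

# Hypothetical join selectivities for table pairs connected by a join predicate.
# Pairs not listed have no predicate, so joining them is a cross product.
SELECTIVITY = {
    frozenset({"orders", "customers"}): 1 / 10_000,
    frozenset({"customers", "nations"}): 1 / 25,
}


def estimated_rows(tables):
    """Estimate the result size of joining `tables`, assuming independent predicates."""
    rows = prod(ROW_COUNTS[t] for t in tables)
    for pair, sel in SELECTIVITY.items():
        if pair <= set(tables):
            rows *= sel
    return rows


def plan_cost(order):
    """Toy cost of a left-deep plan: the sum of all intermediate result sizes."""
    return sum(estimated_rows(order[:i]) for i in range(2, len(order) + 1))


def best_join_order(tables):
    """Exhaustively try every left-deep order (n! of them) and keep the cheapest."""
    return min(permutations(tables), key=plan_cost)


order = best_join_order(list(ROW_COUNTS))
```

With these assumed statistics, the sketch joins `customers` and `nations` first because their estimated intermediate result is small. If the row counts or selectivities were badly estimated, the same procedure could pick a much costlier order. The exhaustive search is only practical for a few tables, since the number of left-deep orders grows as n!.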
Optimizer statistics may specify any of various characteristics of data stored in the database. For example, an optimizer statistic may specify the number of distinct values associated with a given field of a given table and/or the statistical distribution of those values. Such statistics may be determined by reading every row of the table, which can consume unacceptable amounts of time and resources if the table is large. For large tables, the statistics may instead be determined by reading only a sample of the rows and extrapolating from that sample. However, if the extrapolated optimizer statistics are too inaccurate, the optimizer may select an inefficient search strategy based on them.
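The contrast between a full scan and sampling-based extrapolation of a distinct-value count can be sketched as follows. This is not the method of this document: it swaps in the published GEE (Guaranteed-Error Estimator), which scales values seen exactly once in the sample by sqrt(N/n) and counts repeated values once. The function names and the synthetic column are hypothetical.

```python
import random
from collections import Counter
from math import sqrt


def exact_distinct(column):
    """Full scan: read every row and count distinct values exactly."""
    return len(set(column))


def sample_rows(column, fraction, seed=0):
    """Hypothetical helper: draw a uniform random sample without replacement."""
    rng = random.Random(seed)
    return rng.sample(column, k=max(1, int(len(column) * fraction)))


def gee_distinct_estimate(sample, table_rows):
    """GEE estimate: sqrt(N/n) * f1 + sum of f_j for j >= 2, where f_j is the
    number of distinct values that appear exactly j times in the sample."""
    n = len(sample)
    freq_of_freq = Counter(Counter(sample).values())
    f1 = freq_of_freq.get(1, 0)
    repeated = sum(count for j, count in freq_of_freq.items() if j >= 2)
    return sqrt(table_rows / n) * f1 + repeated


# Synthetic column: 100,000 rows holding 1,000 distinct values.
column = [i % 1000 for i in range(100_000)]
sample = sample_rows(column, 0.01)
estimate = gee_distinct_estimate(sample, len(column))
```

On the full table the estimator reduces to the exact count, while on a 1% sample it reads far fewer rows but can miss the true value considerably. Its worst-case ratio error grows roughly with sqrt(N/n). This is the speed-versus-accuracy trade-off the paragraph above describes.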
The efficiency of a database system may therefore be improved by increasing the speed and/or accuracy with which optimizer statistics are determined. Accordingly, systems for efficiently determining optimizer statistics are desired.