Computer database systems manage the storage and retrieval of data in a database. A database comprises a set of tables of data along with information about relations between the tables. Tables represent relations over the data. Each table comprises a set of records of data stored in one or more data fields. The records of a table are also referred to as rows, and the data fields of records in a table are also referred to as columns.
A database server processes data manipulation statements or queries, for example, to retrieve, insert, delete, and update data in a database. Queries are defined by a query language supported by the database system. To enhance performance in processing queries, database servers use indexes to help access data in a database more efficiently. Typical database servers comprise a query optimizer to generate efficient execution plans for queries with respect to a set of indexes. Query optimizers generate execution plans based on histograms and other statistical information on the column(s) of the table(s) referenced in the queries. Query optimizers typically rely on histograms on selected columns to estimate selectivities of queries.
Database servers typically create histograms on the columns of tables over which indexes are constructed. Database servers may also create histograms on columns that do not have indexes to enhance the accuracy of estimates by query optimizers. Creating histograms, however, can incur significant costs in time and memory, particularly for large databases. Although data values of selected columns may be sampled, for example, to generate approximate histograms, the accuracy of such histograms depends on the size of the sample. Sampling too little data limits the ability of query optimizers to generate relatively accurate execution plans while sampling too much data consumes more time and memory.