The present invention relates generally to database operations and, more specifically, to providing sample data for database query size estimation.
Database management systems enable users to query large collections of information. There are many execution plans that a database management system could potentially employ to answer a given query. Query optimization is the process by which the database management system estimates the cost of a number of candidate plans and chooses the one with the lowest estimated cost.
An important step in database query cost estimation is estimating the sizes of intermediate query results, which often influence the cost of a plan associated with the database query. The database system aims to estimate these sizes accurately and with relatively little effort, without actually executing the query. A number of techniques are used for estimating such sizes. One general class of techniques for estimating query result sizes is referred to as sampling. In sampling, the system obtains a sample from one or more of the participating tables and performs a variant of the query over the sample. The query result size over the sample is then scaled, for example by the inverse of the sampling fraction, to estimate the size of the query result over the full data set.
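By way of illustration only, the sampling approach described above can be sketched as follows. The sketch assumes a single in-memory table represented as a list of rows, a uniform random sample drawn without replacement, and a simple selection predicate. The names `estimate_result_size`, `orders`, and `amount` are hypothetical and are not taken from any particular database system.

```python
import random


def estimate_result_size(table, predicate, sample_size, seed=0):
    """Estimate how many rows of `table` satisfy `predicate`.

    Assumption: a uniform random sample without replacement, so the
    match count over the sample is scaled by len(table) / sample size.
    """
    if not table:
        return 0.0
    rng = random.Random(seed)
    n = min(sample_size, len(table))
    sample = rng.sample(table, n)
    # Run the "variant of the query" (here, a filter) over the sample only.
    matches = sum(1 for row in sample if predicate(row))
    # Scale the sample result size up to the full table.
    return matches * (len(table) / n)


# Hypothetical table: 10,000 rows, of which exactly 2,500 have amount < 25.
orders = [{"id": i, "amount": i % 100} for i in range(10_000)]
estimate = estimate_result_size(
    orders, lambda row: row["amount"] < 25, sample_size=500
)
```

In this sketch only 500 of the 10,000 rows are examined, and the estimate varies with the sample drawn. Estimating the size of a join from samples generally requires more care than this single-table case.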
Sampling can also be used for estimation tasks that are separate from query optimization. For example, if a rough estimate of a query result is all that is needed, processing a suitably sized sample often provides a good balance between accuracy and computation time.
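As a further illustrative sketch, a rough estimate of an aggregate query result, here a sum, can be obtained from a sample in the same way. The sketch assumes a uniform random sample over a list of numeric values. The function `approximate_sum` and the parameter `sample_fraction` are hypothetical names introduced for this example.

```python
import random


def approximate_sum(values, sample_fraction, seed=0):
    """Approximate sum(values) from a uniform random sample.

    Assumption: sampling without replacement; the sample sum is scaled
    by the inverse of the fraction of values actually sampled.
    """
    if not values:
        return 0.0
    rng = random.Random(seed)
    n = max(1, int(len(values) * sample_fraction))
    sample = rng.sample(values, n)
    return sum(sample) * (len(values) / n)


# Hypothetical data: 10,000 values whose exact sum is 495,000.
amounts = [i % 100 for i in range(10_000)]
rough = approximate_sum(amounts, sample_fraction=0.05)  # reads 500 values
exact = approximate_sum(amounts, sample_fraction=1.0)   # reads every value
```

A larger `sample_fraction` generally tightens the estimate at the cost of reading more data, which is the balance between accuracy and computation time noted above.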