The invention relates to methods and apparatus for optimizing queries in a relational database system.
A database is a collection of information. A relational database is a database that is perceived by its users as a collection of tables. Each table arranges items in rows and attributes of the items in columns. Each table row corresponds to an item (also referred to as a record or tuple), and each table column corresponds to an attribute of the item (also referred to as a field, an attribute type, or a field type).
To retrieve information from a database, the user of a database system constructs a query. A query contains one or more operations that specify information to retrieve from the database. The system scans tables in the database to execute the query.
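By way of illustration only, the following minimal sketch shows a table held as rows and columns and a query executed by scanning that table. The table contents and the names `employees` and `scan_query` are hypothetical and are not taken from any particular database system.

```python
# Hypothetical table: each dict is one row (a record or tuple),
# and each key is one column (an attribute or field).
employees = [
    {"id": 1, "name": "Ada", "dept": "eng"},
    {"id": 2, "name": "Bob", "dept": "ops"},
    {"id": 3, "name": "Cy", "dept": "eng"},
]


def scan_query(table, predicate, columns):
    """Scan every row, keep the rows that satisfy `predicate`,
    and return only the requested columns of those rows."""
    result = []
    for row in table:
        if predicate(row):
            result.append({column: row[column] for column in columns})
    return result


# A query with two operations: a selection (dept equals "eng")
# followed by a projection (keep only the "name" column).
eng_names = scan_query(employees, lambda row: row["dept"] == "eng", ["name"])
print(eng_names)
```

The sketch reads every row of the table once, which corresponds to the scan described above; a real system would typically choose among several access methods.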
A database system can optimize a query by choosing the order in which the operations of the query are performed. The number of distinct values for an attribute is one statistic that a database system uses to optimize queries. When the actual number of distinct values is unknown, a database system can use an estimate.
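As a hedged illustration of how this statistic can be used, the sketch below counts the distinct values of an attribute exactly and then applies a common textbook assumption, namely that values are uniformly distributed, to estimate how many rows match an equality condition. The uniformity assumption, the sample data, and the names `distinct_count` and `estimated_matches` are assumptions made for this example only.

```python
def distinct_count(table, column):
    """Exact number of distinct values of `column`; requires a full scan."""
    return len({row[column] for row in table})


def estimated_matches(row_count, n_distinct):
    """Estimated rows matching `column = constant`, assuming (for this
    sketch only) that every distinct value is equally frequent."""
    if n_distinct <= 0:
        raise ValueError("n_distinct must be positive")
    return row_count / n_distinct


# Hypothetical table of six orders placed by three customers.
orders = [
    {"order_id": 1, "customer": "a"},
    {"order_id": 2, "customer": "b"},
    {"order_id": 3, "customer": "a"},
    {"order_id": 4, "customer": "c"},
    {"order_id": 5, "customer": "b"},
    {"order_id": 6, "customer": "c"},
]

n_customers = distinct_count(orders, "customer")
rows_per_customer = estimated_matches(len(orders), n_customers)

# When a full scan is too costly, the distinct count of a sample gives
# only a lower bound on the true number, so an estimate is needed.
sample_lower_bound = distinct_count(orders[:3], "customer")
```

In this example the sample of the first three rows contains two of the three customers, which shows why a count taken from a sample cannot simply be substituted for the actual number of distinct values.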
An accurate estimate of the number of distinct values for an attribute is useful in methods for optimizing a query involving multiple join operations. For example, a database system can use the estimate in methods that determine the order in which to join tables. Such an estimate is also useful in methods that reorder and group items.
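One way such an estimate can inform join ordering is sketched below. The sketch uses a widely taught approximation for the size of an equality join, the product of the row counts divided by the larger of the two distinct-value counts, and picks the pair of tables with the smallest estimated result to join first. The approximation, the statistics, and the assumption that all three hypothetical tables share one join attribute are illustrative and are not asserted to be the method of any particular system.

```python
from itertools import combinations


def estimate_join_size(rows_r, rows_s, distinct_r, distinct_s):
    """Textbook approximation of the row count of an equality join,
    given each table's row count and distinct join-attribute values."""
    return rows_r * rows_s / max(distinct_r, distinct_s)


# Hypothetical statistics: table name -> (row count, number of distinct
# values of the join attribute). All tables are assumed to join on the
# same attribute, which keeps the sketch short.
stats = {
    "orders": (10_000, 500),
    "customers": (500, 500),
    "clicks": (1_000_000, 400),
}


def pair_cost(pair):
    """Estimated size of joining the two named tables."""
    (rows_r, distinct_r), (rows_s, distinct_s) = stats[pair[0]], stats[pair[1]]
    return estimate_join_size(rows_r, rows_s, distinct_r, distinct_s)


def cheapest_first_join(table_stats):
    """Return the pair of tables whose join has the smallest estimated size."""
    return min(combinations(sorted(table_stats), 2), key=pair_cost)


first_pair = cheapest_first_join(stats)
print(first_pair, pair_cost(first_pair))
```

Because the distinct-value counts appear in the denominator, an inaccurate estimate changes the estimated size of each intermediate result and can therefore change which join order appears cheapest.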