As database systems continue to grow in size and complexity, it becomes ever more crucial to provide efficient and fast database query services. To that end, some database systems implement query optimization functionality to determine the most efficient execution strategy for each query. The execution strategy may be chosen based on statistical information on data. Alternatively or additionally, it may be chosen based on structural and/or functional features of a database system.
Generally, a database includes multiple tables, each holding various records, and each record including a number of fields of information. A common type of query on multiple tables is a “join” query. In a join query, multiple tables are searched to find those tuples of records that satisfy a specified matching criterion. One type of join query is an equijoin query, in which multiple tables are searched to find those tuples of records in which the specified fields are equal. For example, a first table in a database may include records of individuals, where each record includes fields holding the name of a person and his/her movie interests. The same database may hold a second table of records of individuals along with their favorite sports. One equijoin query on these two tables may aim to find those pairs of records belonging to the same person, so that each pair indicates both the movie interests and the favorite sports of that person.
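The movie/sports example above can be illustrated with a minimal sketch of an equijoin on a shared name field. The function name `equijoin`, the field names, and the sample records are hypothetical and chosen only for illustration; the hash-based matching shown here is one common way to evaluate an equijoin and is not asserted to be the approach of any particular database system.

```python
from collections import defaultdict

def equijoin(left, right, key):
    """Return all pairs (l, r) of records whose `key` fields are equal."""
    # Build a hash index over the left table, keyed on the join field.
    index = defaultdict(list)
    for rec in left:
        index[rec[key]].append(rec)
    # Probe the index with each record of the right table.
    pairs = []
    for r in right:
        for l in index.get(r[key], []):
            pairs.append((l, r))
    return pairs

# Hypothetical tables: movie interests and favorite sports of individuals.
movies = [
    {"name": "Ann", "movies": "sci-fi"},
    {"name": "Bob", "movies": "drama"},
]
sports = [
    {"name": "Ann", "sport": "tennis"},
    {"name": "Cy", "sport": "golf"},
]

result = equijoin(movies, sports, "name")
for movie_rec, sport_rec in result:
    print(movie_rec["name"], movie_rec["movies"], sport_rec["sport"])
```

Only individuals present in both tables appear in the result, which is why the output size of such a query depends on how the join field values are distributed across the two tables.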
Generally, the computational resources required for executing an equijoin query grow with the size of the tables involved. This becomes a significant concern for large tables. In order to reduce the execution complexity of database queries, the output size (e.g., cardinality) of a query may be estimated before determining whether to proceed with the query. Query size estimation may be performed by sampling each table and running the query on the samples in place of the tables.
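The sampling-based estimation described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not a definitive method: it assumes each table is sampled independently, record by record, with probabilities `p_left` and `p_right` (Bernoulli sampling), and that the join size observed on the samples is scaled by 1 / (`p_left` × `p_right`) to estimate the full join size. The function names and the scaling step are assumptions introduced here; the text above states only that the query is run on samples in place of the tables.

```python
import random
from collections import Counter

def join_size(left_keys, right_keys):
    """Exact equijoin output size, given the join-field values of each table."""
    right_counts = Counter(right_keys)
    # Each left value matches as many right records as carry the same value.
    return sum(right_counts[k] for k in left_keys)

def estimate_join_size(left_keys, right_keys, p_left, p_right, seed=0):
    """Estimate the equijoin size from independent Bernoulli samples.

    Assumption: every record is kept independently with probability
    p_left (or p_right), so a matching pair survives with probability
    p_left * p_right, and the sample join size is scaled back accordingly.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    left_sample = [k for k in left_keys if rng.random() < p_left]
    right_sample = [k for k in right_keys if rng.random() < p_right]
    return join_size(left_sample, right_sample) / (p_left * p_right)

# Hypothetical join-field values for two tables.
left = [i % 100 for i in range(10_000)]
right = [i % 50 for i in range(5_000)]

exact = join_size(left, right)
approx = estimate_join_size(left, right, p_left=0.1, p_right=0.1)
print(exact, approx)
```

The estimate is computed on roughly a tenth of each table in this example, so it is much cheaper than the exact join, at the cost of sampling error that depends on the sampling rates and on how skewed the join-field values are.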