A database management system (DBMS) is a computer-based system that stores data in tables and retrieves the stored data. Queries obtain data from these tables based on characteristics or parameters set forth in the query. It is often desirable to obtain quick, approximate answers to a query from large databases. Sampled queries are one means of obtaining these answers.
Sampled queries are queries performed on sampled tables. A sampled table is the result of randomly selecting a specified number of rows from a table, rather than all rows that match a selection criterion. Sampled queries are used to quickly gather an approximate profile of the data within a large table. If a sufficiently large sample size is used, trends in the data can be examined by sampling the data instead of scanning all of it.
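As an illustration of the idea described above, the following is a minimal Python sketch of a sampled query over an in-memory table. The table `orders`, its `amount` column, and the helper names `sample_rows` and `approximate_mean` are hypothetical and chosen only for this example; an actual DBMS would perform the sampling inside its storage or execution layer.

```python
import random

def sample_rows(table, n, seed=None):
    """Return n rows chosen uniformly at random, without replacement."""
    rng = random.Random(seed)
    if n >= len(table):
        return list(table)
    return rng.sample(table, n)

def approximate_mean(table, column, n, seed=None):
    """Estimate the mean of `column` from a random sample of n rows."""
    rows = sample_rows(table, n, seed)
    return sum(row[column] for row in rows) / len(rows)

# Hypothetical table: 100,000 orders with amounts between 10 and 59.
orders = [{"order_id": i, "amount": 10 + (i % 50)} for i in range(100_000)]

# Exact answer requires scanning every row.
exact = sum(row["amount"] for row in orders) / len(orders)

# Approximate answer reads only 1,000 randomly selected rows.
approx = approximate_mean(orders, "amount", 1_000, seed=42)

print(f"exact mean:  {exact:.2f}")
print(f"approx mean: {approx:.2f}")
```

The estimate generally moves closer to the exact value as the sample size grows, which is the trade-off between speed and accuracy that a sampled query exposes.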
For the sampling to be effective from an efficiency point of view, performing the sampling early in the query should be considered. However, sampling does not commute with many of the operations in a query. Even for a query on a single relation, sampling may be problematic if the query is complex.
In a relational model of data, data entries are arranged into columns of values forming one or more multicolumn tables referred to as relations. A relation typically represents an entity, with each record storing attributes of the entity. A join is a method of combining two or more relations to obtain a consolidated view of the relations. The join may involve matching relevant values of a column of one table with values of a column of a second table. When there are two or more tables in the query, the decision to push the sampling down to the access of the base relations, in order to reduce the set of rows considered early, may be problematic. Many methods, such as dividing the sampling rate and sampling the individual relations, often result in few or no records being output. Another approach evaluates the complete join result and then takes the appropriate sample of this result; however, this uses a large amount of computing resources to compute the complete join result.
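The two approaches described above can be contrasted with a small Python sketch. It assumes two hypothetical relations, `customers` and `orders`, joined one-to-one on a `cust_id` column, and a simple in-memory hash join; none of these names come from a particular DBMS. The sketch samples each base relation independently and then joins, and separately joins first and then samples.

```python
import random

def hash_join(left, right, key):
    """Equi-join two lists of row dicts on `key` using a hash index."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

rng = random.Random(7)
n = 10_000   # rows in each hypothetical base relation
k = 100      # desired number of sampled join rows

customers = [{"cust_id": i, "region": i % 4} for i in range(n)]
orders = [{"cust_id": i, "amount": 10 + i % 50} for i in range(n)]

# Approach 1: push sampling down to the base relations, then join.
# Each join touches only k rows per side, but independently sampled
# keys rarely match: about k * k / n rows are expected (here, about 1).
pushed = hash_join(rng.sample(customers, k), rng.sample(orders, k), "cust_id")

# Approach 2: evaluate the complete join, then sample the result.
# This yields exactly k rows but builds all n join rows first.
full = hash_join(customers, orders, "cust_id")
late = rng.sample(full, k)

print("rows from sample-then-join:", len(pushed))
print("rows from join-then-sample:", len(late), "out of", len(full), "joined")
```

Under these assumptions, the first approach is cheap but returns far fewer rows than requested, while the second returns the requested sample only after paying for the complete join.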