Technical Field
Embodiments disclosed herein relate to query optimization in a relational database. More specifically, techniques are disclosed for adjusting duplicate-skipping methods when executing a database query.
Description of the Related Art
Relational database management systems (DBMS) use relational techniques for storing and retrieving data in a database. Users request access to data in the database by issuing a database access request, such as a query statement. A query statement is a set of commands for retrieving, inserting, or modifying data in the database.
Many database query statements for data retrieval require, by their syntax, that duplicate results be removed (e.g., SET FUNCTION DISTINCT and GROUP BY in SQL). Other database query statements do not require duplicate values to be included in the result set (e.g., sub-query statements). When executing a query, a DBMS may remove duplicates through various methods involving scanning an index of the database for distinct values. A database index is a data structure used to quickly locate data without having to search every row in a database table. In a relational database, the index is a copy of selected columns of data from the table that can be searched efficiently.
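By way of illustration only, the following minimal sketch shows the kinds of statements referred to above. It assumes Python's built-in sqlite3 module as a stand-in DBMS; the tables orders and customers, the index idx_orders_customer, and the sample rows are hypothetical and are not taken from any particular embodiment.

```python
import sqlite3

# In-memory database; all table, index, and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, item TEXT)")
conn.execute("CREATE TABLE customers (name TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", "pen"), ("ann", "ink"), ("bob", "pen"), ("ann", "pen")],
)
conn.executemany("INSERT INTO customers VALUES (?)", [("ann",), ("bob",), ("cy",)])

# An index whose leading column (customer) contains duplicate values.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer, item)")

# DISTINCT requires duplicate removal by syntax.
distinct_customers = [
    row[0]
    for row in conn.execute("SELECT DISTINCT customer FROM orders ORDER BY customer")
]

# GROUP BY likewise yields one row per distinct grouping value.
grouped = conn.execute(
    "SELECT customer, COUNT(*) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

# A sub-query used with IN only needs to know which values exist,
# so duplicates in the sub-query result are not needed.
matched = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM customers "
        "WHERE name IN (SELECT customer FROM orders) ORDER BY name"
    )
]
```

In each of the three statements, only the distinct values of the customer column matter to the result, which is why a DBMS may choose to obtain them from an index rather than from every row of the table.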
One current approach for removing duplicate values from a result set is to skip index keys with duplicate values in leading index columns. Using this approach, the DBMS scans an index for distinct values. When the DBMS identifies a distinct value, the DBMS skips over duplicates of that value until identifying the next distinct value. However, this approach incurs significant overhead when the scanned range of index keys contains few duplicate values. Further, the method used to skip duplicate values remains the same regardless of the proximity of two distinct keys within the index. This results in unnecessarily incurred overhead, and thus, inefficient performance.
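The approach described above can be sketched as follows. This is a simplified illustration under stated assumptions, not an implementation of any particular DBMS: the index is modeled as a sorted Python list holding only the leading-column values, and each skip is modeled as a binary search (a "probe") for the first key greater than the current value. The function name skip_scan_distinct and the probe counter are hypothetical.

```python
from bisect import bisect_right


def skip_scan_distinct(leading_keys):
    """Return (distinct_values, probes) for a sorted list of leading-column keys.

    Assumption: one fixed skipping method (a binary-search probe) is used
    after every distinct value, regardless of how close the next one is.
    """
    distinct = []
    probes = 0
    pos = 0
    while pos < len(leading_keys):
        value = leading_keys[pos]
        distinct.append(value)
        # Skip all duplicates of `value`: jump to the first key greater than it.
        pos = bisect_right(leading_keys, value, lo=pos)
        probes += 1
    return distinct, probes


# Many duplicates: six keys are covered with three probes.
many_dupes = ["a", "a", "a", "b", "c", "c"]
values_dupes, probes_dupes = skip_scan_distinct(many_dupes)

# Few duplicates: every key is distinct, so one probe is issued per key.
few_dupes = [0, 1, 2, 3, 4]
values_unique, probes_unique = skip_scan_distinct(few_dupes)
```

When the range contains few duplicates, the sketch issues one probe per key even though the next distinct value is the immediately adjacent key, which illustrates the overhead noted above.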