Relational database management systems store information in tables, where each piece of data is stored at a particular row and column. Information in a given row generally is associated with a particular object, and information in a given column generally relates to a particular category of information. For example, each row of a table may correspond to a particular employee, and the various columns of the table may correspond to employee names, employee social security numbers, and employee salaries.
A user retrieves information from and updates a database by interacting with a database application. The user's actions are converted to a query by the database application. The database application submits the query to a database server. The database server responds to the query by accessing the tables specified in the query to determine which information stored in the tables satisfies the queries. The information that satisfies the queries is retrieved by the database server and transmitted to the database application.
Various access methods may be used to retrieve data from a database. For example, the database server may perform a table scan, where the database server reads every row of the table and evaluates column values against the selection criterion specified by a query to determine which rows satisfy the selection criterion. Alternatively, the database server may use one or more indexes to improve the speed of the data retrieval process. Indexes are data structures that provide relatively fast access to a set of data based on key values.
The key values of the entries within some types of indexes have a specified ordering. For example, within the leaf nodes of a B-tree index, index entries are typically stored in the alphabetical order of the key values contained within the index entries. Further, the leaf nodes themselves may be linked to form a chain such that the key values of the index entries are ascending from left to right in the leaf node chain.
Queries submitted to the database server must conform to the rules of a particular query language. One popular query language, known as the Structured Query Language (SQL), provides users a variety of ways to specify information to be retrieved. One function available in SQL is "ordered by". When a query has an order by clause, the database server sorts the data specified in the query based on the key specified in the order by clause.
In some instances, the ordering of keys within an index may be used to avoid the need to perform the sorting operations that would otherwise be required by "order by" clauses. Specifically, if a B-tree index is built on the key that is specified in an "order by" clause of a query, then key values may be retrieved, in order, from the index entries of the B-tree index. Since the index entries within the B-tree index are alphabetically ordered, the retrieved key values will be in alphabetical order and no additional sorting will be required.
Another SQL function is "group by". The group by clause causes the data specified in a query to be divided into different groups such that all rows in a given group have the same value for the key specified in the "group by" clause. Hence, the SQL query "select state, avg(sales) from customer group by state" causes the database server to arrange the average sales into groups, where all rows in any given group have the same value for the key "state."
Another SQL function is "distinct". The distinct clause causes the elimination of redundant values in the specified set of data. For example, the SQL query "select distinct deptno from emp" returns the set of unique departments "deptno" from the table "emp." Even if a particular department number appears in fifty rows of the table "emp", it will only appear once in the result set of the query.
In conventional database systems, SQL operations such as "group by" and "distinct" are executed without regard to whether the data specified in the query will be read from a sorted source, such as a B-tree index, or an unsorted source, such as a table. However, it has been discovered that efficiency of "group by" and "distinct" operations may often be significantly improved if key values are available from a sorted source. Based on the foregoing, it is clearly desirable to provide a mechanism for improving the efficiency of "group by" and "distinct" sort operations by taking into account the existence of sorted sources when determining how to best execute a query that contains a "group by" or "distinct" clause.