1. Field of the Invention
This invention relates in general to database management systems performed by computers, and in particular, to the optimization of SQL queries in a relational database management system involving aggregate expressions and grouping clauses.
2. Description of Related Art
Relational DataBase Management Systems (RDBMS) using a Structured Query Language (SQL) interface are well known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American Nationals Standard Organization (ANSI) and the International Standards Organization (ISO).
In RDBMS software, all data is externally structured into tables. A table in a relational database system is two dimensional, consisting of rows and columns. Each column has a name, typically describing the type of data held in that column. As new data is added, more rows are inserted into the table. A user query selects some rows of the table by specifying clauses that qualify the rows to be retrieved based on the values in one or more of the columns.
The SQL interface allows users to formulate relational operations on the tables. One of the most common SQL queries executed by RDBMS software is the SELECT statement. In the SQL standard, the SELECT statement generally comprises the format: "SELECT &lt;clause&gt;FROM &lt;clause&gt;WHERE &lt;clause&gt;GROUP BY &lt;clause&gt;HAVING &lt;clause&gt;ORDER BY &lt;clause&gt;." The clauses generally must follow this sequence, but only the SELECT and FROM clauses are required.
Generally, the result of a SELECT statement is a subset of data retrieved by the RDBMS software from one or more existing tables stored in the relational database, wherein the FROM clause identifies the name of the table or tables from which data is being selected. The subset of data is treated as a new table, termed the result table.
The WHERE clause determines which rows should be returned in the result table. Generally, the WHERE clause contains a search condition that must be satisfied by each row returned in the result table. The rows that meet the search condition form an intermediate set, which is then processed further according to specifications in the SELECT clause.
The search condition typically comprises one or more predicates, each of which specify a comparison between two values comprising columns, constants or correlated values. Multiple predicates in the WHERE clause are themselves typically connected by Boolean operators.
The SELECT statement may also include a grouping function indicated by the GROUP BY clause. The GROUP BY causes the rows in the intermediate result set to be grouped according to the values specified in the clause.
A number of column or aggregate functions are also built into SQL, such as MAX (maximum value in column), MIN (minimum value in column), AVG (average of values in column), SUM (sum of values in column), and COUNT (number of rows).
Queries using aggregate expressions return as many result rows as there exist unique "result groups" in the source of the aggregation. A result group is defined as the set of values contained in the fields of a row corresponding to the list of columns or expressions specified in the GROUP BY clause of the query. The value of the aggregate function is the cumulative result of applying the function to the result of the contained expression for each row having the same result group.
While there have been various techniques developed for optimizing the performance of SQL statements, there is a need in the art for optimization techniques involving aggregate expressions and grouping clauses.