1. Field of the Invention.
This invention relates in general to database management systems performed by computers, and in particular, to the optimization of queries that include at least one GROUP BY operation that computes a ROLLUP function, a GROUPING SETS function, or stacked GROUP BY operations in a relational database management system.
2. Description of Related Art.
Computer systems incorporating Relational DataBase Management System (RDBMS) software using a Structured Query Language (SQL) interface are well known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Standards Organization (ISO).
RDBMS software typically has the capability of analyzing data based on particular columns of a table. For example, rows can be grouped according to columns defined in a GROUP BY clause of a query. The column names in a SELECT clause are either a grouping column or a column function. Column functions return a result for each group defined by the GROUP BY clause.
A grouping query can include a standard WHERE clause that eliminates non-qualifying rows before the groups are formed and the column functions are computed. A HAVING clause eliminates non-qualifying rows after the groups are formed; it can contain one or more predicates connected by ANDs and ORs, wherein each predicate compares a property of the group (such as AVG(SALARY)) with either another property of the group or a constant.
The GROUPING SET operator extends the GROUP BY operation to simultaneously specify the computation of multiple GROUP BYs in a single GROUP BY operation. When the GROUPING SET operator is used, a NULL value in a non-null grouping column denotes that the particular column is collapsed in the aggregation. If a grouping column (c) is nullable, a GROUPING operator (GROUPING(c)) is required to distinguish between the NULL group and a column collapsed in the aggregation. Used in conjunction with GROUPING SETS, the GROUPING operator returns a value which indicates whether or not a row returned in a GROUP BY answer set is a row generated by a GROUPING SET that excludes the column represented by the expression. The argument can be of any type, but must be an item of a GROUP BY clause. The result of the function is set to one of the following values:
1xe2x80x94The value of expression in the returned row is a null value, and the row was generated by a super-group. That is, the argument is collapsed in the aggregation.
0xe2x80x94The value of the expression in the returned row represents a non-system generated value of the group (which may be null and indicates that the argument is not collapsed in the aggregation.
ROLLUP operations can also be specified in the GROUP BY clause of a query. ROLLUP operations are shorthand for GROUPING SETS that represent common sets of GROUP BY operations that are required for common queries for online analytical processing (OLAP). ROLLUP grouping produces a result set containing the regular grouped rows and sub-total rows. For example, ROLLUP can provide the sales by person by month with monthly sales totals and an overall total.
However, the current state of the art does not optimize predicates in queries that are performed after the GROUP BY operations. Thus, there is a need in the art for improved optimization techniques for such queries.
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method, apparatus, and article of manufacture for optimizing database queries, wherein the query is analyzed to determine whether the query includes at least one GROUP BY operation that computes at least one of the following: (1) a ROLLUP and (2) a GROUPING SET, and when it does, the query is rewritten to optimize one or more predicates that are applied after the GROUP BY operation. The query is also analyzed to determine whether the query includes at least one GROUP BY operation that computes two or more stacked GROUP BY operations, and when it does, the query is rewritten to collapse the stacked GROUP BY operations into a single GROUP BY operation.