A database is a collection of logically related data arranged in a predetermined format, such as in tables that contain rows and columns. To access the content of a table in the database, queries according to a standard database query language (such as the Structured Query Language or SQL) are submitted to the database. A query can be issued to insert new entries into a table of a database (such as to insert a row into the table), modify the content of the table, or to delete entries from the table.
Examples of SQL statements include INSERT, SELECT, UPDATE, and DELETE. The SELECT statement is used to retrieve information from a database and to organize information for presentation to a user or to an application program. A SELECT statement can include a GROUP BY clause, which specifies a grouping function to group the output results according to one or more attributes specified in the GROUP BY clause.
Starting with SQL-99 (also referred to as SQL3), further types of group-by operations have been defined, including group-by with grouping sets, group-by with rollup, and group-by with cube. Each of such group-by operations involve multiple grouping sets. Examples of SQL statements that specify a group-by on multiple grouping sets include the following:                SELECT C1, C2, SUM(C3)        From Table a        GROUP BY GROUPING SETS((C1, C2), C1);        SELECT C1, C2, SUM(C3)        FROM TABLE A        GROUP BY ROLLUP(C1, C2);        SELECT C1, C2, SUM(C3)        FROM TABLE A        GROUP BY CUBE(C1, C2);        
Each of the group-by operations specified by the above SQL statements involve multiple levels of grouping operations. For example, in the first example statement that contains the GROUP BY GROUPING SETS clause, two grouping sets are specified: group-by on C1, C2, and group-by on C1. The group-by on C1, C2 is considered to be a lower group-by operation than the group-by on C1.
The group-by on the grouping set C1, C2 is calculated from the base table A. For more efficient computation, the group-by on C1 can be calculated from the result of the group-by on C1, C2, rather than from the base table A. This is illustrated with the following sequence of SQL statements. To calculate the group-bys on the multiple grouping sets (C1, C2) and C1, the group-by on grouping set C1, C2, specified by the following statement, is first performed.                SELECT C1, C2, SUM(C3)        FROM A        GROUP BY C1, C2;        
The results are stored in a spool file named SPOOL1. The database system then computes a group-by on C1 from the spool file SPOOL1 (rather than from the base table A), as specified by the following statement:                SELECT C1, ?, SUM(C3)        FROM SPOOL1        GROUP BY C1;        
Although the approach illustrated above enables efficient computation of group-bys on grouping sets at multiple levels, such a technique cannot be used if the SQL statement specifies that an aggregate be calculated on distinct values of a particular attribute, such as in the following SQL statement:                SELECT C1, C2, SUM(DISTINCT C3)        FROM A        
GROUP BY GROUPING SETS ((C1, C2), C1);
The aggregate function SUM(DISTINCT C3) produces a sum of the distinct values of the attribute C3 in table A. Thus, for example, if table A has four rows in which the attribute C3 has the following values: 10, 10, 10, 20, then SUM(DISTINCT C3) produces a sum of 30 (10+20). This is distinguished from performing the sum aggregate on all values of C3, SUM(C3), which produces a sum of 50 (10+10+10+20). The DISTINCT option can also be specified with other types of aggregates, such as AVG, COUNT, and so forth.
If a SQL statement specifies an aggregate on distinct attribute values for group-bys on multiple levels of grouping sets, then the computation of a higher level group-by from the result of a lower level group-by as conventionally done does not produce accurate results.