A database is a collection of logically related data arranged in a predetermined format, such as in tables that contain rows (tuples) and columns (attributes). To access the content of a table in the database, queries according to a standard database query language (e.g., Structured Query Language or SQL) are submitted to the database system. A query can also be issued to insert new entries into a table of a database (such as to insert a row into the table), to modify the content of the table, or to delete entries from the table.
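By way of illustration only, the kinds of queries described above can be sketched with Python's standard sqlite3 module and an in-memory database. The table name employee, its columns, and the sample values are hypothetical and are not taken from any particular database system.

```python
import sqlite3

# In-memory database; the "employee" table and its contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
)

# Insert new entries (rows) into the table.
conn.execute("INSERT INTO employee (id, name, salary) VALUES (1, 'Ann', 50000.0)")
conn.execute("INSERT INTO employee (id, name, salary) VALUES (2, 'Bob', 42000.0)")

# Modify the content of the table.
conn.execute("UPDATE employee SET salary = 55000.0 WHERE id = 1")

# Delete an entry from the table.
conn.execute("DELETE FROM employee WHERE id = 2")

# Access the content of the table.
rows = conn.execute(
    "SELECT id, name, salary FROM employee ORDER BY id"
).fetchall()
print(rows)

conn.close()
```

Each statement corresponds to one of the query types named above: insertion, modification, deletion, and retrieval.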
As the technology of storage devices and database software has improved, the capacity of database systems has also increased dramatically. One application of database systems is data warehousing, where data from various sources are collected and stored in a data warehouse. The amount of data that can be stored in the data warehouse can be immense. To process information within such data warehouses, on-line analytical processing (OLAP) is typically performed. Usually, on-line analytical processing involves the calculation of aggregates on large data sets. Examples of aggregates include the calculation of a sum of values of a given attribute, the calculation of an average, the calculation of a minimum, the calculation of a maximum, the counting of a number of rows, and so forth. Aggregates are also used for data mining applications. Data mining involves building statistical models or finding patterns in large data sets.
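As a minimal sketch of the aggregates listed above, the following example computes a sum, an average, a minimum, a maximum, and a row count over a single attribute using Python's standard sqlite3 module. The sales table, its columns, and the four sample rows are assumptions made for illustration; a data warehouse would hold far more data.

```python
import sqlite3

# In-memory database; the "sales" table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (store, amount) VALUES (?, ?)",
    [("north", 10.0), ("north", 30.0), ("south", 20.0), ("south", 40.0)],
)

# Compute five aggregates on the "amount" attribute over the whole table.
total, average, smallest, largest, row_count = conn.execute(
    "SELECT SUM(amount), AVG(amount), MIN(amount), MAX(amount), COUNT(*) "
    "FROM sales"
).fetchone()

print(total, average, smallest, largest, row_count)

conn.close()
```

Without a grouping clause, each aggregate reduces the entire table to a single value.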
Aggregates are typically calculated in conjunction with group-by operations. A group-by operation is specified by an SQL SELECT statement that includes a GROUP BY clause. The GROUP BY clause groups output results according to one or more grouping attributes (columns). The output of the group-by operation is a set of groups, where each group is represented by an aggregate computed (on a given attribute) over the multiple rows that share a common value of the grouping attribute(s).
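A group-by operation of the kind described above might look like the following sketch, again using Python's standard sqlite3 module. The sales table, the grouping attribute store, and the sample rows are hypothetical and serve only to show the shape of the result.

```python
import sqlite3

# In-memory database; the "sales" table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (store, amount) VALUES (?, ?)",
    [("north", 10.0), ("north", 30.0), ("south", 20.0), ("south", 40.0)],
)

# Group rows by the "store" attribute and aggregate "amount" within each group.
groups = conn.execute(
    "SELECT store, SUM(amount), COUNT(*) "
    "FROM sales GROUP BY store ORDER BY store"
).fetchall()

# One output row per distinct value of the grouping attribute.
for store, total, row_count in groups:
    print(store, total, row_count)

conn.close()
```

The result contains one row for each distinct value of the grouping attribute, so a grouping attribute with many distinct values yields a correspondingly large result set.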
In a database system containing very large relational tables, a group-by operation can produce a result set that contains a large number of rows. Typically, it is difficult to understand such a result set. As a result, outputs produced by conventional database systems for group-by operations may not be very useful for OLAP, data mining, or other statistical or analytical algorithms. Generally, software for performing OLAP, data mining, or other statistical or analytical algorithms is unable to efficiently analyze detailed outputs of aggregate operations. Conventionally, users have to spend considerable time and effort to summarize detailed aggregation outputs for use by such software.