Most relational database management systems (DBMS) allow users to perform queries on large databases or tables using SQL (structured query language) commands. A common SQL operation is the grouping or aggregation query, which allows a DBMS user to perform computations on attributes of group members, where a "group" is understood to be a collection (e.g., persons, things) sharing a common group identifier (e.g., department, date of manufacture). For example, a user might wish to run a query against an employee database asking for a report of average department salaries grouped by department name. This query can be expressed in SQL as follows:
select dname, avg(salary) PA1 from emp PA1 group by dname.
This query requires the DBMS to select department name ("dname") and salary ("salary") columns from an employee ("emp") database table, associate the rows (that is, the selected fields of the rows) of the "emp" ruble into groups comprising salaries for employees belonging to the same department ("dname"), and compute and report average salaries for each department.
Currently, a few different methods exist for executing this group-by query. In the simplest and fastest method, where the group column is a database index, the DBMS need only traverse the memory resident binary tree built on the index, select the subtree corresponding to the group identifier, traverse the subtree accumulating data fields and report the groupings. However, this extremely efficient procedure requires that an ordered index exists on the group column(s).
It is often the case that a user wishes to perform a group-by query on a column or on a combination of columns that is not an index. In this situation, the DBMS must identify the unique values represented by the contents of the designated group column and compute and report the grouping or aggregation requested by the user. This is a far less efficient process than when the group column is a database index, and one that must take into account limited system resources, including available memory, disk space, CPU utilization and networked resources available for distributed processing.
In the Tandem.TM. NonStop.TM. SQL/MP relational database management system (DBMS), prior to the present invention the grouping or aggregation query was performed in one of two ways. If the data in the table is already sorted on the group column before the group-by query is issued, data is aggregated for each group as the database table is read row by row. When a change in group is detected, the current group along with any aggregate values for that group are returned to the user. This is a very efficient procedure. In the more likely situation where the data has not been previously sorted on the group column, NonStop.TM. SQL sorts the table on the group column(s) and then the group-by query proceeds as in the sorted case.
This sort-based procedure presents a number of problems. First, tables over which data is aggregated can be as large as 100 Gigabytes. For example, in the average department salary group-by query set out above, the employee table might comprise data for a million employees from as many as 1000 departments. Sorting this large a table is a highly CPU and I/O intensive operation that requires that rows be repeatedly written to and read from disk during the sort process, which exacts a high cost in inefficiency due to slow I/O operations. Additionally, as it is not likely that a free 100 GB to store the sorted table is available on disks local to the processor handling the query, the employee table would most likely need to be partitioned, the partitions sorted then grouped, and the groups recombined before user reports are generated, the additional steps adding overhead to the process of executing a group-by query. Grouping could also be performed on the database partitions in parallel on distributed processors, but network data transfers are far slower (generally about 1/3 slower) than local data transfers and, as a result, the reported query results would be slowed due to the network traffic required to distribute and reassemble the partitions.
Consequently, there is a need for an SQL grouping procedure that requires less disk and network I/O than the present sort-based method. Additionally, there is a need for a grouping or aggregation process that, through efficient use of computer memory resources, can run largely in memory, yielding increased execution speeds. Given a grouping process designed to run largely in memory, there is also a need for overflow strategies that efficiently utilize disk resources when sufficient memory is not available. Additionally, there is a need for an intelligent front end process to the grouping procedure that can determine based on catalog statistics for a given table whether it would be more efficient to implement the prior art sort-based grouping algorithm or the hash-based approach of the present invention.