1. Field of the Invention
The invention generally relates to computer database systems. More particularly, the invention relates to techniques for optimization of database queries including grouped aggregation functions.
2. Description of the Related Art
Databases are well known systems for storing, searching, and retrieving information stored in a computer. The most prevalent type of database used today is the relational database, which stores data using a set of tables that may be reorganized and accessed in a number of different ways. Users access information in relational databases using a relational database management system (DBMS).
Each table in a relational database includes a set of one or more columns. Each column typically specifies a name and a data type (e.g., integer, float, string, etc.), and may be used to store a common element of data. For example, in a table storing data about patients treated at a hospital, each patient might be referenced using a patient identification number stored in a “patient ID” column. Reading across the rows of such a table would provide data about a particular patient. Tables that share at least one attribute in common are said to be “related.” Further, tables without a common attribute may be related through other tables that do share common attributes. A path between two tables is often referred to as a “join,” and columns from tables related through a join may be combined to from a new table returned as a set of query results.
A query of a relational database may specify which columns to retrieve data from, how to join the columns together, and conditions (predicates) that must be satisfied for a particular data item to be included in a query result table. Current relational databases require that queries be composed in query languages. A commonly used query language is Structured Query Language (SQL). However, other query languages are also used.
Typically, query languages provide aggregation functions to enable some computation or summarization of query results. SQL aggregation functions include COUNT (counts the number of values in a column), SUM (sums all values stored in a column), AVG (computes the average value in a group), MIN (determines the smallest value in a column), and MAX (determines the largest value in a column). Aggregation functions are often included in grouped queries, meaning queries where the results are grouped according to values in one or more columns. In SQL, such grouped queries are composed using a GROUP BY clause.
Once composed, a query is executed by the DBMS. Typically, the DBMS interprets the query to determine a set of steps (hereafter referred to as a “query plan”) that must be carried out to execute the query. However, in most cases, there are alternative query plans that can be carried out to execute a given query. Thus, the DBMS often includes a query optimizer, which selects the query plan that is likely to be the most efficient (i.e., requiring the fewest system resources, such as processor time and memory allocation).
Query optimizers typically operate by evaluating database statistics, including column statistics. That is, the query plan is selected based on statistical characteristics of the data in the fields (i.e., columns) required for the query. Thus, effective query execution requires optimization of queries that include grouped aggregation functions.