The present invention relates generally to database systems, and more particularly to aggregate queries in database systems.
A database is a collection of information. Relational databases are typically illustrated as one or more two-dimensional tables. Each table arranges items and attributes of the items in rows and columns. Each table row corresponds to an item (referred to as a record), and each table column corresponds to an attribute of the item (referred to as a field). In a relational database a collection of tables can be related (or "joined") to each other through a common attribute or "key". The common key enables information in one table to be automatically cross-referenced to corresponding information in another table.
A complex search may be performed on a database with a "query". A query specifies a set of criteria (e.g., the names of customers from a particular state) to define information for a database program to retrieve from the database. An aggregate query is a query which requests information concerning a group of records in the aggregate. For example, in a database which stores credit card transactions, an aggregate query may request the total purchases for all customers in a particular state. Each aggregate query may include a set of criteria to select records (e.g., grouping by state), and an operation to perform on the group of selected records (e.g., summing the transaction amounts).
Typical operations for aggregate queries include counting (e.g., determining the total number of transactions in a particular state), summing (e.g., determining the total of the purchases within the state), averaging (e.g., determining the average value of all purchases within the state), and finding minimum and maximum values (e.g., finding the largest and smallest purchases within the state).
To perform an aggregate query, a conventional database program examines every record in the database to determine whether or not the record matches any criteria. It constructs a query table from the records that match the criteria. Then the program performs the required operation over the appropriate fields from each record in the query table.
In some circumstances, a conventional database program may be unacceptably slow when performing an aggregate query. For example, a data warehouse for a credit card company might include a record of every transaction for every customer for the entire history of the credit card company. In such a data warehouse it would require an impractical amount of time or processing power to perform an aggregate query.
Several techniques have been developed to reduce the time required to process aggregate queries. One technique is to pre-compute partial answers to related queries. For example, a transaction total may be pre-computed in each individual zip code. Then, if the user requests the transaction total for a specific state, the database program adds the pre-computed transaction total for each zip code within the specific state. Because there are fewer zip codes than transactions, the transaction totals for the zip code may be added together more quickly than the transaction amount in the transaction records. Unfortunately, the advantages of this technique may be lost if the user submits a query to which no partial answer has been prepared.