Databases are computerized information storage and retrieval systems. A Relational Database Management System (RDBMS) is a database management system (DBMS) which uses relational techniques for storing and retrieving data. Relational databases are organized into tables which consist of rows and columns of data. The rows are formally called tuples or records. A database will typically have many tables and each table will typically have multiple tuples and multiple columns. The tables are typically stored on direct access storage devices (DASD), such as magnetic or optical disk drives for semi-permanent storage.
The Structured Query Language (SQL) interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO). The SQL interface allows users to formulate relational operations on the tables interactively, in batch files, or embedded in host languages such as C and COBOL. SQL allows the user to retrieve and manipulate the data.
One such operation is the DISTINCT operation, which eliminates duplicate values from a specified set of data. In the SQL query
SELECT DISTINCT DEPT
FROM DATA_TABLE
DEPT is the DISTINCT key, which designates the column to which the DISTINCT operation is applied. Executing the above query returns the set of unique departments in the table DATA_TABLE. Even if a particular department number appears in fifty rows of the table, it appears only once in the result set of the query.
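The effect of the DISTINCT operation can be reproduced with a minimal sketch using Python's built-in sqlite3 module. The table schema (EMPLOYEE, DEPT and LAB_LOC columns), the sample rows and the helper name distinct_departments are hypothetical, chosen so that department 10 appears in fifty rows; the ORDER BY clause is added only to make the output order deterministic.

```python
import sqlite3

# Hypothetical schema and data: department 10 appears in fifty rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DATA_TABLE (EMPLOYEE TEXT, DEPT INTEGER, LAB_LOC TEXT)")
rows = [(f"E{i}", 10, "SJ") for i in range(50)]
rows += [("E50", 20, "SJ"), ("E51", 30, "NY")]
conn.executemany("INSERT INTO DATA_TABLE VALUES (?, ?, ?)", rows)


def distinct_departments(conn):
    """Return each DEPT value once, as SELECT DISTINCT DEPT does."""
    # ORDER BY is added only so that the output order is deterministic.
    cursor = conn.execute("SELECT DISTINCT DEPT FROM DATA_TABLE ORDER BY DEPT")
    return [dept for (dept,) in cursor]


print(distinct_departments(conn))  # [10, 20, 30]
```

Although 52 rows are stored, department 10 is reported once.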
DISTINCT operations are frequently combined with aggregation operations, such as AVG, COUNT, and SUM. It is often desirable to perform DISTINCT operations on several columns of a table at once. Unfortunately, most current DBMSs restrict the user to a single DISTINCT key in the SELECT clause of any given query, so multiple DISTINCT operations cannot be expressed in one query. Instead, the user must compose and submit a separate query to the DBMS for each DISTINCT key. This is inefficient both for the user, who must write multiple queries, and for the DBMS, which must execute them. Each query incurs the cost of accessing the DASD and retrieving the requested information, a cost measured in computing resources and elapsed time.
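The workaround described above, one query per DISTINCT key, can be sketched as follows. This is a minimal illustration assuming Python's sqlite3 module, a hypothetical DATA_TABLE and a hypothetical helper distinct_counts_separately; it records each statement issued so that the one-query-per-key cost is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DATA_TABLE (EMPLOYEE TEXT, DEPT INTEGER, LAB_LOC TEXT)")
conn.executemany(
    "INSERT INTO DATA_TABLE VALUES (?, ?, ?)",
    [("E1", 10, "SJ"), ("E2", 10, "SJ"), ("E3", 20, "NY"), ("E1", 20, "NY")],
)


def distinct_counts_separately(conn, keys):
    """Issue one COUNT(DISTINCT ...) query per key, as a user limited to a
    single DISTINCT key per query would have to."""
    results, statements = {}, []
    for key in keys:
        # Column names cannot be bound as parameters; this sketch assumes
        # `keys` comes from a trusted list of column names.
        sql = f"SELECT COUNT(DISTINCT {key}) FROM DATA_TABLE"
        statements.append(sql)
        # Each query reads DATA_TABLE independently of the others.
        (count,) = conn.execute(sql).fetchone()
        results[key] = count
    return results, statements


counts, issued = distinct_counts_separately(conn, ["EMPLOYEE", "DEPT"])
print(counts)       # {'EMPLOYEE': 3, 'DEPT': 2}
print(len(issued))  # 2
```

Adding a third DISTINCT key would add a third query and a third read of the table.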
One approach to this problem has been implemented in DB2 UDB®, developed by International Business Machines Corporation of Armonk, N.Y. In that system, the user may specify multiple DISTINCT keys in a single query, such as:
SELECT COUNT(DISTINCT EMPLOYEE), COUNT(DISTINCT DEPT)
FROM DATA_TABLE
GROUP BY (LAB_LOC)
This query is broken down and rewritten into two table expressions:
SELECT COUNT(DISTINCT EMPLOYEE)
FROM DATA_TABLE
GROUP BY (LAB_LOC);
SELECT COUNT(DISTINCT DEPT)
FROM DATA_TABLE
GROUP BY (LAB_LOC)
Each table expression is then executed in the same manner as two separate queries would be. The result sets are then merged and returned to the user.
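The rewrite-and-merge strategy can be approximated by the following minimal sketch, assuming Python's sqlite3 module and hypothetical sample data; it is an illustration of the general idea, not IBM's implementation. One assumption goes beyond the rewritten queries shown above: the grouping column LAB_LOC is added to each SELECT list so that the separate result sets can be matched up by group when they are merged. The helper name rewrite_and_merge is likewise hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DATA_TABLE (EMPLOYEE TEXT, DEPT INTEGER, LAB_LOC TEXT)")
conn.executemany(
    "INSERT INTO DATA_TABLE VALUES (?, ?, ?)",
    [("E1", 10, "SJ"), ("E2", 10, "SJ"), ("E3", 20, "SJ"),
     ("E4", 30, "NY"), ("E4", 40, "NY")],
)


def rewrite_and_merge(conn, keys, group_col="LAB_LOC"):
    """Split a multi-DISTINCT query into one grouped query per key, run each
    separately, and merge the result sets by group value."""
    merged = {}
    for key in keys:
        # Assumption: the grouping column is added to each SELECT list so the
        # separate result sets can be matched up by group during the merge.
        sql = (f"SELECT {group_col}, COUNT(DISTINCT {key}) "
               f"FROM DATA_TABLE GROUP BY {group_col}")
        for group, count in conn.execute(sql):
            merged.setdefault(group, {})[key] = count
    return merged


result = rewrite_and_merge(conn, ["EMPLOYEE", "DEPT"])
print(result)
```

Each loop iteration corresponds to one rewritten table expression, so the table is still read once per DISTINCT key even though the user submitted a single query.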
While the above-described system allows the user to submit a single query with multiple DISTINCT keys, it remains highly inefficient because the system must still execute multiple queries. As stated above, this burdens the system and increases response time. Moreover, the query submitted by the user is broken down and rewritten during a pre-execution period known to those skilled in the art as bind time. During bind time, execution parameters are determined and bound, i.e., saved, for later use during execution, or runtime.
Thus, there exists a need for a method and system that allow a query having multiple DISTINCT key columns to be executed. The method and system should be efficient and cost-effective. The present invention addresses these needs.