1. Technical Field
This invention generally relates to computer systems, and more specifically relates to apparatus and methods for improving access to data in a computer database through optimization of the SELECT query in Structured Query Language (SQL).
2. Background Art
Database systems have been developed that allow a computer to store a large amount of information in a way that allows a user to search for and retrieve specific information in the database. For example, an insurance company may have a database that includes all of its policy holders and their current account information, including payment history, premium amount, policy number, policy type, exclusions to coverage, etc. A database system allows the insurance company to retrieve the account information for a single policy holder among the thousands and perhaps millions of policy holders in its database.
Retrieval of information from a database is typically done using queries. A database query typically includes one or more predicate expressions interconnected with logical operators. A predicate expression is a general term given to an expression using one of the four kinds of operators (or their combinations): logical, relational, unary, and boolean, as shown in FIG. 2. A query usually specifies conditions that apply to one or more columns of the database, and may specify relatively complex logical operations on multiple columns. The database is searched for records that satisfy the query, and those records are returned as the query result.
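By way of illustration only, a query of the kind described above can be sketched as follows. The sketch assumes an in-memory SQLite database accessed through Python's standard sqlite3 module; the policy table, its columns, and the sample rows are hypothetical and are not taken from any figure of this disclosure. The WHERE clause combines relational predicates (=, >, >=) with the logical operators AND, OR, and NOT across two columns.

```python
import sqlite3

# Hypothetical insurance table, used only to illustrate a predicate query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE policy ("
    "policy_number INTEGER, holder TEXT, policy_type TEXT, premium REAL)"
)
conn.executemany(
    "INSERT INTO policy VALUES (?, ?, ?, ?)",
    [
        (1001, "Adams", "auto", 450.0),
        (1002, "Baker", "home", 1200.0),
        (1003, "Clark", "auto", 980.0),
        (1004, "Davis", "life", 300.0),
    ],
)

# Two predicate expressions joined by OR:
#   (policy_type = 'auto' AND premium > 500)  relational predicates joined by AND
#   NOT (premium >= 400)                      a negated relational predicate
rows = conn.execute(
    "SELECT policy_number, holder FROM policy "
    "WHERE (policy_type = 'auto' AND premium > 500) OR NOT (premium >= 400) "
    "ORDER BY policy_number"
).fetchall()

for policy_number, holder in rows:
    print(policy_number, holder)
```

Only the records that satisfy the combined condition are returned as the query result; in this sample data those are policies 1003 and 1004.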
A primary type of SQL query is the SELECT statement. The SELECT statement specifies data to be retrieved from one or more database tables, and it has an optional DISTINCT specifier. When the DISTINCT specifier is used with a SELECT statement, it causes the database manager to return only unique data for the columns specified by the SELECT statement by discarding duplicate rows. A row is a duplicate when its data in the selected columns matches the data of another row in the result. In the known art, a query containing a DISTINCT specifier often causes unnecessary delay in the database. The delay arises when the database manager executes the SELECT statement, because execution joins tables together in a temporary data structure. When data is duplicated in rows of the tables, the join produces duplicate records that the DISTINCT specifier will discard anyway. This places an undue burden on system resources and increases the time needed to process database queries. Without a way to reduce database query time and thereby improve system performance, the computer industry will continue to suffer from excessive delays in database accesses that include the DISTINCT specifier.
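The effect of duplicated data on a joined result can be sketched with a small example. The sketch again assumes an in-memory SQLite database accessed through Python's standard sqlite3 module, with hypothetical holder and payment tables. It shows only the observable row counts with and without DISTINCT; it does not show how any particular database manager builds its temporary data structure internally.

```python
import sqlite3

# Hypothetical tables: one policy holder may have many payment rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE holder (holder_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE payment (holder_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO holder VALUES (?, ?)",
    [(1, "Adams"), (2, "Baker")],
)
conn.executemany(
    "INSERT INTO payment VALUES (?, ?)",
    [(1, 100.0), (1, 100.0), (1, 150.0), (2, 200.0)],
)

join_sql = "FROM holder h JOIN payment p ON h.holder_id = p.holder_id"

# Without DISTINCT: one output row per matching holder/payment pair.
joined_rows = conn.execute("SELECT h.name " + join_sql).fetchall()

# With DISTINCT: rows whose selected column data is duplicated are discarded.
distinct_rows = conn.execute(
    "SELECT DISTINCT h.name " + join_sql + " ORDER BY h.name"
).fetchall()

print(len(joined_rows), "rows produced by the join")
print(len(distinct_rows), "rows remain after DISTINCT")
```

In this sample data the join yields four rows and DISTINCT keeps two, so half of the joined records represent work whose results are thrown away.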