Databases are used to store information for an innumerable number of applications, including various commercial, industrial, technical, scientific and educational applications. As the reliance on information increases, both the volume of information stored in most databases, as well as the number of users wishing to access that information, likewise increases. Moreover, as the volume of information in a database, and the number of users wishing to access the database, increases, the amount of computing resources required to manage such a database increases as well.
Database management systems (DBMS's), which are the computer programs that are used to access the information stored in databases, therefore often require tremendous resources to handle the heavy workloads placed on such systems. As such, significant resources have been devoted to increasing the performance of database management systems with respect to processing searches, or queries, to databases.
Improvements to both computer hardware and software have improved the capacities of conventional database management systems. For example, in the hardware realm, increases in microprocessor performance, coupled with improved memory management systems, have improved the number of queries that a particular microprocessor can perform in a given unit of time. Furthermore, the use of multiple microprocessors and/or multiple networked computers has further increased the capacities of many database management systems.
From a software standpoint, the use of relational databases, which organize information into formally-defined tables consisting of rows and columns, and which are typically accessed using a standardized language such as Structured Query Language (SQL), has substantially improved processing efficiency, as well as substantially simplified the creation, organization, and extension of information within a database. Furthermore, significant development efforts have been directed toward query “optimization”, whereby the execution of particular searches, or queries, is optimized in an automated manner to minimize the amount of resources required to execute each query.
Through the incorporation of various hardware and software improvements, many high performance database management systems are able to handle hundreds or even thousands of queries each second, even on databases containing millions or billions of records. However, further increases in information volume and workload are inevitable, so continued advancements in database management systems are still required.
One area that has been a fertile area for academic and corporate research is that of improving the designs of the “query optimizers” utilized in many conventional database management systems. The primary task of a query optimizer is to choose the most efficient way to execute each database query, or request, passed to the database management system by a user. The output of an optimization process is typically referred to as an “execution plan,” “access plan,” or just “plan” and is frequently depicted as a tree graph. Such a plan typically incorporates (often in a proprietary form unique to each optimizer/DBMS) low-level information telling the database engine that ultimately handles a query precisely what steps to take (and in what order) to execute the query. Also typically associated with each generated plan is an optimizer's estimate of how long it will take to run the query using that plan.
An optimizer's job is often necessary and difficult because of the enormous number (i.e., “countably infnite” number) of possible query forms that can be generated in a database management system, e.g., due to factors such as the use of SQL queries with any number of relational tables made up of countless data columns of various types, the theoretically infinite number of methods of accessing the actual data records from each table referenced (e.g., using an index, a hash table, etc.), the possible combinations of those methods of access among all the tables referenced, etc. An optimizer is often permitted to rewrite a query (or portion of it) into any equivalent form, and since for any given query there are typically many equivalent forms, an optimizer has a countably infinite universe of extremely diverse possible solutions (plans) to consider. On the other hand, an optimizer is often required to use minimal system resources given the desirability for high throughput. As such, an optimizer often has only a limited amount of time to pare the search space of possible execution plans down to an optimal plan for a particular query.
Queries which the optimizer can handle often include Join operations of various types. For example, an Inner Join between the two tables will result in only the records matching certain selection criteria to be returned to a user. An outer join, such as a Left Join of Table A and Table B (according to a set of selection criteria) will return all matching records as well as those records in Table A not having a match in Table B. The matching records will have the values of all their fields populated by the data from Table A and Table B. However, the non-matching records from Table A will have a NULL, or other default value, in the fields from Table B. Another outer join, such as a Right Join of Table A and Table B will also return all the matching records but, unlike the Left Join, will return those records in Table B not having a matching record in Table A. Another join type is a Left Exception Join of Table A and Table B. In this instance, the result set will consist of those records in Table A for which no matching record in Table B exists. A Right Exception Join is available as well. Within the field of SQL queries, a number of different but synonymous terms are used. For purposes of simplicity, within the present description, a Left Join and Left Outer Join are equivalent terms and Right Join and Right Outer Join are equivalent terms also. Furthermore, an Inner Join and EquiJoin are equivalent terms as well.
The presence of an outer join, such as a Left Join or Right Join, or an Exception Join in a query plan can reduce the level of optimization achievable by the optimizer. Specifically, the presence of these undesirable join types prevents, or severely limits, the ability of the optimizer to perform predicate movement, ORDER BY movement, GROUP BY movement, and Join enumeration when developing an optimal query plan.
In partial recognition of this problem, previous optimizers have included a Join-Downgrade feature in which, under specific circumstances, an Outer Join or Exception Join is replaced with an Inner Join. These optimizations, however, do not address all occurrences of Outer and Exception Joins.
As described earlier, a Left Join of two tables selects records matching a particular selection criteria and any records in the first table that do not have a match in the second table. Accordingly, the Left Join can be considered to consist of an Inner Join with the selection criteria and an Exception Join of all the records in the first table except those also in the second table. In other words, a Left Join is the union of an Inner Join portion and an Exception Join portion.
In some instances, the logic of an SQL SELECT statement may exclude all the records that would be included by operation of the Exception Join portion. For example, the logic of the SELECT statement may, in addition to the Left Join, test some value from the second table against a non-NULL value. Those records which satisfy the Inner Join portion of the Left Join have appropriate values in their fields and may or may not pass the test; however, any records satisfying the Exception Join portion of the Left Join (i.e., those records in the first table that do not have a match in the second table) will have a NULL value and will fail the test. As a result, the selection criteria of the SELECT statement effectively removes all the records returned as part of the Exception Join portion of the Left Join.
Under these specific conditions, an optimizer can perform a Join-Downgrade to replace the Left Join with an Inner Join. There remains the need, however, for optimizers to implement a general solution to replace every Outer and Exception Join encountered while developing a query plan with a type of Join that facilitates optimization strategies such as predicate movement (up or down), ORDER BY movement, GROUP BY movement, and Join enumeration.