Relational and object-relational database management systems store information in tables of rows in a database. To retrieve data, queries that request data are submitted to a database server, which computes the queries and returns the data requested.
Queries submitted to the database server must conform to the syntactical rules of a particular query language. One popular query language, known as the Structured Query Language (SQL), provides users a variety of ways to specify information to be retrieved.
A query submitted to a database server is evaluated by a query optimizer. Based on the evaluation, the query optimizer generates an execution plan that defines operations for executing the query. Typically, the query optimizer generates an execution plan optimized for efficient execution. The optimized execution plan may be based on a rewrite of the query.
A common type of query that is optimized is a query that contains a subquery whose join condition involves the NOT IN/ALL operator (NOT IN is equivalent to !=ALL). In data-warehouses with reporting applications, such queries and subqueries are usually evaluated on very large sets of data. Thus, it is critical to make such queries scale in any SQL execution engine. When such queries are not optimized using anti-join, the subquery is executing an operation that is effectively a Cartesian product, which is quite inefficient.
One common technique for optimizing these kinds of queries is anti-join unnesting. In anti-join unnesting, a subquery operand of an NOT IN/ALL operator is either merged with the containing “outer query” or an inline view is created for the subquery and the columns in the join condition of the NOT IN/ALL operator are used to form a join condition of an anti-join. To illustrate anti-join unnesting, the following query Q1 is transformed into Q2. Note that in this example both the columns T1.x and T2.y contain only non-null values.
Q1:SELECT T1.cFROM T1WHERE T1.x NOT IN (SELECT T2.y        FROM T2        WHERE T2.z > 10);Q2:SELECT T1.cFROM T1, T2WHERE T1.x A= T2.y and T2.z > 10;
Query Q1 is rewritten by merging the subquery operand of the NOT IN operator of Q1 into Q1's outer query to produce query Q2. Query Q2 contains the anti-join operator T1.x A=T2.y, which is based on the join columns (i.e. T1.x, T2.y) of the NOT IN operator in query Q1. The anti-join operator specifies the join condition T1.x A=T2.y. A condition that compares columns between tables, is hereafter referred to as a join condition. A joining column is a column being compared, by an operator in a join condition, to a column of another table. Query Q2 may be executed far more efficiently than query Q1. Note that the anti-join operator A= is non-standard SQL and is used here for the purpose of illustration only.
The anti-join is an asymmetric join, where a row of the “left table” is returned only if it does not match (i.e. does not satisfy the connecting condition) with any row in the “right table”. The term “left” is used to designate the table whose rows are returned by an anti-join operation, and not to designate the table's position within an expression. Similarly, the term “right” is used to designate the table whose rows are to be matched (or not) to a left table by an anti-join operation, and not to designate the table's position within an expression. Nevertheless, the notation T1.x A=T2.y is used to represent an anti-join, where T1 is the table on the left of the anti-join and T2 is the table on the right of the anti-join.
The term table refers generally to any set of rows or tuples stored in a database table or computed for an expression, such as a query or subquery. For example, the rows returned by the NOT IN/ALL subquery of Q1 can be referred to as a table.
In Q2, under the semantics of an anti-join, for each row of T1, the join condition T1.x=T2.y is evaluated, and if no match is found with any row of T2, then that row of T1 is returned. The semantics of evaluating the NOT IN/ALL subquery in Q1 is identical to the semantics of the anti-join summarized below.                1. If T2 contains no rows after the application of the filter predicate, then return all the rows of T1 and terminate.        2. For each row of T1, return the row, if T1.x has no match with any row of T2.        
The anti-join unnesting transformation of Q1 to Q2 is an example of one form anti-join unnesting in which a subquery is merged into the outer query. In another form, a subquery is converted into an inline view of the outer query. The transformation of Q3 to Q4 illustrates this latter form. Again in this example, both the columns T1.x and T2.y contain only non-null values.
Q3:SELECT T1.cFROM T1WHERE T1.x NOT IN (SELECT T2.y        FROM T2, T3        WHERE T2.z = T3.w          and T2.k > 10);Q4:SELECT T1.cFROM T1,  (SELECT T2.y AS Y   FROM T2, T3   WHERE T2.z = T3.w    and T2.k > 10) VWHERE T1.x A= V.y;
Query Q4 is rewritten by converting the subquery operand of the NOT IN operator of Q3 into inline view V of Q4. Query Q4 contains the anti-join operator T1.x A=T2.y, which is based on the join columns (i.e. T1.x, T2.y) of the NOT IN operator in query Q3. The anti-join operator specifies the join condition T1.x A=T2.y.
Unfortunately, anti-join unnesting for NOT IN/ALL subqueries may only be performed when a certain restriction, referred to herein as the no-NULL restriction, is met. The no-NULL restriction requires that both operands of the anti-join condition are free of NULL values for every row in the left and right tables. For example, query Q1 satisfies the no-NULL restriction only when column T1.x does not contain any NULL values, and no row in T2 that satisfies the predicate filter condition T2.z contains a NULL value in column T2.y.
The no-NULL restriction bars anti-join unnesting for a large proportion of NOT IN/ALL subqueries; therefore the optimizer is forced to choose a sub-optimal plan. Clearly, there is a need for techniques and mechanisms for performing anti-join unnesting when the no-NULL restriction is not satisfied.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.