Electronic databases provide extensive amounts of organized data that may be searched according to particular criteria to retrieve data pertinent to a requestor. Typically, the data is searched and retrieved using queries. Queries can be specified in different formats, according to the database query language that is being used. A query often includes one or more predicates, which are the parts of the query specifying a condition under which a statement is either true or false, and is thus used to specify conditions under which data is sought. The data in the database is typically organized in multiple tables, and is searched according to the conditions of the predicates.
Compound predicates allow multiple conditions to be specified, and may require that multiple tables of the database be searched. In particular, a compound OR predicate of a query that references more than one table requires that all predicates be evaluated as false before the entire compound OR predicate can be evaluated as false. For example, a query in Structured Query Language (SQL) format, including a compound OR predicate, is shown below:
SELECT *
FROM T1, T2
WHERE T1.C1=T2.C1
AND (T1.C2=‘A’ OR T2.C3=5)
All data is selected to be searched from two tables, T1 and T2, the WHERE statement equates the C1 columns of the two tables T1 and T2, and the compound OR predicate is (T1.C2=‘A’ OR T2.C3=5). This query thus indicates that the corresponding rows from both tables T1 and T2 should be provided as results to the query if either condition of the compound predicate is satisfied.
A typical process of implementing the query shown above is indicated diagrammatically in the table diagram of FIG. 1. Assuming T1 is the first table to be accessed in the join sequence, the first row of table T1 is retrieved, and the predicate T1.C2=‘A’ is applied. The first row in T1 qualifies for this condition with the ‘A’ in column C2, and this row is joined to the first row of table T2 on the join column C1 so that the corresponding rows from both tables will be provided as results. Next, the second row of table T1 is retrieved, and the predicate T1.C2=‘A’ is applied. The second row qualifies again with the ‘A’ value in column C2, and is joined to the second row of table T2 on the join column C1. (The second row also qualifies against the second predicate T2.C3=5, in column C3.) Next, the third row of table T1 is retrieved. The value at column C2 is not equal to ‘A’, but due to the OR condition the third row cannot be discarded and must be joined to the third row of table T2 to evaluate the second predicate T2.C3=5. The third row of table T2 qualifies for the second predicate condition due to the value 5 in column C3. Next, the fourth row of table T1 is retrieved. The value at column C2 is not equal to ‘A’, but again due to the OR condition the fourth row cannot be discarded and must be joined to the corresponding row in table T2 to evaluate the second predicate. The second predicate then also fails, and the fourth rows can thus be discarded. This same process continues for all rows of T1 (of which only four are shown in FIG. 1).
Thus, using this example compound predicate, all rows of table T1 must be accessed and all corresponding rows of table T2 must be joined to table T1 so that table T2 rows are either retrieved or checked, even if one or more rows are to be discarded. This is because there is no single Boolean term condition (such as an AND condition without OR conditions) that can qualify or disqualify rows from table T1 by accessing only table T1. A “non-Boolean” term, typically indicated by an OR condition, is present, and this non-Boolean term is fully uneven (herein referred to simply as “uneven”), meaning that the non-Boolean term spans multiple tables, i.e., data from (across) multiple tables must be checked against the non-Boolean condition to disqualify any rows This occurs, for example, when the term requires access to different tables based on predicates on opposite sides of an OR condition. Uneven non-Boolean terms can create a need for more table accesses and processing, and thus create longer access and processing times for queries.
Such uneven non-Boolean term are typically not a concern for searching small tables in a database, in which the extra accesses are not burdensome, or if the predicates are not very restrictive, when a majority of rows of the tables qualifies by accessing only one table. However, performance can degrade significantly as more tables are referenced by a compound predicate, or if the predicates are restrictive, and the tables involved in the query are large.
There are known ways to improve performance when the compound OR structure of a query contains predicates that can be isolated to a single table, allowing restrictive predicates to be applied as early as possible in the table join sequence. For example, if the predicate is of the form AND (T1.C2=‘A’ OR T2.C3=5 AND T1.C2=‘B’), then the predicate AND (T1.C2=‘A’ OR T1.C2=‘B’) could be redundantly added so that it could be applied to T1 as this table is first accessed. Less disqualifying rows of T1 will thus need to be joined to table T2, since a row in table T1 can be qualified for either of the added predicate conditions before needing to join and access table T2. However, this solution does not resolve the problem when the OR branches are uneven, i.e., when the compound OR structure of a query contains predicates whose access spans multiple tables, as in the first example, above.
Accordingly, what is needed is an apparatus and method for easily improving the performance of queries including non-Boolean term conditions that require access to more than one table. The present invention addresses such a need.