The present invention relates to database management systems, and more specifically, to executing expressions within database query languages. Databases come in many flavors. One popular form is a relational database management system (RDBMS), such as DB2™ system, which is manufactured by International Business Machines Corporation of Armonk, N.Y.
The RDBMS is responsible for handling all requests for access to the database where the data itself is actually stored, thereby shielding the users from the details of any specific hardware implementation. Using relational techniques, the RDBMS stores, manipulates and retrieves data in the form of table-like relations typically defined by a set of columns or attributes of data types and a set of rows (i.e. records or tuples) of data. The columns may further comprise restrictions on their data content (i.e. valid domains) and may be designated as a primary key or unique identifier for the relation or a foreign key for one or more other relations.
One very common language for dealing with RDBMSs is the Structured Query Language (SQL). SQL includes both data definition operations and data manipulation operations. To maintain data independence a query (i.e. a set of SQL commands) instructs the RDBMS what to do but not how to do it. Thus, the RDBMS includes a query processor for generating various query plans of execution and choosing the least expensive plan with respect to execution costs. Due to the high-level nature of relational expressions and a variety of implementation techniques, automatic query optimization is possible and often necessary to ensure more efficient query processing.
Often, a database application may require the creation of a “view” for the data in given relations or tables. A view provides an alternative way of looking at the data in one or more base tables. It may, for example, contain only selected columns from the table. The creation of the view or selections of data from the view may involve the processing of multiple “CASE” expressions.
In SQL, CASE expressions (CEXP), similar to any other programming language, are an intelligible way to express complex conditional clauses, which allows a selection to be made of one sequence of statements out of many possible sequences. A CEXP can be seen as a list of n logical expressions, 1—0, . . . , 1_(n−1) and another list of n+1 actions, a—0, . . . , a_n. An action represents a sequence of statements to be executed, and a logical expression represents a condition that can evaluate to be either true or false. Following this terminology, the semantics of a CEXP can be described as follows: the logical expressions 1_i must be evaluated in increasing order from i=0 to i=n−1 until one logical expression satisfies, or all of the logical expressions fail, such that: if i:0i<n:j:0j<i: (false(1_j) true(1_i)) then a_i is executed; if i:0i<n: false(1_i) then a_n is executed. It should be noted that only one action can be executed and that all the other actions being omitted.
The syntax of a CEXP in a SQL statement can be expressed as follows:
SELECT  CASE    WHEN I_0 THEN a_0    WHEN I_1 THEN a_1    ....    WHEN I_(n−1) THEN a_(n−1)    ELSE a_n  ENDFROM ...WHERE ....;The CEXPs can appear either in the SELECT clause, in the WHERE clauses, or in any place where a scalar expression can appear.
When CEXPs contain subqueries, commercial DBMSs such as DB2 tackle CEXPs by breaking the subquery semantic block so that the subquery semantic blocks are independently executed before evaluating the CEXPs. This type of approach allows the query compiler to speed up the query processing by considering a larger search space for join ordering. However, the payoff is that the query execution plan generated by the compiler breaks the CEXP semantics, the execution of subquery semantic blocks is out of the control of logical expression, i.e., subquery semantic blocks are executed no matter of the result of the logical expression, and, as a consequence, incorrect results may be produced. Also, in some scenarios, the fact that all subquery sematic blocks are being executed might lead to performance degradations, independently of the optimizations introduced by the query compiler.
For example, say that we have the following statement:
SELECT  CASE    WHEN C1 > 0 THEN ( SELECT SUM(C2/C1) FROM T1)    ELSE ( SELECT SUM(C2+C1) FROM T1)  ENDFROM T;This CEXP has one logical expression 1—0=C1>0, and two actions: a—0=(SELECT SUM(C2/C1) FROM T1), and a—1=(SELECT SUM(C2+C1) FROM T1).
At present, a query compiler, such as the DB2 query compiler, may break the CEXP into several join operations, as illustrated in FIG. 1. For every tuple projected by the table scan on T, a—0 and a—1 are executed through the nested loop joins (NLJN). After that, the condition expressed by 1—0 is checked and the result from a—0 is selected if it evaluates to true. If this is not the case, the result from a—1 is used.
This CEXP execution can cause an error when a—0 does not have to be executed because of its dependency on C1, that is, if C1 is 0, when a—0 is executed, C2 will be divided by zero. Moreover, the execution of the query may be sub-optimal since all the actions are executed irrespective of the result from evaluating 1—0. Thus, there is a need for mechanisms that execute CEXPs containing subqueries, such that the semantics of the CEXP are correct.