This specification relates to data processing, and in particular, to processing query language statements.
Query languages operate on relations. A relation is a set of tuples (t_1, . . . , t_n), each tuple having n≥1 data elements t_i. Each element t_i is a value, which may be the value of a corresponding attribute having an attribute name. Relations are commonly thought of as, represented as, and referred to as tables in which each row is a tuple and each column is an attribute. However, a relation need not be implemented in tabular form, and the tuples belonging to a relation can be stored in any appropriate form.
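As a minimal sketch of this definition, a relation can be modeled as a Python set of tuples, with the attribute names kept separately. The relation name, attribute names, and values below are hypothetical and chosen only for illustration.

```python
# A relation modeled as a set of tuples; attribute names are kept separately.
# The relation "employee" and its contents are illustrative assumptions.
attributes = ("name", "dept")
employee = {
    ("alice", "eng"),
    ("bob", "ops"),
    ("alice", "eng"),  # repeated literal: a set stores each tuple only once
}

def as_rows(relation, attributes):
    """Render the relation in the familiar tabular view, one dict per tuple."""
    return [dict(zip(attributes, t)) for t in sorted(relation)]

rows = as_rows(employee, attributes)
```

The tabular view is produced only on demand, which reflects the point that a relation need not be stored as a table.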
As used in this description, a query language program includes one or more expressions, and an expression includes one or more terms. Each term is a construct of the query language that can be evaluated to a relation on free variables of the term. Terms of query languages can include predicate calls, existential quantifiers, and universal quantifiers, to name just a few examples. A predicate call operates on a relation and returns attributes of any free variables for any tuples in the relation matching any bound variables.
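The behavior of a predicate call can be sketched as follows. This is an illustrative model only, not the evaluation strategy of any particular engine. The function name `call_predicate`, the use of `None` to mark a free variable, and the `parent` relation are assumptions made for the example.

```python
def call_predicate(relation, args):
    """Evaluate a predicate call against a relation.

    args has one entry per attribute position: a concrete value for a bound
    variable, or None for a free variable (a convention assumed here, so
    None cannot also be used as a data value in this sketch).
    Returns the set of tuples of free-variable values.
    """
    free_positions = [i for i, a in enumerate(args) if a is None]
    result = set()
    for t in relation:
        # Keep tuples that agree with every bound variable.
        if all(a is None or a == v for a, v in zip(args, t)):
            result.add(tuple(t[i] for i in free_positions))
    return result

# Hypothetical relation: parent(p, c) holds when p is a parent of c.
parent = {("ann", "bob"), ("ann", "cat"), ("dan", "eve")}

# Call parent("ann", c): first variable bound, second variable free.
children_of_ann = call_predicate(parent, ("ann", None))
```

The result is itself a relation on the free variables of the term, here a one-column relation over `c`.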
A disjunction defines a logical “or” between terms. For example, a disjunction between a term “a” and a term “b” can be expressed as “a or b.” Each term of a disjunction can for brevity be referred to as a disjunct.
A conjunction defines a logical “and” between terms. For example, a conjunction between a term “a” and a term “b” can be expressed as “a and b.” Each term of a conjunction can for brevity be referred to as a conjunct.
In this specification, “or” and “and”, when used in a query language expression, denote operators in a pseudocode query language. Real-world query language implementations have corresponding operators that may be expressed with the same or different syntax and which provide semantically equivalent functions. For example, conjunction in a query language can be denoted by “AND”, “&”, or “&&”, to name just a few examples. Similarly, disjunction in a query language can be denoted by “OR”, “|”, or “||”, to name just a few examples.
Evaluation engines of query languages can perform distributive transformations of query language expressions.
For example, the following pseudocode query language expression:
exp1 (a) and (exp2 (b) or exp3 (c))
can be rewritten as the following revised version:
(exp1 (a) and exp2 (b)) or (exp1 (a) and exp3 (c))
In the revised version, the term exp1 (a) has been distributed over the disjunction of the terms exp2 (b) and exp3 (c). This rewrite changes how an evaluation engine for the query language will evaluate the expression. Instead of evaluating the disjunction (exp2 (b) or exp3 (c)) first, as would have been the case in the original version, the evaluation engine evaluating the revised version will evaluate (exp1 (a) and exp2 (b)) and will also evaluate (exp1 (a) and exp3 (c)) before evaluating a disjunction of the results. Note that exp1 (a) now appears twice in the revised version.
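The distributive rewrite can be sketched on a small expression tree. In this sketch, which is an assumption and not the representation used by any particular engine, a term is a string naming a relation and a compound expression is a tuple `(op, left, right)`. To keep evaluation simple, every term is assumed to have the same single free variable, so that conjunction reduces to set intersection and disjunction to set union. In general a conjunction would be a join.

```python
def distribute(expr):
    """Apply one distributive step: x and (y or z) -> (x and y) or (x and z)."""
    if isinstance(expr, tuple) and expr[0] == "and":
        _, left, right = expr
        if isinstance(right, tuple) and right[0] == "or":
            _, y, z = right
            return ("or", ("and", left, y), ("and", left, z))
    return expr  # no disjunction on the right: leave unchanged

def evaluate(expr, relations):
    """Evaluate bottom-up; 'and' is intersection, 'or' is union (see assumption)."""
    if isinstance(expr, str):
        return relations[expr]
    op, left, right = expr
    l, r = evaluate(left, relations), evaluate(right, relations)
    return l & r if op == "and" else l | r

original = ("and", "exp1", ("or", "exp2", "exp3"))
revised = distribute(original)

# Hypothetical contents for the three relations.
relations = {"exp1": {1, 2, 3}, "exp2": {2, 5}, "exp3": {3, 7}}
same_result = evaluate(original, relations) == evaluate(revised, relations)
```

Both versions yield the same relation. Only the order of evaluation and the number of times `exp1` is evaluated differ.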
Some query evaluation engines automatically transform expressions into “disjunctive normal form” by repeatedly distributing all context terms, that is, the terms conjoined with a disjunction, over all disjunctions until no further distributions are possible. Therefore, in disjunctive normal form, an expression is completely transformed into a disjunction of conjunctions. In other words, an expression in disjunctive normal form has the form “c_1 or c_2 or . . . or c_n”, where every c_i is a conjunction of terms which itself includes no disjunctions.
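A minimal sketch of a transformation into disjunctive normal form follows. It assumes expressions built only from “and” and “or” over named terms, represented as strings and `(op, left, right)` tuples. The function name `to_dnf` and the sample expression are hypothetical.

```python
from itertools import product

def to_dnf(expr):
    """Return expr as a list of conjunctions, each a tuple of term names."""
    if isinstance(expr, str):
        return [(expr,)]  # a single term is a one-term conjunction
    op, left, right = expr
    l, r = to_dnf(left), to_dnf(right)
    if op == "or":
        return l + r  # disjunction: concatenate the two lists of conjunctions
    # "and": pair every conjunction on the left with every one on the right
    return [a + b for a, b in product(l, r)]

# a and (b or (c and (d or e)))
expr = ("and", "a", ("or", "b", ("and", "c", ("or", "d", "e"))))
dnf = to_dnf(expr)
# dnf == [("a", "b"), ("a", "c", "d"), ("a", "c", "e")]
```

Each tuple in the result corresponds to one c_i, and the list as a whole corresponds to their disjunction.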
Using disjunctive normal form can introduce unwanted complexity into a query language program. One problem is that transforming expressions into disjunctive normal form can result in expressions that are very large, in the worst case exponentially larger than the original, and which are therefore cumbersome and time-consuming to evaluate. Another problem is that using disjunctive normal form can cause repetition of terms that are expensive to evaluate individually, because a term distributed over a disjunction appears once in every resulting conjunction.
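The growth can be made concrete by counting, without building it, how large the disjunctive normal form of an expression would be. The sketch below assumes the same string-and-tuple expression representation as a convenience. The term names, including `expensive`, are hypothetical.

```python
def dnf_size(expr):
    """Return (number of conjunctions, total term occurrences) of expr's DNF."""
    if isinstance(expr, str):
        return (1, 1)
    op, left, right = expr
    (ln, lt), (rn, rt) = dnf_size(left), dnf_size(right)
    if op == "or":
        return (ln + rn, lt + rt)
    # "and": every left conjunction is paired with every right conjunction,
    # so each left term is copied rn times and each right term ln times.
    return (ln * rn, lt * rn + rt * ln)

def build(n):
    """expensive and (x1 or y1) and (x2 or y2) and ... and (xn or yn)"""
    expr = "expensive"
    for i in range(1, n + 1):
        expr = ("and", expr, ("or", f"x{i}", f"y{i}"))
    return expr

original_terms = 1 + 2 * 10                       # 21 terms as written
conjunctions, occurrences = dnf_size(build(10))   # (1024, 11264)
```

With ten two-way disjunctions, an expression of 21 terms becomes 1024 conjunctions containing 11264 term occurrences, and the term `expensive` is repeated in every one of the 1024 conjunctions.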