This specification relates to data processing, in particular, to processing recursive statements.
In some query languages, a predicate is a function defined by one or more statements that maps one or more input parameters to true or false. A predicate operates on an associated relation and returns true or false depending on whether a tuple defined by the input parameters occurs in the associated relation.
A relation is a set of tuples (t1, . . . , tn), each tuple having n≥1 data elements ti. Each element ti represents a corresponding value, which may represent a value of a corresponding attribute having an attribute name. Relations are commonly thought of as, represented as, and referred to as tables in which each row is a tuple and each column is an attribute.
Some query languages support recursive statements, which are statements that reference their own output. An example query language that supports recursive statements is Datalog. The following example statement can be written in Datalog:
f(i):-i=1; (i=2, f(1)); (i=3, f(2))
This statement in Datalog recursively defines a predicate f having input parameter i. The predicate can be expressed as f(i), or for brevity, f, whose meaning will be apparent from the context. The :- operator defines the predicate f(i) to have the body “i=1; (i=2, f(1)); (i=3, f(2))”. A semicolon represents disjunction, i.e., logical “or,” and a comma represents conjunction, i.e., logical “and.” For clarity, logical “and” will occasionally be spelled out as an explicit “and” operator.
The semantics of the predicate f(i) in Datalog is that the body of f(i) is evaluated to compute the associated relation for f(i). The associated relation is the smallest set of values i such that the body of f(i) is satisfied, i.e., evaluates to true. Then, when a value for i is provided as input to the predicate f(i), the predicate evaluates to true if the value occurs in the associated relation for f(i) and false otherwise. For example, f(1) evaluates to “true” because the term “i=1” in the body places the tuple (1) in the associated relation. Evaluation of predicates is typically performed by an evaluation engine for the query language implemented by software installed on one or more computers.
The relation over which f(i) is evaluated may be specified within the body of the predicate. In this example, the body of f(i) defines a relation having a set of singleton tuples, e.g., {1, 2, 3}. However, the relation over which f(i) is evaluated may alternatively be specified by another predicate or may be explicitly defined. For example, the relation may be defined by a table in a database.
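As a rough illustration of a predicate acting as a membership test on its associated relation, consider the following Python sketch. The name F_RELATION and the use of a Python set of 1-tuples are assumptions made only for this example; they are not part of any particular query language or evaluation engine.

```python
# Hypothetical in-memory stand-in for the associated relation of f(i):
# a set of singleton tuples, as described in the text.
F_RELATION = {(1,), (2,), (3,)}


def f(i):
    """Return True if the tuple (i) occurs in the associated relation."""
    return (i,) in F_RELATION


in_relation = f(1)       # the tuple (1) occurs in the relation
not_in_relation = f(4)   # the tuple (4) does not occur in the relation
```

The same membership test would apply if the relation were instead loaded from a table in a database.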
The meaning of evaluating a recursive predicate with a free variable is to compute the least fixed point of the predicate. The least fixed point is a relation having a set of tuples that is a subset of all other fixed points of the predicate. Evaluation engines that evaluate recursive predicates can use a number of different procedures for finding the least fixed point.
Some methods for finding the least fixed point of a recursive predicate recast the predicate into a number of nonrecursive evaluation predicates. The evaluation predicates are then evaluated in sequence until a least fixed point is reached. In general, recasting a recursive predicate into a number of nonrecursive evaluation predicates may be referred to as “flattening” the recursion.
An evaluation engine for a particular query language can recast a recursive predicate as follows. A first nonrecursive predicate is defined as an empty relation. In addition to the empty relation, a sequence of subsequent nonrecursive predicates is defined according to the body of the recursive predicate. In doing so, the evaluation engine replaces each recursive term with a reference to a previous nonrecursive predicate. Logically, the number of nonrecursive predicates that can be generated is unbounded. However, the evaluation engine will halt evaluation when the least fixed point is reached.
The evaluation engine then evaluates the nonrecursive predicates in order and adds resulting tuples to the associated relation for the predicate. The evaluation engine stops when a nonrecursive predicate is reached whose evaluation adds no additional tuples to the relation. The final result is the associated relation for the recursively defined predicate.
Using this procedure, evaluating each successive predicate regenerates all of the results that have already been generated. Thus, this approach is sometimes referred to as “naive evaluation.”
For simplicity, predicates in this specification will generally be represented logically and may not necessarily correspond to a language construct of any particular query language. However, the implementation of the illustrated logical predicates by an evaluation engine is normally straightforward for query languages that support recursive predicates.
Thus, to illustrate naive evaluation, an evaluation engine can recast the predicate above into the following nonrecursive evaluation predicates.
f0(i):-{ }
f1(i):-i=1; (i=2, f0(1)); (i=3, f0(2))
f2(i):-i=1; (i=2, f1(1)); (i=3, f1(2))
f3(i):-i=1; (i=2, f2(1)); (i=3, f2(2))
. . .
Or, for brevity, the evaluation predicates may be represented as:
f0(i):-false
fn+1(i):-i=1; (i=2, fn(1)); (i=3, fn(2))
At first glance, this notation may look like a recursive definition, but it is not. This is because the subscripts of the predicates denote different nonrecursive predicates occurring in the potentially unbounded sequence of predicates. In other words, the predicate fn+1 is not recursive because it references fn, but not itself. The evaluation engine then evaluates the nonrecursive predicates in order to find the least fixed point.
Naive evaluation of f(i) starts by evaluating f0(i), which is defined to be the empty relation. Naive evaluation then proceeds as illustrated in TABLE 1.
TABLE 1

| Predicate | Previous relation | Current relation | Comments |
|---|---|---|---|
| f1(i) | { } (Empty) | {1} | The relation of f0(i) is empty, represented as “{ }”. Thus, f1 evaluates to: i = 1; (i = 2, 1 is in { }); (i = 3, 2 is in { }). Neither 1 nor 2 is in the empty relation. Thus, the only tuple that is generated is 1 from the term i = 1. |
| f2(i) | {1} | {1, 2} | f2 evaluates to: i = 1; (i = 2, f1(1)); (i = 3, f1(2)) or i = 1; (i = 2, 1 is in {1}); (i = 3, 2 is in {1}). The value 2 is not in the relation {1}. Thus, the only tuples that are generated are 1 from the term i = 1 and 2 from the term i = 2, f1(1). |
| f3(i) | {1, 2} | {1, 2, 3} | f3 evaluates to: i = 1; (i = 2, f2(1)); (i = 3, f2(2)) or i = 1; (i = 2, 1 is in {1, 2}); (i = 3, 2 is in {1, 2}). All three terms are true, thus the tuples 1, 2, and 3 are generated. |
| f4(i) | {1, 2, 3} | {1, 2, 3} | f4 evaluates to: i = 1; (i = 2, f3(1)); (i = 3, f3(2)) or i = 1; (i = 2, 1 is in {1, 2, 3}); (i = 3, 2 is in {1, 2, 3}). Again, all three terms are true, thus the tuples 1, 2, and 3 are generated. |
After f4(i) is evaluated, no additional tuples are added to the relation. Therefore, the evaluation engine can determine that the least fixed point has been reached.
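The progression in TABLE 1 can be reproduced with a minimal Python sketch of naive evaluation. The function names and the finite candidate domain range(1, 6) are assumptions made for illustration; an actual evaluation engine would derive candidate values from the predicate body rather than scanning a fixed range.

```python
def f_body(i, prev):
    """Body of f with each recursive call f(k) replaced by a lookup in prev."""
    return i == 1 or (i == 2 and 1 in prev) or (i == 3 and 2 in prev)


def naive_eval(body, domain):
    """Evaluate f0, f1, f2, ... in order until the relation stops changing."""
    history = [set()]                      # f0 is defined as the empty relation
    while True:
        prev = history[-1]
        # Every iteration recomputes all tuples, including ones already found.
        current = {i for i in domain if body(i, prev)}
        history.append(current)
        if current == prev:                # no additional tuples were added
            return current, history


# Assumed finite candidate domain, chosen only for this example.
relation, history = naive_eval(f_body, range(1, 6))
```

Here history holds the relations of f0 through f4 in order, and relation is the least fixed point {1, 2, 3}. The repeated recomputation inside the loop is what makes the approach naive.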
Languages that allow recursive predicates generally require the recursive predicates to be monotonic. That is, on each iteration, evaluation of the predicate results only in tuples being added to the relation, but never removed from the relation.
One example class of non-monotonic recursive predicates is predicates with a recursive call under an odd number of negations. Recursive calls under an odd number of negations can result in an evaluation engine computing a relation that never converges.
For example, the following recursive predicate is non-monotonic:
f(i):-(i=1, not f(2)); (i=2, f(1))
The problem is the single negated recursive call “not f(2)”. Naive evaluation of this recursive predicate would result in the following cycling of tuples in the relation:
f0(i):={ }
f1(i):={1}
f2(i):={1, 2}
f3(i):={2}
f4(i):={ }
f5(i):={1}
f6(i):={1, 2}
f7(i):={2}
f8(i):={ }
. . .
Therefore, evaluation engines for recursive predicates generally reject recursive predicates if they include a recursive call under an odd number of negations.
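The cycling shown above can be observed with a short Python sketch that applies the naive iteration step a fixed number of times. The helper name naive_steps and the step limit are assumptions for illustration; the limit is needed precisely because the iteration never converges.

```python
def f_body(i, prev):
    """f(i) :- (i=1, not f(2)); (i=2, f(1)), with calls looked up in prev."""
    return (i == 1 and 2 not in prev) or (i == 2 and 1 in prev)


def naive_steps(body, domain, max_steps):
    """Apply the naive iteration step max_steps times, starting from f0 = {}."""
    relations = [frozenset()]
    for _ in range(max_steps):
        prev = relations[-1]
        relations.append(frozenset(i for i in domain if body(i, prev)))
    return relations


steps = naive_steps(f_body, (1, 2), 8)
# No two consecutive relations are equal, so the stopping test used by
# naive evaluation is never satisfied within these steps.
converged = any(a == b for a, b in zip(steps, steps[1:]))
```

In this run the relation returns to the empty set every four steps, and tuples are removed as well as added, which is the non-monotonic behavior described above.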
However, predicates in which every recursive call is under an even number of negations are monotonic. Thus, the following predicate is monotonic:
f(i):-not (i=1, not f(2)); (i=2, f(1))
because every recursive call is under an even number of negations, i.e., 2 negations for f(2) and zero for f(1).
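One way such a check could be sketched is to walk a syntax tree of the predicate body and record how many negations enclose each recursive call. The tuple-based tree encoding below ("or", "and", "not", "call", "eq") is hypothetical and chosen only for this example; it does not handle quantifiers and is not the representation used by any particular engine.

```python
def recursive_call_depths(term, name, depth=0):
    """Yield the number of enclosing negations for each call to `name`."""
    kind = term[0]
    if kind == "call":
        if term[1] == name:
            yield depth
    elif kind == "not":
        yield from recursive_call_depths(term[1], name, depth + 1)
    elif kind in ("and", "or"):
        for sub in term[1:]:
            yield from recursive_call_depths(sub, name, depth)
    # Other leaves, such as ("eq", "i", 1), contain no calls.


def passes_parity_check(body, name):
    """True if every recursive call is under an even number of negations."""
    return all(d % 2 == 0 for d in recursive_call_depths(body, name))


# f(i) :- (i=1, not f(2)); (i=2, f(1))
rejected = ("or",
            ("and", ("eq", "i", 1), ("not", ("call", "f", 2))),
            ("and", ("eq", "i", 2), ("call", "f", 1)))

# f(i) :- not (i=1, not f(2)); (i=2, f(1))
accepted = ("or",
            ("not", ("and", ("eq", "i", 1), ("not", ("call", "f", 2)))),
            ("and", ("eq", "i", 2), ("call", "f", 1)))
```

For the second body the recorded depths are 2 for f(2) and 0 for f(1), matching the counts given in the text.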
Another prior art procedure for evaluating recursive predicates is referred to as “semi-naive evaluation.” When using semi-naive evaluation, an evaluation engine flattens the recursion of the predicate in a different way than naive evaluation. In particular, the evaluation engine defines a delta predicate whose associated relation is defined to include only the new tuples found on each iteration. The least fixed point is found when an iteration is reached in which the delta predicate's associated relation is empty.
For example, an evaluation engine can use semi-naive evaluation to find the least fixed point of the following example predicates:
f(i):-i=1; i=2; i=3
g(i):-i=1; exists(j: j=i−1, f(i), g(j))
The term “exists(j: j=i−1, f(i), g(j))” has an existential quantifier. A term having an existential quantifier may be referred to as an existential term. This existential term asserts that there is a j such that j is equal to i−1 and that i is in the relation of f(i) and that j is in the relation of g(j).
An evaluation engine can flatten the recursive predicate g(i) by defining the following nonrecursive evaluation predicates:
δg0(i):-{ }
g0(i):-{ }
δgn+1(i):-(i=1; exists(j: j=i−1, f(i), δgn(j))), not gn(i)
gn+1(i):-gn(i); δgn+1(i)
As mentioned above, semi-naive evaluation uses an evaluation predicate that is referred to as a delta predicate. A system can generate the delta predicate by replacing the recursive call in the original predicate with a nonrecursive call to the previous delta predicate. The system then generates a conjunction of the result with a negation of the predicate from the previous iteration. Thus, the delta predicate is defined to include only new tuples found in a particular iteration of the evaluation. In particular, the extra term “not gn(i)” at the end of the definition for δgn+1(i) ensures that previously found tuples do not satisfy the delta predicate δgn+1(i).
Evaluation of the example recursive predicate using semi-naive evaluation is illustrated in TABLE 2. An evaluation engine need not compare a previous relation to a current relation as was done for naive evaluation. Rather, the evaluation engine can halt when the first empty delta predicate is found.
TABLE 2

| Predicate | Relation | Comments |
|---|---|---|
| δg0(i) | { } | Empty by definition |
| g0(i) | { } | Empty by definition |
| δg1(i) | {1} | δg1(i) evaluates to: (i = 1; exists(j: j = i − 1, f(i), δg0(j))), not g0(i) or (i = 1; exists(j: j = i − 1, i is in {1, 2, 3}, j is in { })), i is not in { }. Thus, {1} is generated. |
| g1(i) | {1} | g1(i) evaluates to: g0(i); δg1(i) or i is in { }; i is in {1}. The only value for i that satisfies this predicate is 1. Thus, {1} is generated. |
| δg2(i) | {2} | δg2(i) evaluates to: (i = 1; exists(j: j = i − 1, f(i), δg1(j))), not g1(i) or (i = 1; exists(j: j = i − 1, i is in {1, 2, 3}, j is in {1})), i is not in {1}. The only value for i for which j = 1 and j = i − 1 is 2. Thus, {2} is generated. |
| g2(i) | {1, 2} | g2(i) evaluates to: g1(i); δg2(i) or i is in {1}; i is in {2}. The values for i that satisfy this predicate are 1 and 2. Thus, {1, 2} is generated. |
| δg3(i) | {3} | δg3(i) evaluates to: (i = 1; exists(j: j = i − 1, f(i), δg2(j))), not g2(i) or (i = 1; exists(j: j = i − 1, i is in {1, 2, 3}, j is in {2})), i is not in {1, 2}. The only value of i for which j = 2 and j = i − 1 is i = 3. Thus, {3} is generated. |
| g3(i) | {1, 2, 3} | g3(i) evaluates to: g2(i); δg3(i) or i is in {1, 2}; i is in {3}. The values for i that satisfy this predicate are 1, 2, and 3. Thus, {1, 2, 3} is generated. |
| δg4(i) | { } | δg4(i) evaluates to: (i = 1; exists(j: j = i − 1, f(i), δg3(j))), not g3(i) or (i = 1; exists(j: j = i − 1, i is in {1, 2, 3}, j is in {3})), i is not in {1, 2, 3}. There is no value of i such that i is in {1, 2, 3}, and for which j = 3 and j = i − 1. Thus, the delta predicate is empty and evaluation stops. |
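The steps in TABLE 2 can be sketched in Python as follows. The function names, the set-based relations, and the finite candidate domain range(1, 6) are assumptions made for this example only.

```python
F = {1, 2, 3}   # relation of f(i) :- i=1; i=2; i=3


def g_delta_body(i, prev_delta):
    """i=1; exists(j: j=i-1, f(i), delta_g_n(j)), before the 'not g_n(i)' term."""
    return i == 1 or (i in F and (i - 1) in prev_delta)


def semi_naive_eval(body, domain):
    """Iterate delta predicates until the first empty delta is found."""
    total = set()                  # g0 is empty
    delta = set()                  # delta-g0 is empty
    deltas = [set()]
    while True:
        # Conjoin the rewritten body with the negation of the previous total,
        # so only tuples that are new in this iteration are kept.
        new_delta = {i for i in domain if body(i, delta) and i not in total}
        deltas.append(new_delta)
        if not new_delta:          # empty delta: least fixed point reached
            return total, deltas
        total = total | new_delta  # g_{n+1} = g_n ; delta-g_{n+1}
        delta = new_delta


# Assumed finite candidate domain, chosen only for this example.
g_relation, deltas = semi_naive_eval(g_delta_body, range(1, 6))
```

Unlike the naive loop, the stopping test here inspects only the newest delta and never compares two full relations.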
Using delta predicates to find the least fixed point becomes more complicated when a predicate expression includes multiple recursive calls.
For example, the following recursive predicate includes multiple recursive calls in a single conjunction:
f(i):-i=1; (i=2, f(1)); (i=3, f(1)); (i=4, f(3)); (i=5, f(2), f(4))
In this example predicate, the last disjunct includes both f(2) and f(4). This term asserts that i is equal to 5 and that 2 and 4 are both in the relation f.
If using a single delta predicate, tuples generated by evaluating f(2) and f(4) may not appear in the same delta predicate at the same time. Thus, in order to flatten the recursion of this recursive predicate, the evaluation engine needs to generate multiple delta predicates, one delta predicate for each recursive call in each disjunct that includes multiple recursive calls. For example, the evaluation engine can generate the following evaluation predicates for f(i):
δf0(i):-{ }
f0(i):-{ }
δ0fn+1(i):-i=1; (i=2, δfn(1)); (i=3, δfn(1)); (i=4, δfn(3)); (i=5, δfn(2), fn(4))
δ1fn+1(i):-i=1; (i=2, δfn(1)); (i=3, δfn(1)); (i=4, δfn(3)); (i=5, fn(2), δfn(4))
δfn+1(i):-δ0fn+1(i); δ1fn+1(i)
fn+1(i):-fn(i); δfn+1(i)
By this new definition, the delta predicate δfn+1(i) at each iteration is a disjunction of multiple sub-delta predicates δ0fn+1(i) and δ1fn+1(i). Each of the sub-delta predicates uses a previous delta predicate on a different recursive call within the same disjunct of f.
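The effect of the sub-delta predicates can be sketched in Python by comparing them with a single delta predicate in which both recursive calls of the last disjunct read the previous delta. All function names and the candidate domain are hypothetical. The sketch also filters out previously found tuples on each iteration, which is an assumption carried over from the earlier description of delta predicates so that the loop terminates.

```python
def shared_disjuncts(i, delta):
    """Disjuncts with at most one recursive call, rewritten to read the delta."""
    return (i == 1
            or (i == 2 and 1 in delta)
            or (i == 3 and 1 in delta)
            or (i == 4 and 3 in delta))


def multi_delta(i, delta, total):
    """Disjunction of the two sub-delta predicates for iteration n+1."""
    d0 = shared_disjuncts(i, delta) or (i == 5 and 2 in delta and 4 in total)
    d1 = shared_disjuncts(i, delta) or (i == 5 and 2 in total and 4 in delta)
    return d0 or d1


def single_delta(i, delta, total):
    """Flawed variant: both calls in the last disjunct read the same delta."""
    return shared_disjuncts(i, delta) or (i == 5 and 2 in delta and 4 in delta)


def evaluate(member, domain):
    """Semi-naive loop; delta holds delta-f_n and total holds f_n."""
    total, delta = set(), set()
    while True:
        # Assumption: previously found tuples are excluded from each delta.
        new = {i for i in domain if member(i, delta, total) and i not in total}
        if not new:
            return total
        total, delta = total | new, new


with_sub_deltas = evaluate(multi_delta, range(1, 7))
with_single_delta = evaluate(single_delta, range(1, 7))
```

In this run the value 2 is new in the second iteration and the value 4 is new in the third, so the single-delta variant never sees both at once and omits 5, while the sub-delta variant derives it.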
Semi-naive evaluation can fail to produce correct results when a recursive call is negated. For example, the following recursive predicate includes a negated recursive call:
f(i):-i=1; i=2; i=3
g(i):-f(i), not exists(j: f(j), j<i, not g(j))
Naive evaluation would flatten the recursion of g(i) to the following evaluation predicates:
g0(i):-{ }
gn+1(i):-f(i), not exists(j: f(j), j<i, not gn(j))
Naive evaluation of g(i) would then progress as illustrated in TABLE 3.
TABLE 3

| Predicate | Previous relation | Current relation | Comments |
|---|---|---|---|
| g1(i) | { } | {1} | g0 is defined to be empty. Thus, g1 evaluates to: f(i), not exists(j: f(j), j < i, not g0(j)) or i is in {1, 2, 3} and there does not exist a value of j such that j is in {1, 2, 3} and j < i, and j is not in { }. The only value of i for which this is true is 1. Thus, the only tuple that is generated is {1}. |
| g2(i) | {1} | {1, 2} | g2 evaluates to: f(i), not exists(j: f(j), j < i, not g1(j)) or i is in {1, 2, 3} and there does not exist a value of j such that j is in {1, 2, 3}, j < i, and j is not in {1}. The only values of i for which this is true are 1 and 2. Thus, {1, 2} is generated. |
| g3(i) | {1, 2} | {1, 2, 3} | g3 evaluates to: f(i), not exists(j: f(j), j < i, not g2(j)) or i is in {1, 2, 3} and there does not exist a value of j such that j is in {1, 2, 3}, j < i, and j is not in {1, 2}. The only values of i for which this is true are 1, 2, and 3. Thus, {1, 2, 3} is generated. |
| g4(i) | {1, 2, 3} | {1, 2, 3} | g4 evaluates to: f(i), not exists(j: f(j), j < i, not g3(j)) or i is in {1, 2, 3} and there does not exist a value of j such that j is in {1, 2, 3}, j < i, and j is not in {1, 2, 3}. The only values of i for which this is true are 1, 2, and 3. Thus, {1, 2, 3} is generated. |
Because g3(i) and g4(i) have the same relation, naive evaluation ends after correctly producing the tuples {1, 2, 3}.
Semi-naive evaluation of g(i), however, produces incorrect results because of the negated recursive call. Semi-naive evaluation flattens the recursion of g(i) into the following evaluation predicates:
δg0(i):-{ }
g0(i):-{ }
δgn+1(i):-f(i), not exists(j: f(j), j<i, not δgn(j)), not gn(i)
gn+1(i):-gn(i); δgn+1(i)
Semi-naive evaluation of g(i) would then progress as illustrated in TABLE 4.
TABLE 4

| Predicate | Relation | Comments |
|---|---|---|
| δg0(i) | { } | Empty by definition |
| g0(i) | { } | Empty by definition |
| δg1(i) | {1} | δg1(i) evaluates to: f(i), not exists(j: f(j), j < i, not δg0(j)), not g0(i) or i is in {1, 2, 3} and there does not exist a value of j such that j is in {1, 2, 3}, j < i, and j is not in { } and i is not in { }. This is true only when i is 1. Thus, {1} is generated. |
| g1(i) | {1} | g1(i) evaluates to: g0(i); δg1(i) or i is in { } or i is in {1}. This is only true when i is 1. Thus, {1} is generated. |
| δg2(i) | {2} | δg2(i) evaluates to: f(i), not exists(j: f(j), j < i, not δg1(j)), not g1(i) or i is in {1, 2, 3} and there does not exist a value of j such that j is in {1, 2, 3}, j < i, and j is not in {1} and i is not in {1}. This is true only when i is 2. Thus, {2} is generated. |
| g2(i) | {1, 2} | g2(i) evaluates to: g1(i); δg2(i) or i is in {1} or i is in {2}. This is only true when i is 1 or i is 2. Thus, {1, 2} is generated. |
| δg3(i) | { } | δg3(i) evaluates to: f(i), not exists(j: f(j), j < i, not δg2(j)), not g2(i) or i is in {1, 2, 3} and there does not exist a value of j such that j is in {1, 2, 3}, j < i, and j is not in {2} and i is not in {1, 2}. There is no value of i that satisfies this expression. This is because “not δg2(j)” forecloses j being 2, but not j being 1, and thus there does exist a j, j = 1, that is less than i = 3. Thus, no values of i satisfy the delta predicate. Therefore, δg3(i)'s relation is empty. |
On the last iteration of semi-naive evaluation, δg3(i)'s relation is empty. At this point, semi-naive evaluation ends, after incorrectly producing only {1, 2} and without producing 3. This occurs because the example predicate uses multiple values from the same recursive call to g(i). In other words, in order for the delta predicate δg3(i) to produce {3}, δg2(i) would need to produce {1, 2}. But the nature of delta predicates for semi-naive evaluation is such that 1 and 2 will never be in the same δgn(i) delta predicate.
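The discrepancy between the two procedures for this predicate can be reproduced with a small Python sketch. The function names and the use of the relation of f as the candidate domain are assumptions made for illustration.

```python
F = (1, 2, 3)   # relation of f(i) :- i=1; i=2; i=3


def g_body(i, rel):
    """f(i), not exists(j: f(j), j<i, not rel(j)); rel replaces the call to g."""
    return i in F and not any(j < i and j not in rel for j in F)


def naive(domain):
    """Naive evaluation: the recursive call reads the full previous relation."""
    prev = set()
    while True:
        current = {i for i in domain if g_body(i, prev)}
        if current == prev:
            return current
        prev = current


def semi_naive(domain):
    """Semi-naive evaluation: the recursive call reads only the previous delta."""
    total, delta = set(), set()
    while True:
        new = {i for i in domain if g_body(i, delta) and i not in total}
        if not new:                # first empty delta ends the evaluation
            return total
        total, delta = total | new, new


naive_result = naive(F)
semi_naive_result = semi_naive(F)
```

In semi_naive, the negated call is checked against a delta that holds only the newest tuples, so the test for i = 3 always finds some smaller j missing from the delta.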