This specification relates to data processing, in particular, to processing recursively defined relations.
A relation is a set of tuples (t1, . . . , tn), each tuple having n≧1 data elements ti. Each element ti is a value of a corresponding attribute, which has an attribute name. Relations are commonly thought of as, represented as, and referred to as tables in which each row is a tuple and each column is an attribute. For convenience, as is conventional, the tuples will be presented as being ordered.
In the SQL programming language, relations can be created with a CREATE TABLE statement or with a common table expression, for example. Other languages have corresponding functionality.
To introduce an example that will be used in the remainder of this specification, the graph shown in FIG. 1 can be represented by a table of nodes, illustrated in FIG. 2a, and a table of edges, illustrated in FIG. 2b. A row of the form (x,y) in the table of edges represents a directed edge in the graph from node x to node y.
The table of nodes can be created in SQL with the following commands:                create table nodes (                    node varchar );                        insert into nodes        values (‘a’), (‘b’), (‘c’), (‘d’), (‘e’);        
The table of edges can be created in SQL with the following commands:                CREATE TABLE edges (                    parent varchar,            child varchar );                        insert into edges (parent, child)        values (‘a’, ‘b’), (‘a’, ‘c’), (‘c’, ‘d’), (‘c’, ‘e’);        
A relation can also be defined recursively, as shown in the SQL recursive query example, below. Other languages have corresponding functionality.
Recursive queries are queries that reference their own output. A common use for recursive queries is calculating on graphs. For example, a recursive query on a graph might be used to determine the depth of a node, defined to be the longest length of all paths from that node to a leaf. Thus, in the graph of FIG. 1, nodes b, d and e each have a depth of 0; the depth of node c is 1; and the depth of node a is 2.
The following example is a recursive query on a graph, e.g., that of FIG. 1, that determines the depths of the nodes of the graph. In the example, the SELECT node, max (depth) FROM depth_table statement returns a depth for each node.
To calculate the depth of the nodes in the graph, a depth table depth_table is defined recursively that contains the lengths of all the paths from each node to a leaf, then the SELECT statement aggregates over the depth table:                WITH RECURSIVE depth_table (node, depth) AS        (                    /* Nonrecursive term */            /* each node with no children has a depth of 0 */            SELECT node, 0 FROM nodes n            WHERE NOT EXISTS (select * from edges e where e.parent=n.node)                        UNION                    /* Recursive term */            /* each node with a child of depth n has a depth n+1 */            SELECT e.parent, dt.depth+1            FROM depth_table dt            INNER JOIN edges e ON dt.node=e.child                        )        SELECT node, max(depth) FROM depth_table        GROUP BY node;        
The definition of depth_table is recursive because depth_table is used in the definition of itself. The WITH RECURSIVE clause defines a common table expression. This expression defines a temporary table used by a SELECT clause. Common table expressions take the form of a nonrecursive term unioned—using either UNION or UNION ALL—with a recursive term.
FIG. 3 illustrates the logic of an example prior art process that evaluates a recursive definition. For example, to evaluate a common table expressions, an SQL evaluator, implemented in computer program instructions, can perform a process implementing the logic illustrated in FIG. 3, or an optimization of that logic. The process evaluates the nonrecursive term and stores the results in a current table (300). The process branches depending on whether the common table expression uses a UNION or a UNION ALL (302). If the common table expression uses a UNION, the process discards duplicate rows from the current table (304) and then compares the current table to a previous table, which is initially empty (306). If the common table expression uses a UNION ALL, the process goes directly to comparing the current table to the previous table (306). If the two tables are the same, the process is done (308). If not, the process replaces the contents of the previous table with the contents of the current table (310). The process then evaluates the entire term, using the previous table for the recursive reference and storing the results in the current table (312). The process loops back to compare the current table to the previous table (306), first discarding rows (304) if the common table expression uses a UNION.
The usual declarative semantics of a recursive relation, whether defined in SQL or in a logic programming language, e.g., Datalog, is that the defined relation is the least fixed point of successive applications of the definition to the empty relation. This is illustrated in the following Table 1, which shows the results of applying the logic illustrated in FIG. 3 to the query shown above.
TABLE 1PreviousCurrentStagetabletableComments0empty(b, 0) (d, 0)Results of the nonrecursive term: all(e, 0)nodes with no children have a depthof 0.1(b, 0) (d, 0)(b, 0) (d, 0)Using the previous table to define(e, 0) (e, 0) (a, 1)the current table at this stage yields(c, 1)tuples for the parents a and c of b,d, and e.2(b, 0) (d, 0)(b, 0) (d, 0)a has no parent, but c does. Thus,(e, 0)(e, 0) (a, l)this stage yields a second tuple(a, l) (c, 1)(c, 1) (a, 2)for a.3(b, 0) (d, 0)(b, 0) (d, 0)The previous and current tables are(e, 0) (a, l)(e, 0) (a, 1)equal, so the process terminates.(c, 1) (a, 2)(c, 1) (a, 2)
The evaluation strategy iterates from the empty relation, applying the common table expression on the results table again and again until applying the expression does not change the table, that is, until a fixed point is reached. However, this procedure does not define the semantics of the recursive relation. An implementation, in SQL or otherwise, can perform the computation in any way that yields the least fixed point.
Any recursive query Q defines a non-recursive operator on relations, which takes a relation T and yields the relation Q(T) which results from substituting T into the definition of Q. The semantic value of Q is then the least fixed point of the non-recursive operator when applied to the empty relation. For such a recursive query to be well defined, it is essential that the operator be monotonic on its input relation. A query is monotonic by definition when S⊂T=Q(S)⊂Q(T), for every relation S. In the present context, if a query is monotonic, then when the query has executed using the previous table as the recursive reference, every row in the previous table must exist in the current relation. That is, a monotonic query only adds results to the previous table; it never removes them. As can be seen, this property holds in Table 1; the previous table is always completely contained in the current table.