The present invention relates generally to computerized database systems and more particularly to a method of evaluating a recursive query of a database.
Database systems are being used to store and manage more and more different kinds of data. As the use of database systems has expanded and the quantities of data stored in a database have increased, much effort has been devoted to improving existing database systems and developing new systems with new and better capabilities.
To "query" a database means to request information from it. The requested information may be obtained directly by retrieving data items stored in the database or may be derived from data items stored in the database.
A kind of database query which has grown more important in recent years is a query which derives information recursively. Such a query can be described as a query which queries itself repeatedly. Deriving information recursively requires evaluating a recursive relation among the data items in the database. A general discussion of the mathematical concept of recursion can be found in J. Bradley, Introduction to Discrete Mathematics, ch. 6, Addison-Wesley 1988; see also E. Roberts, Thinking Recursively, John Wiley 1986.
For example, assume that a certain database contains a set of data items each expressing a parent-child relation. A few such data items might be: Bill is father of Richard; Abigail is mother of Richard; Richard is father of Mary; Jane is mother of Mary; John is father of Andrew; Mary is mother of Andrew; and so on. Also assume that the names of the parents of a person are returned by a query of the form "FIND PARENTS OF [X]" where X is the name of the person whose parents are to be found. The name of a parent of a person is an example of the kind of information which is obtainable directly by retrieving a data item in this particular database.
Continuing with the above example, it will be noted that the database contains no information about grandparents. However, it will be apparent that such information can be recursively derived from the information in the database, for example by a query of the form "FIND PARENTS OF [FIND PARENTS OF [X]]". Providing the information requested by such a query involves evaluating the recursive relation of grandparent and grandchild.
Relations expressed by data items in modern databases are often far more complex than the above examples. For instance, a relation between an airline flight and a pair of cities would be expected to also include departure and arrival times, meal service, number of seats available, continuing service to other cities, fares, and so forth. Generating a trip itinerary to meet the needs of a passenger might require evaluating a number of recursive relations.
If the number of iterations required to evaluate a recursive relation is known in advance, then evaluating the relation is relatively straightforward. For example, the request to find the grandparents of X requires exactly two iterations: one to find the parents of X and one to find the parents of the parents of X. However, if the number of iterations is not known, then the evaluation becomes far more difficult; an example of a request in which the number of iterations is not known is a request to find all ancestors of X.
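The difference between a known and an unknown number of iterations can be illustrated with a minimal Python sketch. The set `PARENT_OF` and the function names are hypothetical, introduced only for illustration; the data items are those of the parent-child example above.

```python
# Parent-child data items from the example, stored as (parent, child) pairs.
# All names here are illustrative assumptions, not part of any real system.
PARENT_OF = {
    ("Bill", "Richard"), ("Abigail", "Richard"),
    ("Richard", "Mary"), ("Jane", "Mary"),
    ("John", "Andrew"), ("Mary", "Andrew"),
}


def find_parents(people):
    """Return the parents of every person in `people` (one retrieval step)."""
    return {parent for (parent, child) in PARENT_OF if child in people}


def find_grandparents(person):
    """Exactly two iterations: parents, then parents of those parents."""
    return find_parents(find_parents({person}))


def find_ancestors(person):
    """Iteration count not known in advance: repeat until nothing new appears."""
    ancestors = set()
    frontier = find_parents({person})
    while frontier:
        ancestors |= frontier
        # Keep only people not seen before, so the loop ends on finite data.
        frontier = find_parents(frontier) - ancestors
    return ancestors
```

In this sketch `find_grandparents` always performs two retrieval steps, whereas the number of passes through the loop in `find_ancestors` depends on the depth of the stored family tree.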
As the volume of data in a database grows larger and the nature of the relations expressed by the data grows more complex, the time required for even a very powerful computer to respond to a complex query can become unacceptably long, especially when the number of iterations required to derive the response is not known in advance. Accordingly, the efficient evaluation of recursive relations has become a matter of critical importance in the design of modern database systems. A comprehensive survey of this problem is presented by F. Bancilhon and R. Ramakrishnan in "An Amateur's Introduction to Recursive Query Processing Strategies" in the Proceedings of the ACM-SIGMOD Conference, Washington, D.C., May 1986.
A set of mathematical operators collectively referred to as "relational algebra" has been developed for the manipulation of data items in a database (see generally C. J. Date, An Introduction to Database Systems (4th Ed.) Vol. I, ch. 13, Addison-Wesley 1986). The relational algebra affords many benefits and is widely used in modern data system design.
Evaluating a recursive relation r requires finding a "least fixpoint" of a recursive equation of the form
$$r = f(r) \tag{1}$$
where f is a function which will be defined in a succeeding paragraph. The least fixpoint of the recursive relation r is defined as a relation r* which satisfies the following criteria:
$$r^* = f(r^*) \tag{2}$$
and
$$r^* \subseteq p \quad \text{for any } p \text{ satisfying the equation } p = f(p). \tag{3}$$
See A. Aho et al., "Universality of Data Retrieval Languages", Proceedings of the Sixth POPL, 1979.
The relational algebra does not support finding the least fixpoint of a recursive relation. Accordingly, new operators such as transitive closure operators have been proposed to give the relational algebra the ability to evaluate recursive relations (R. Agrawal, "Alpha: An Extension of Relational Algebra to Express a Class of Recursive Queries", Proceedings of the Third International Conference on Data Engineering, Los Angeles, Calif., Feb. 3-5, 1987; S. Ceri et al., "Translation and Optimization of Logic Queries: the Algebraic Approach", Proceedings of the Twelfth International Conference on Very Large Data Bases, Kyoto, Japan, August 1986).
Although not all recursive equations have least fixpoints, if the function f is monotone the equation is guaranteed to have a least fixpoint. A function f consisting only of monotone relational algebra operations (union, selection, projection, Cartesian product and join, but not set difference) is itself monotone, and therefore a recursive equation in which the function f consists only of such operations has a least fixpoint. See generally A. Tarski, "A Lattice-Theoretical Fixpoint Theorem and its Applications", Pacific Journal of Mathematics, vol. 5, no. 2, pages 285-309, June 1955.
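The least-fixpoint criteria can be checked on a small case with the following Python sketch. It assumes finite relations represented as sets of pairs; the names `compose`, `least_fixpoint`, `b`, `f` and `p` are hypothetical. The base relation is deliberately given a self-loop so that the equation has more than one fixpoint.

```python
def compose(left, right):
    """Join left(X, Z) with right(Z, Y) on Z, then project onto (X, Y)."""
    return {(x, y) for (x, z1) in left for (z2, y) in right if z1 == z2}


def least_fixpoint(f):
    """Iterate r := f(r) from the empty relation until r stops changing.

    Assumes f is monotone and all relations are finite, so the loop ends.
    """
    r = set()
    while True:
        next_r = f(r)
        if next_r == r:
            return r
        r = next_r


# Hypothetical base relation with a self-loop.
b = {(1, 1)}


def f(r):
    # Built only from union and composition, so f is monotone.
    return b | compose(b, r)


r_star = least_fixpoint(f)   # the least solution of r = f(r)
p = {(1, 1), (1, 5)}         # another, larger solution of p = f(p)
```

Here both `r_star` and `p` satisfy the recursive equation, but only `r_star` is contained in every other solution, which is what distinguishes the least fixpoint.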
To evaluate a pair of mutually recursive relations $r_1$ and $r_2$ requires finding the least fixpoints of the following recursive relational algebra equations:
$$r_1 = f_1(r_1, r_2) \tag{4}$$
and
$$r_2 = f_2(r_1, r_2) \tag{5}$$
More generally, evaluating a set of n mutually recursive relations $r_1$ through $r_n$ requires finding the least fixpoints of a set of recursive equations of the form
$$r_i = f_i(r_1, \ldots, r_n) \tag{6}$$
The functions $f_i$ are defined in terms of the definitions of the recursive relations themselves. For example, two mutually recursive relations $r_1$ and $r_2$ are defined by the following Horn clauses:
$$r_1(X,Y) \leftarrow b_1(X,Z),\ r_2(Z,Y) \tag{7}$$
$$r_2(X,Y) \leftarrow r_1(X,Z),\ b_2(Z,Y) \tag{8}$$
$$r_1(X,Y) \leftarrow b_3(X,Y) \tag{9}$$
where the b's are base relations and X, Y and Z are columns in the base relations. The functions $f_1$ and $f_2$ are then given by:
$$f_1(r_1, r_2) = b_3 \cup b_1 \circ r_2 \tag{10}$$
$$f_2(r_1, r_2) = r_1 \circ b_2 \tag{11}$$
where $\circ$ is a "composition operator" which is a join followed by a projection on the attributes in the target list.
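Equations 10 and 11 can be written out directly in a short Python sketch, again assuming finite binary relations held as sets of pairs. The function names and the simple joint iteration in `solve_naive` are illustrative assumptions; the iteration recomputes both relations in full on every pass and is not presented as an efficient evaluation method.

```python
def compose(left, right):
    """Composition operator: join on the shared column, project onto (X, Y)."""
    return {(x, y) for (x, z1) in left for (z2, y) in right if z1 == z2}


def f1(r1, r2, b1, b3):
    """Equation 10: f1(r1, r2) = b3 union (b1 composed with r2)."""
    return b3 | compose(b1, r2)


def f2(r1, r2, b2):
    """Equation 11: f2(r1, r2) = r1 composed with b2."""
    return compose(r1, b2)


def solve_naive(b1, b2, b3):
    """Evaluate both equations together, starting from empty relations."""
    r1, r2 = set(), set()
    while True:
        new_r1 = f1(r1, r2, b1, b3)
        new_r2 = f2(r1, r2, b2)
        if (new_r1, new_r2) == (r1, r2):
            return r1, r2
        r1, r2 = new_r1, new_r2
```

For hypothetical base relations `b1 = {("z", "a")}`, `b2 = {("b", "c")}` and `b3 = {("a", "b")}`, the tuple `("a", "b")` enters $r_1$ through clause 9, which lets `("a", "c")` enter $r_2$ through clause 8, which in turn lets `("z", "c")` enter $r_1$ through clause 7.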
More generally, a set of n functions $f_i$ is defined in terms of n mutually recursive relations $r_i$ as follows:
$$r_i = b_i^0 \cup b_i \circ r_1 \circ r_2 \circ \cdots \circ r_n \tag{12}$$
where some of the $b_i^0$ may be empty but at least one must not be empty to guarantee that the computation will terminate. Some of the $b_i$ may be intermediate relations obtained from one or more other $b_i$.
There are various methods of computing the least fixpoint of a recursive equation in a database environment. One of these, the semi-naive method, provides good performance when used with appropriate optimizing techniques. See F. Bancilhon et al., cited above; F. Bancilhon, "Naive Evaluation of Recursively Defined Relations" in Brodie and Mylopoulos (eds.), On Knowledge Based Management Systems: Integrating Database and AI Systems, pub. Springer-Verlag, 1985.
The semi-naive method of computing the least fixpoint is an iterative method. For a recursive relation r and a corresponding recursive equation r=f(r), the semi-naive method iteratively computes new values of r satisfying the equation and inserts them into r during successive iterations until no new values of r can be found, as follows: ##EQU1## where $\delta r$ is a new value of r, $\phi$ is the empty set, and j is the variable of iteration.
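The idea of computing only new values on each iteration can be sketched in Python for one simple case. This is a common textbook formulation and is not a reproduction of the procedure referenced above. It assumes the linear equation r = b ∪ b∘r for a single base relation b, with relations as finite sets of pairs; the function names are hypothetical.

```python
def compose(left, right):
    """Join left(X, Z) with right(Z, Y) on Z, then project onto (X, Y)."""
    return {(x, y) for (x, z1) in left for (z2, y) in right if z1 == z2}


def semi_naive_closure(b):
    """Least fixpoint of r = b | compose(b, r), joining only the new tuples.

    Valid because the recursive term is linear in r: every tuple derivable
    in the next step must use a tuple first found in the current step.
    """
    r = set()
    delta_r = set(b)                 # initial new values: the non-recursive part
    while delta_r:                   # stop when no new values of r are found
        r |= delta_r                 # insert the new values into r
        delta_r = compose(b, delta_r) - r
    return r
```

Each pass joins b against `delta_r` rather than against all of `r`, so tuples found in earlier iterations are not rederived from scratch.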
To find the least fixpoints of a set of n mutually recursive relations $r_i$, a set of mutually recursive equations of the form of equation 6 must be evaluated together, as follows: ##EQU2##
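For the two mutually recursive relations defined by clauses 7 through 9, evaluating the equations together might look like the following Python sketch. It is an illustrative formulation under the assumption of finite binary relations, not the procedure referenced above, and the names `semi_naive_mutual`, `delta1` and `delta2` are hypothetical.

```python
def compose(left, right):
    """Join left(X, Z) with right(Z, Y) on Z, then project onto (X, Y)."""
    return {(x, y) for (x, z1) in left for (z2, y) in right if z1 == z2}


def semi_naive_mutual(b1, b2, b3):
    """Evaluate r1 = b3 | compose(b1, r2) and r2 = compose(r1, b2) together."""
    r1, r2 = set(), set()
    delta1, delta2 = set(b3), set()  # only r1 has a non-recursive clause
    while delta1 or delta2:          # stop when neither relation gains tuples
        r1 |= delta1
        r2 |= delta2
        # New r1 tuples can only come from new r2 tuples, and vice versa,
        # because each recursive clause is linear in the other relation.
        new_delta1 = compose(b1, delta2) - r1
        new_delta2 = compose(delta1, b2) - r2
        delta1, delta2 = new_delta1, new_delta2
    return r1, r2
```

Both deltas are advanced within the same loop because a new tuple in either relation may enable a new tuple in the other on the following iteration.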
A differential approach to finding the least fixpoints of a set of n mutually recursive relations has also been proposed (I. Balbin et al., "A Differential Approach to Query Optimization in a Recursive Deductive Database", Technical Report 86/7, Department of Computer Science, University of Melbourne, Australia). This approach is described as follows: ##EQU3##
The above methods of finding a least fixpoint have been implemented in database environments by means of separate application programs.
From the foregoing, it will be appreciated that there is a need for a data structure and method for arranging recursively derived data items in a database in a way which provides efficient evaluation of recursive relations, especially if the relations are mutually recursive relations and if the number of iterations is not known in advance, by such means as the direct computation of least fixpoints without any need of separate application programs.