1.1 Field of the Invention
The invention concerns techniques for constraint propagation generally and more particularly concerns techniques for optimizing queries by moving constraints so that they are applied as early as possible in a computation.
1.2 Description of the Prior Art
Data base systems have long been used to store and access large quantities of data. The data in the data base is organized in some fashion and queries made by users of the data base system take advantage of the organization of the data base to return portions of the data which are useful to the user. Many modern data base systems are relational data base systems. In such systems, the data is organized into a set of relations. Each relation is a table which has columns and rows. The columns define classes of data (for example, employee name, employee department, and pay), while the rows contain related values of the data. Thus, a us.sub.-- employee relation might have a row for each employee in the U.S. and columns for employee name, employee department, and employee pay. The specific values in a row for employee department and employee pay are the ones for the employee specified by the employee name. There are two kinds of relations in the data base system: base relations in which the data values are actually stored in the data base system, and views, that is, relations which are not stored, but which are constructed on the fly from data in the base relations.
The queries in relational data base systems define the data they wish to return in terms of relations and constraints on the relations which limit what data the query returns from the relation. For example, a query on the us.sub.-- employee relation might request a list of the names of all U.S. employees whose pay was greater than $100,000 per year. The requirement that the employees on the list be paid more than $100,000 is of course a constraint.
Most relational data base systems use a language called SQL for their queries. A query in SQL for the above example would look like this:
SELECT name, pay PA0 FROM us.sub.-- employees PA0 WHERE pay&gt;100,000. PA0 SELECT name, pay PA0 FROM us.sub.-- employees, world.sub.-- employees PA0 WHERE pay&gt;100,000
In such a query, the field names of a relation are termed attributes of the relation. Thus, in the above example, the relation us.sub.-- employees has the attributes name, pay, and dept. A constraint which applies to an attribute, for example, the constraint that us.sub.-- employees.pay&gt;100,000 is termed a predicate which is applicable to the attribute, in this case, the attribute pay.
Since many data bases are very large, efficiency is an important aspect of the design of data base systems. One aspect of efficiency is query optimization, which puts the computations needed to execute a query into a form which minimizes the amount of time and/or memory resources required. An important part of query optimization is optimizing the application of constraints. Two techniques for optimizing constraints are minimizing the number of constraints and applying them as early as possible.
For instance, if we expand the previous example to include another relation, world.sub.-- employees, in which the rows have the same attributes as the rows in us.sub.-- employees and apply our query to both relations, we get:
There are two ways of computing this query: one way is to assemble all of the rows from both relations and then apply, the predicate pay&gt;100,000; the other is to apply the predicate in each relation and only assemble the rows from each relation which satisfy the predicate. Clearly, the second way saves the time required to copy the rows which do not satisfy the predicate and the space required to store all of the rows from both relations, and is consequently more efficient. The second way is thus an optimization. In this case, the optimization is achieved by applying the predicate in each relation, so that it is applied as early as possible in the computation.
A number of techniques are known for applying predicates as early as possible. These techniques are commonly termed predicate pushdown techniques. The techniques are applied to a query graph, a data structure which represents the query during optimization. In the query graph for a query, each of the query's subqueries and views forms a block of the graph: One of the techniques is merging query blocks, as explained in H. Pirahesh, J. Hellerstein, and W. Hasan, "Extensible/rule based query rewrite optimization in starburst", in SIGMOD 1992, pp. 39-48. This technique is limited by the fact that query blocks often cannot be merged. Another of the techniques is pushing predicates down the query graph, into query blocks that are computed earlier during evaluation. This technique works only on hierarchical queries, that is, queries which are nonrecursive and do not have common subexpressions. See J. Ullman, Principles of Database and Knowledgebase Systems, vol. 1 and 2, Computer Science Press, 1989. A third technique is based on the magic set transformation, explained in the Ullman text. The third technique permits a predicate to be moved up from a relation and down into another relation when the other relation appears later in the join order, as explained in detail in I. Mumick, S. Finkelstein, H. Pirahesh, and R. Ramakrishnan, "Magic Conditions", in PODS 1990.
What is needed, and what is provided by the present invention, is an optimization technique which permits predicates generally to be moved around a query tree to the node where their application is most efficient and applying them in that node.