Database systems store, retrieve, and process information. In order to retrieve information from the database, a user provides a query (written in a query language such as SQL), where the query specifies the information to be retrieved and the manner in which it is to be manipulated or evaluated in order to provide a useful result. To process the query, the database system may convert the query into a relational expression that describes algebraically the result specified by the query. The relational expression is then used to produce an execution plan, which describes particular steps to be taken by a computer in order to produce the sought result.
When a relational expression is produced from a query, it may be the case that certain terms in the expression are redundant. For example, the operands of a join operator may be two instances of the same table, T. If T is a table that is stored in the database, then a straightforward evaluation of the join requires redundant accesses to the same (possibly large) table T during execution. Worse yet, T may not be a stored table, but rather may a table that is computed from a complex relational sub-expression. In this case, straightforward evaluation of the expression may require T to be derived twice from the same (possibly complicated) expression at runtime.
A conventional way to address this problem is to identify and evaluate common sub-expressions, spool (i.e., “buffer”) the entirety of the sub-expression result, and use the spooled result whenever the common sub-expressions are used. However, the sub-expression result may be relatively large, in which case some of the benefits of spooling will be lost. If a sub-expression result is larger than the available memory, then spooling the result may cause it to be paged to disk, which may be just as costly of resources as computing the result of the sub-expression twice.
In view of the foregoing, there is a need for a query evaluation system that overcomes the drawbacks of the prior art.