Databases are compilations of information. Useful operations that can be performed on a database include the selection of specific data from the database, and the computation of summary information on all or part of the data in the database. One well
known way of specifying operations to be performed on a database is Structured Query Language (SQL). An SQL statement that performs an operation on a database is known as a query.
In large database applications, such as data warehouses and very large transaction recording systems, the size of the database and the volume of incoming data may be very large. Operations on very large databases are time consuming, due to the large amount of data to be processed, and expensive, due to the large quantity of system resources that must be utilized. One solution to this problem is to materialize and maintain appropriately defined aggregation views (summary tables), which are much smaller than the underlying database and can be cached in fast memory.
In globally distributed information systems, the relations may be distributed or replicated, and locating as well as accessing them may be expensive and sometimes not even possible. In mobile computing applications, the relations may be stored on a server and be accessible only via low bandwidth wireless communication, which may additionally become unavailable. Locally cached materialized views of the data, such as results of previous queries, may considerably improve the performance of such applications.
A need arises for an automated technique by which a time consuming and expensive query can be converted or rewritten to form an equivalent query that uses materialized views to compute the answers.
There has been previous work on using views to answer queries (e.g., H. Z. Yang and P. A. Larson, "Query transformation for PSJ queries," In Proc. VLDB, 1987; M. Stonebraker, A. Jhingran, J. Goh, and S. Potamianos, "On rules, procedures, caching and views in database systems", In Proc. ACM SIGMOD, 1990; O. G. Tsatalos, M. H. Solomon, and Y. E. Ioannidis, "The GMAP: A versatile tool for physical data independence", In Proc. VLDB, 1994; C. M. Chen and N. Roussopoulos, "The implementation and performance evaluation of the ADMS query optimizer: Integrating query result caching and matching", In Proc. EDBT, 1994; S. Chaudhuri, R. Krishnamurthy, S. Potarianos, and K. Shim, "Optimizing queries with materialized views", In. Proc. ICDE, 1995; A. Y. Levy, A.O. Mendelzon, Y. Sagiv, and D. Srivastava, "Answering queries using views", In Proc. ACM PODS, 1995!, but the problem of finding the equivalent rewritings for SQL queries with multiset semantics, grouping and aggregation, have received little attention.
Caching of previous query results was explored in T. Sellis, "Intelligent caching and indexing techniques for relational database systems", Information Systems, pp. 175-188, 1988, as a means of supporting stored procedures. This corresponds to using materialized views when they match syntactically a sub-expression of the query. In the ADMS optimizer, subquery expressions corresponding to nodes in the query execution (operator) tree were also cached. A cached result was matched against a new query by using common expression analysis. Grouping and aggregation issued were not addressed.
View usability has been studied for conjunctive queries with set semantics and without grouping and aggregation. Levy et al. showed a close condition between the problem of usability of a view in evaluating a query and the problem of query containment. However, this connection does not carry over to the multiset case. Yang et al also presented a simple technique for generating a rewriting of a query Q using view V, under set semantics. Essentially the technique consists of first conjoining V to the FROM clause of Q, and then (independently) minimizing the resulting query to eliminate redundant tables. In the case of SQL queries, however, because of the multiset semantics, the query will not be equivalent after conjoining V to the FROM clause, even if it may be equivalent after removing other tables. Therefore, we need to find a priori which tables in the FROM clause will be replaced by V.
Optimization of conjunctive SQL queries using conjunctive views has been studied in Chaudhuri et al. In addition to considering when such views are usable in evaluating a query, they suggest how to perform this optimization in a cost-based fashion. However, they did not consider the possibility of rewritings that are UNION ALLS of single-block queries.
Recently, the problem of using materialized aggregation views to answer aggregation queries using a purely transformational approach has been considered in A. Gupta, V. Harinarayan, and D. Quass, "Aggregate-query processing in data warehousing environments", Proc. VLDB, 1995. They perform syntactic transformations on the operator tree representation of the query such that the definition of the view would be identical to a sub part of the definition of the query. Additional information of queries involving aggregation have been proposed. The transformational approach is more restrictive than our semantic approach. For example, these techniques would neither determine the usability of view V.sub.1 in evaluating query Q.sub.1 in Example 1.1, nor the usability of view V.sub.1 ' in evaluating Q.sub.1 in the same example. Also, no formal guarantees of completeness are provided.
A related problem is studied in A. Gupta, I. S. Mumick, and K. A. Ross. "Adapting materialized views after redefinitions", In Proc. ACM SIGMOD, 1995. They assume that a materialized view may be redefined, and investigate how to adapt the materialization of the view to reflect the redefinition. This problem is clearly a special case of the one we study., with the additional assumptions that the system knows the type of modification that took place, that the new view definition is "close" to the old definition, and that the view materialization may be modified.