1. Field of the Invention
This invention relates in general to computer-implemented database systems, and, in particular, to query optimization with deferred updates and autonomous sources.
2. Description of Related Art
Databases are computerized information storage and retrieval systems. A Relational Database Management System (RDBMS) is a database management system (DBMS) which uses relational techniques for storing and retrieving data. Relational databases are organized into tables which consist of rows and columns of data. The rows are formally called tuples or records. A database will typically have many tables and each table will typically have multiple tuples and multiple columns. The tables are typically stored on direct access storage devices (DASD), such as magnetic or optical disk drives for semi-permanent storage.
The integration of object technology and database systems has been an active area of research for the past decade. One important aspect of the integration of these two technologies is the provision of efficient, declarative query interfaces for accessing and manipulating object data. Compared to other aspects of object-oriented database ("OODB") technology, such as integrating persistence into object-oriented languages (e.g., C++ and Smalltalk), queries were given relatively little attention in the early days of OODB research, which is further described in Mike Stonebraker, Third-Generation Data Base System Manifesto, Computer Standards and Interfaces, Dec. 12, 1991, which is incorporated by reference herein. In Won Kim, Object-Oriented Database Systems: Promise, Reality, and Future, Proc. 19th International Conference on Very Large Data Bases, Dublin, August 1993, which is incorporated by reference herein, it is pointed out that commercial OODB systems are weak in this regard.
However, a number of proposals for OODB query languages have appeared in the database literature. See Mike Carey, David DeWitt, and Scott Vandenberg, A Data Model and Query Language for EXODUS, Proc. ACM-SIGMOD International Conference on Management of Data, Chicago, June 1988; Won Kim, A Model of Queries For Object-Oriented Databases, Proc. 15th International Conference on Very Large Data Bases, Amsterdam, August 1989; Francois Bancilhon, S. Cluet, and C. Delobel, A Query Language for the O2 Object-Oriented Database System, edited by Richard Hull, Ron Morrison, and David Stemple, Proc. 2nd International Workshop on Database Programming Languages, Gleneden Beach, Morgan-Kaufmann Publishers, Inc., June 1989 [hereinafter "Bancilhon et al."]; Jack Orenstein, Sam Haradhvala, Benson Margulies, and Don Sakahara, Query Processing in the ObjectStore Database System, Proc. ACM-SIGMOD International Conference on Management of Data, San Diego, June 1992, [hereinafter "Orenstein et al."]; S. Dar, N. Gehani, and H. Jagadish, CQL++: A SQL for a C++ Based Object-Oriented DBMS, Proc. International Conference on Extending Data Base Technology, Advances in Database Technology - EDBT '92, Lecture Notes in Computer Science, Vienna, Springer-Verlag, 1992; Michael Kifer, Won Kim, and Yehoshua Sagiv, Querying Object-Oriented Databases, Proc. ACM-SIGMOD International Conference on Management of Data, San Diego, June 1992; Tom Atwood, Joshua Duhl, Guy Ferran, Mary Loomis, and Drew Wade, Object Query Language, edited by R. G. G. Cattell, Object Database Standards: ODMG-93 Release 1.1, Morgan-Kaufmann Publishers, Inc., 1993, [hereinafter "Atwood et al."]; José Blakeley, William J. McKenna, and Goetz Graefe, Experiences Building The Open OODB Query Optimizer, Proc. ACM-SIGMOD International Conference on Management of Data, Washington D.C., May 1993; each of which is incorporated by reference herein.
While proposals outnumber actual implementations, several of these language designs have indeed been implemented as the query interfaces for significant commercial OODB products, which is discussed in Bancilhon et al. and Orenstein et al.
The commercial OODB systems that are generally considered to have good object query facilities, O2, which is discussed in Bancilhon et al., and ObjectStore, which is discussed in Orenstein et al., each provide their own flavor of object query language. ObjectStore's query language is an extension to the expression syntax of C++. O2's query language is generally more like SQL and has been adapted into a proposed OODB query language standard (i.e., the ODMG-93 proposal) by a consortium of OODB system vendors, which is discussed in Atwood et al., but it differs from SQL in a number of respects, which is discussed further in Won Kim, Observations on the ODMG-93 Proposal, ACM SIGMOD Record, 23(1), March 1994, which is incorporated by reference herein.
Query rewrite transformations have been developed for relational DBMSs. See Hamid Pirahesh, Joseph M. Hellerstein, and Waqar Hasan, Extensible/Rule Based Query Rewrite Optimization in Starburst, Proc. ACM-SIGMOD International Conference on Management of Data, San Diego, June 1992; Inderpal Singh Mumick, Sheldon J. Finkelstein, Hamid Pirahesh, and Raghu Ramakrishnan, Magic is Relevant, Proc. ACM-SIGMOD International Conference on Management of Data, pages 247-258, Atlantic City, May 1990; Inderpal Singh Mumick, Hamid Pirahesh, and Raghu Ramakrishnan, The Magic of Duplicates and Aggregates, Proc. 16th International Conference on Very Large Data Bases, Brisbane, August 1990; each of which is incorporated by reference herein.
Many of these transformations also apply to Object Query Systems. However, new query rewrite transformations that apply specifically to Object Query Systems need to be developed, as discussed in Sophie Cluet and Claude Delobel, A General Framework for the Optimization of Object-Oriented Queries, Proc. ACM-SIGMOD International Conference on Management of Data, San Diego, June 1992, which is incorporated by reference herein. Predicate pushdown, which is a query rewrite transformation, is the notion of taking a query and determining which parts of the query can be migrated through the layers of the schema to the databases where the data resides. The objective is to use the power of the database query function to do data filtering, and, thereby, restrict the amount of data that has to be transferred from the database servers to clients.
Predicate pushdown can include all of the predicates that define a query's result, in which case the task of restricting the result set is entirely performed by the databases where the data resides. Predicate pushdown can include partial predicates that define a query's results, in which case some of the predicates (e.g., a subset of the conjuncts that define a query's result) are passed down to the databases where the data resides, thereby restricting the results returned by these databases. The remaining predicates that could not be pushed down are then applied in object space by the query evaluator. Finally, if predicate pushdown cannot be applied, the predicates that define a query's results must be applied in object space after having retrieved the complete sets of data referenced in the query.
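The three cases of predicate pushdown described above can be illustrated with a minimal sketch. The sketch is not taken from any of the cited systems; it assumes an in-memory SQLite table standing in for a remote database, and the table `emp`, its columns, and the helper names `make_db` and `run_query` are hypothetical. Pushed-down conjuncts are evaluated by the database, and residual conjuncts are evaluated in object space on the rows that were transferred.

```python
import sqlite3


def make_db():
    # Hypothetical data source: an in-memory table standing in for a remote database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, sal INTEGER)")
    conn.executemany(
        "INSERT INTO emp VALUES (?, ?, ?)",
        [("Ann", "R&D", 90000), ("Bob", "R&D", 40000),
         ("Cy", "Sales", 70000), ("Dee", "R&D", 65000)],
    )
    return conn


def run_query(conn, pushable, residual):
    """Evaluate a conjunctive query.

    pushable: list of (sql_fragment, params) conjuncts evaluated by the database.
    residual: list of callables over a row tuple, evaluated in object space.
    Returns (result_rows, number_of_rows_transferred_from_the_database).
    """
    sql = "SELECT name, dept, sal FROM emp"
    params = []
    if pushable:
        sql += " WHERE " + " AND ".join(fragment for fragment, _ in pushable)
        for _, fragment_params in pushable:
            params.extend(fragment_params)
    sql += " ORDER BY name"
    rows = conn.execute(sql, params).fetchall()
    # Conjuncts that could not be pushed down are applied after the transfer.
    result = [row for row in rows if all(pred(row) for pred in residual)]
    return result, len(rows)


conn = make_db()
in_rd = ("dept = ?", ["R&D"])
well_paid_sql = ("sal >= ?", [60000])
well_paid_local = lambda row: row[2] >= 60000
in_rd_local = lambda row: row[1] == "R&D"

full = run_query(conn, [in_rd, well_paid_sql], [])              # complete pushdown
partial = run_query(conn, [in_rd], [well_paid_local])           # partial pushdown
none = run_query(conn, [], [in_rd_local, well_paid_local])      # no pushdown
```

In this sketch all three calls return the same two employees, but the number of rows transferred from the database grows from 2 (complete pushdown) to 3 (partial pushdown) to 4 (no pushdown), which is the data-transfer cost that predicate pushdown is intended to reduce.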
Semi-join techniques for distributed query processing have been presented in Clement T. Yu and C. C. Chang, Distributed Query Processing, ACM Computing Surveys, 16(4):399-433, December 1984, [hereinafter "Yu et al."], which is incorporated by reference herein. Like join queries, semi-join queries involve multiple tables and have predicates that establish a relationship among the tables participating in a query. However, only a subset of the tables is referenced in the projection clause of a semi-join query.
The following optimization is presented in Yu et al. Given two remote tables T1 and T2 participating in a join, if each table is managed by a different remote DBMS, results corresponding to table Ti are first retrieved. Results corresponding to a table Tj, j≠i, are then retrieved by selecting those results where the join column values of Tj are in the list of join column values of results retrieved from table Ti. This procedure is especially interesting if there are one or more additional predicates involving only Ti that can be used to initially restrict the first result.
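The retrieval order described above can be sketched as follows. This is a simplified illustration rather than the procedure as formulated in Yu et al.; it assumes two separate in-memory SQLite connections standing in for the two remote DBMSs, and the tables `dept` and `emp`, their columns, and the function names are hypothetical.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    # Each connection stands in for a table managed by a different remote DBMS.
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


def semi_join(db_i, db_j, min_budget):
    # Step 1: retrieve results for Ti, restricted by a predicate involving only Ti.
    ti_rows = db_i.execute(
        "SELECT dept_id, dept_name FROM dept WHERE budget >= ?", (min_budget,)
    ).fetchall()
    keys = sorted({dept_id for dept_id, _ in ti_rows})
    if not keys:
        return []  # nothing qualifies in Ti, so Tj need not be contacted at all
    # Step 2: retrieve only those Tj rows whose join column value is in the key list.
    placeholders = ", ".join("?" for _ in keys)
    tj_rows = db_j.execute(
        f"SELECT name, dept_id FROM emp WHERE dept_id IN ({placeholders}) ORDER BY name",
        keys,
    ).fetchall()
    # Step 3: combine the two partial results locally.
    dept_names = dict(ti_rows)
    return [(name, dept_names[dept_id]) for name, dept_id in tj_rows]


db_i = make_source(
    "CREATE TABLE dept (dept_id INTEGER, dept_name TEXT, budget INTEGER)",
    "INSERT INTO dept VALUES (?, ?, ?)",
    [(1, "R&D", 500), (2, "Sales", 100), (3, "Ops", 300)],
)
db_j = make_source(
    "CREATE TABLE emp (name TEXT, dept_id INTEGER)",
    "INSERT INTO emp VALUES (?, ?)",
    [("Ann", 1), ("Bob", 2), ("Cy", 3), ("Dee", 1)],
)
joined = semi_join(db_i, db_j, 300)
```

With the sample data, the predicate on `budget` restricts Ti to two departments, so the employee "Bob" is never transferred from the second source.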
Query evaluation using a client cache is presented in Shaul Dar, Michael J. Franklin, Björn T. Jónsson, Divesh Srivastava, and Michael Tan, Semantic Data Caching and Replacement, Proc. 22nd International Conference on Very Large Data Bases, Mumbai, August 1996, [hereinafter "Dar et al."], which is incorporated by reference herein. Dar et al. focuses on determining whether a query can be resolved from the client cache alone or whether a partial query result can be obtained from the client cache with the remaining result drawn from the database server. The technique that is used for query evaluation is predicated on maintaining a semantic description of the client cache content. For a given table, the semantic description is a constraint that is dynamically modified to include new cache entries. For example, if a query initially retrieves all employees having a salary between 50,000 and 100,000, the constraint describing the cache content for the employee table is sal ≥ 50000 and sal ≤ 100000. If a subsequent query requests employees having a salary between 60,000 and 80,000, that query result can be drawn from the cache alone. A similar approach called predicate-based caching is presented in Arthur M. Keller and Julie Basu, A Predicate-Based Caching Scheme for Client-Server Database Architectures, The VLDB Journal, 5:35-47, 1996, which is incorporated by reference herein.
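The salary example above can be sketched in a few lines. The sketch is a deliberate simplification and is not the scheme of Dar et al.: it assumes the semantic description is a single closed interval on `sal`, it handles only the case in which a query is fully contained in that interval, and it replaces the cached region on a miss instead of computing a remainder query. The class name `RangeCache` and its attributes are hypothetical.

```python
class RangeCache:
    """Client cache whose content is described by one constraint lo <= sal <= hi."""

    def __init__(self, server_rows):
        self.server_rows = server_rows  # list of (name, sal); stands in for the server
        self.region = None              # semantic description of the cache content
        self.rows = []                  # cached rows satisfying the description
        self.server_calls = 0

    def query(self, lo, hi):
        # If the requested range is contained in the cached region,
        # the result can be drawn from the cache alone.
        if self.region is not None and self.region[0] <= lo and hi <= self.region[1]:
            return sorted(row for row in self.rows if lo <= row[1] <= hi)
        # Otherwise go to the server and replace the cached region (simplification).
        self.server_calls += 1
        self.rows = [row for row in self.server_rows if lo <= row[1] <= hi]
        self.region = (lo, hi)
        return sorted(self.rows)


employees = [("Ann", 90000), ("Bob", 40000), ("Cy", 70000), ("Dee", 65000)]
cache = RangeCache(employees)
first = cache.query(50000, 100000)   # fetched from the server
second = cache.query(60000, 80000)   # answered from the cache alone
calls_after_second = cache.server_calls
```

After both queries, `calls_after_second` is 1, because the second range lies inside the constraint sal ≥ 50000 and sal ≤ 100000 established by the first query.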
The performance tradeoffs of query shipping versus data shipping are presented in Michael J. Franklin, Björn Thór Jónsson, and Donald Kossmann, Performance Tradeoffs for Client-Server Query Processing, Proc. ACM-SIGMOD International Conference on Management of Data, Montreal, June 1996, which is incorporated by reference herein. Query shipping is described as the relational DBMS client-server environment in which a query is passed from the client to the server, and the work of query evaluation is done on the server. The object-oriented DBMS client-server environment is termed data shipping because data pages or objects are transferred from the server to the client where most of the processing takes place (e.g., navigational data access). Hybrid shipping is proposed to combine the advantages of both query shipping and data shipping.
Therefore, there is a need in the art to extend the previous systems to a middleware system having an object cache connected to a database. Furthermore, there is a need in the art for an improved technique for query optimization with deferred updates and autonomous sources.
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method, apparatus, and article of manufacture for a computer-implemented technique for query optimization with deferred updates and autonomous sources.
In accordance with the present invention, an object-oriented query is executed to retrieve data from a database. The database is stored on a data storage device connected to a computer. The object-oriented query is transformed into subqueries, wherein at least one subquery is directed against a database, and wherein at least one subquery is directed against an object cache. Each subquery that is directed against a database is executed to retrieve data from the database into the object cache. Each subquery that is directed against the object cache is executed to retrieve data for the query, wherein the data incorporates updates to the object cache and updates to the database.
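The interplay between the database subquery and the object cache subquery can be illustrated with a minimal sketch. It is offered only as one possible reading of the paragraph above under stated assumptions, not as the claimed implementation: the database is simulated by a Python dictionary, a deferred update is modeled as a cached object marked dirty and not yet written back, and the names `ObjectCache` and `execute_query` are hypothetical.

```python
class ObjectCache:
    def __init__(self):
        self.objects = {}   # key -> attribute dictionary
        self.dirty = set()  # keys whose updates are deferred (not yet in the database)

    def load(self, key, attrs):
        # Data retrieved from the database refreshes the cache,
        # except for objects that carry deferred local updates.
        if key not in self.dirty:
            self.objects[key] = dict(attrs)

    def update(self, key, **changes):
        self.objects[key].update(changes)
        self.dirty.add(key)


def execute_query(database, cache, predicate):
    # Subquery directed against the database: retrieve qualifying data into the cache.
    fetched = set()
    for key, attrs in database.items():
        if predicate(attrs):
            cache.load(key, attrs)
            fetched.add(key)
    # Subquery directed against the object cache: evaluate the predicate on cached
    # state for the fetched objects plus every object with a deferred update.
    candidates = fetched | cache.dirty
    return sorted(key for key in candidates if predicate(cache.objects[key]))


database = {
    1: {"name": "Ann", "sal": 90000},
    2: {"name": "Bob", "sal": 40000},
    3: {"name": "Cy", "sal": 70000},
    4: {"name": "Dee", "sal": 80000},
}
cache = ObjectCache()
for key in (2, 3, 4):
    cache.load(key, database[key])
cache.update(2, sal=60000)                   # deferred update: Bob now qualifies
cache.update(3, sal=45000)                   # deferred update: Cy no longer qualifies
database[4] = {"name": "Dee", "sal": 30000}  # change made directly at the autonomous source

result = execute_query(database, cache, lambda obj: obj["sal"] >= 50000)
```

In this sketch the result contains objects 1 and 2: object 2 qualifies only because of its deferred cache update, object 3 is excluded for the same reason, and object 4 is excluded because the database no longer returns it even though a stale copy remains in the cache.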
An object of the invention is the provision of object query access for object views of heterogeneous databases that can include relational databases. Another object of the invention is to retrieve data in response to an object-oriented query, while taking into account differences between data in an object cache and in a database.