A continuing demand exists to join multiple databases so as to enable transparent access to data stored therein. "Transparent" implies that an application program at a specific site is able to access data from all connected databases, without being aware of the origin of the data or of any incompatibilities which might exist between a local database and any of the plurality of coupled remote databases. It is to be understood that the terms "remote" and "local" refer not only to physical locations, but also to databases that are located at a single site but are controlled by different operating systems or database protocols.
In order to provide a transparent interface for heterogeneous databases, the prior art has employed one database as an interface and has enabled that interface, under control of a database management system (DBMS), to access data from other databases in accordance with data entries contained in an interface table. Upon receiving a query, the receiving DBMS performs various query optimization procedures to provide an efficient method for accessing the requested data. However, in a heterogeneous database system, query planning, optimization and processing capabilities of the various database systems differ greatly. Even in the case where plural database systems are capable of executing an identical query statement (such as might be put forth using SQL, a commonly utilized database query language), the query plans produced in response to the query statement at each database may be vastly different. This can occur because each database system utilizes different access methods, join methods and/or aggregate functions in the performance of its database actions. Thus, if a query optimizer in a heterogeneous database system produces a query plan which assumes that all database systems produce similar query plans, a significantly sub-optimal query plan will be produced.
The prior art has suggested a number of methods of optimizing queries across heterogeneous databases. Haas et al. in "Optimizing Queries Across Diverse Datasources", International Conference on Very Large Databases, February, 1997, focus on integrating plural database systems. Haas et al. employ a query optimizer which uses information about the processing power of the remote database systems to optimize queries. Their query optimizer does not take into account the planning abilities of the remote database systems to produce a global plan. Nor do they consider search algorithm differences or the access method to be used by the remote optimization systems.
Shu et al. in "Reformulating Query Plans for Multidatabase Systems", Proceedings of the Second International Conference on Information and Knowledge Management, Nov. 1-5, 1993, pages 423-432 describe a process for reformulating query plans to improve the efficiency of multidatabase queries. The Shu et al. approach uses database abstractions and knowledge concerning the contents of the heterogeneous databases to formulate a query plan. The Shu et al. system does not take into account optimization capabilities of the remote databases.
Kosar et al. in "Multiple Query Optimization with Depth-first Branch-and-Bound and Dynamic Query Ordering," Proceedings of the Second International Conference on Information and Knowledge Management, Nov. 1-5, 1993, pages 433-438, describe a query optimization procedure wherein groups of related queries are executed together in a single multi-plan instead of being executed separately. The Kosar et al. procedure employs dynamic query ordering heuristics and other protocols to provide query optimization.
U.S. Pat. No. 5,600,831 to Levy et al. describes techniques for optimizing queries in a heterogeneous database system. A query results in a query plan which includes subplans for querying the databases which contain the required information. When a subplan is executed in one of the databases, the database returns not only the information which results from the execution of the subplan, but also source and constraint information about the data in the database. The source and constraint information is then used to optimize the query plan by pruning redundant subplans.
U.S. Pat. No. 5,301,317 to Lohman et al. adapts a query optimization effort to expected execution time. Lohman et al. include a mechanism for automatically trading off the time spent estimating execution cost of alternate query execution plans against the potential savings in execution time that one of the alternative plans may yield.
Lin et al. in U.S. Pat. No. 5,590,321 describe a query optimization plan for a heterogeneous database system which uses an interface module that has information concerning the data stored in, and the capabilities of, each of a plurality of databases in the heterogeneous system. The interface module determines whether a query or subquery satisfies several criteria, i.e., whether a single database within the heterogeneous system contains all of the data referenced in the query or subquery and whether the same database provides all of the functions or capabilities needed to satisfy the query or subquery. If these criteria are met, the query or subquery can be pushed down to the single database and executed there.
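By way of illustration only, the two push-down criteria summarized above can be sketched as a simple containment test. The sketch below is not taken from Lin et al.; the names RemoteDatabase, Query and find_pushdown_target, and the sample catalog, are hypothetical and are introduced solely to show the form of the test, under the assumption that the data and the capabilities of each database can be represented as sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteDatabase:
    """Hypothetical catalog entry: what one database stores and supports."""
    name: str
    tables: frozenset
    functions: frozenset


@dataclass(frozen=True)
class Query:
    """Hypothetical summary of a query or subquery: what it references and needs."""
    tables: frozenset
    functions: frozenset


def find_pushdown_target(query, databases):
    """Return the first database meeting both criteria, or None if none does."""
    for db in databases:
        # Criterion 1: the single database contains all referenced data.
        has_all_data = query.tables <= db.tables
        # Criterion 2: the same database provides all needed functions.
        has_all_functions = query.functions <= db.functions
        if has_all_data and has_all_functions:
            return db
    return None


# Illustrative catalog and queries (assumed values, not from any reference).
catalog = [
    RemoteDatabase("db_a", frozenset({"orders", "customers"}), frozenset({"SUM", "JOIN"})),
    RemoteDatabase("db_b", frozenset({"parts"}), frozenset({"JOIN"})),
]
single_site_query = Query(frozenset({"orders", "customers"}), frozenset({"SUM"}))
cross_site_query = Query(frozenset({"orders", "parts"}), frozenset({"JOIN"}))

target = find_pushdown_target(single_site_query, catalog)
no_target = find_pushdown_target(cross_site_query, catalog)
```

In this sketch, single_site_query qualifies for push-down to db_a, whereas cross_site_query references data held in two different databases and therefore has no single push-down target.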
A number of prior art references consider various methods for query optimization in relational database systems. Such references include Selinger et al. "Access Path Selection in a Relational Database Management System", Chapter 2. Relational Implementation Techniques, Readings on Database Systems, 2nd Ed., M. Stonebraker, Editor, Morgan-Kaufmann (1994); U.S. Pat. No. 5,598,559 to Chaudhuri; U.S. Pat. No. 5,544,355 to Chaudhuri et al.; and U.S. Pat. No. 5,546,576 to Cochrane et al. None of the aforementioned references considers query optimization in heterogeneous database systems or, more particularly, the use of optimization processes carried out by remote databases in heterogeneous systems.
Accordingly, it is an object of this invention to provide an improved method and apparatus for query optimization in a heterogeneous database system, wherein query planning functions of remote databases are taken into account.
It is another object of this invention to provide an improved query optimization procedure for a heterogeneous database system which takes into account functions supported by remote databases in order to select an optimum query plan.
It is a further object of this invention to provide an improved method and apparatus for query optimization in a heterogeneous database system wherein one database is used as a transparent interface to an application program.