A continuing demand exists to couple multiple databases so as to enable transparent access to data stored therein. "Transparent" implies that an application program at a specific site is able to access data from all connected databases without being aware of the origin of the data or of any incompatibilities which might exist between a local database and any of the plurality of coupled remote databases. It is to be understood that the terms "remote" and "local", as used herein, refer not only to physical locations, but also to databases that are located at a single site (e.g., on one or more computers) but are controlled by different operating systems or database protocols.
In order to provide a transparent interface for heterogeneous databases, the prior art has employed one database as an interface and has enabled that interface, under control of a database management system (DBMS), to access data from other databases in accordance with data entries contained in an interface table.
Upon receiving a query, the receiving DBMS performs a query optimization procedure to decide upon an efficient method for accessing the requested data. During such a query optimization action, various types of join methods are often considered. A join method is used when rows from an "outer" table are concatenated to rows of one or more other tables (i.e., "inner" tables), in accordance with a determined criterion. As used herein, the term table includes any tabular data listing. An outer table is one from which a search name is retrieved from a "joining column". An inner table is one from which data is retrieved, based on the search name retrieved from the joining column.
The joining column is the column in the outer table which includes the data or search names that are utilized when accessing data in the inner table. The data retrieved from the inner table, in answer to a received query, is termed the "result set".
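The terms defined above can be illustrated with a minimal sketch of a nested-loop join, which is only one of the join methods an optimizer might consider. The function name, the table contents, and the column names (`dept_id`, `id`, `dept_name`) are hypothetical and are chosen purely for illustration.

```python
def nested_loop_join(outer_rows, inner_rows, joining_column, inner_column):
    """Concatenate each outer row with every inner row that matches it."""
    result_set = []
    for outer in outer_rows:
        # The search name is read from the joining column of the outer table.
        search_name = outer[joining_column]
        for inner in inner_rows:
            # Data is retrieved from the inner table based on the search name.
            if inner[inner_column] == search_name:
                result_set.append({**outer, **inner})
    return result_set


# Hypothetical outer table (employees) and inner table (departments).
employees = [
    {"emp": "Ann", "dept_id": 1},
    {"emp": "Bob", "dept_id": 2},
    {"emp": "Cy", "dept_id": 9},  # no matching department
]
departments = [
    {"id": 1, "dept_name": "Sales"},
    {"id": 2, "dept_name": "Ops"},
]

joined = nested_loop_join(employees, departments, "dept_id", "id")
```

In this sketch, `joined` holds one concatenated row for each employee whose `dept_id` appears in the inner table; the employee with no match contributes nothing.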
Relational DBMSs use SQL (structured query language) as a standard language for enabling database manipulations. The SQL language allows users to formulate relational operations on the database tables. Each SQL operator operates on one or more tables and produces a new table as a result. SQL enables the linking together of information from multiple tables or views to perform complex sets of procedures through a single statement. One of those procedures is a join of columns of data from two or more tables.
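As an illustration of a join expressed in a single SQL statement, the following sketch uses Python's standard `sqlite3` module with an in-memory database. The table names (`orders`, `customers`), their columns, and the function name `run_join_example` are assumptions made for the example and do not come from any system described herein.

```python
import sqlite3


def run_join_example():
    """Build two small tables and join them with one SQL statement."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
        CREATE TABLE customers (customer_id INTEGER, name TEXT);
        INSERT INTO orders VALUES (1, 10), (2, 20), (3, 10);
        INSERT INTO customers VALUES (10, 'Ada'), (20, 'Grace');
        """
    )
    # A single statement links information from both tables and
    # produces a new table (the result set) as its output.
    rows = conn.execute(
        """
        SELECT o.order_id, c.name
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
        ORDER BY o.order_id
        """
    ).fetchall()
    conn.close()
    return rows
```

Calling `run_join_example()` returns one row per order, each paired with the name of the matching customer.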
Performing join operations can be quite costly in terms of performance because each row in a first table may have to be joined with multiple rows in a second table. In a heterogeneous database system, a complex join operation can result in heavy communication costs, especially where one of the databases is remotely located.
DBMS query optimization procedures have, in the prior art, attempted various techniques for improving processing efficiency when responding to queries. U.S. Pat. No. 5,548,758 to Pirahesh et al., assigned to the same Assignee as this application, optimizes a certain SQL query type in a relational DBMS through the use of early-out join transformations. An early-out join comprises a many-to-one join, wherein the join action scans an inner table for a match for each row of an outer table and terminates the scan for each row of the outer table when a single match is found in the inner table. Procedures are described in the '758 patent for the transformation of a many-to-many join to an early-out join.
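The early-out behavior described above can be sketched as follows. This is a simplified illustration of the scan-termination idea only; it does not reproduce the transformation procedures of the '758 patent. The function name, the row layout, and the `scanned` counter are hypothetical, the counter being included merely to make the saved work visible.

```python
def early_out_join(outer_rows, inner_rows, outer_column, inner_column):
    """Join each outer row with at most one matching inner row."""
    result_set = []
    scanned = 0  # number of inner rows examined, for illustration only
    for outer in outer_rows:
        for inner in inner_rows:
            scanned += 1
            if inner[inner_column] == outer[outer_column]:
                result_set.append({**outer, **inner})
                # Early out: stop scanning the inner table for this
                # outer row as soon as a single match is found.
                break
    return result_set, scanned


# Hypothetical data: customer 10 appears twice in the inner table.
outer_table = [{"cust": 10}, {"cust": 20}]
inner_table = [
    {"id": 10, "note": "a"},
    {"id": 10, "note": "b"},
    {"id": 20, "note": "c"},
]

matches, inner_rows_scanned = early_out_join(outer_table, inner_table, "cust", "id")
```

With this data, the scan for the first outer row stops after one inner row and the scan for the second stops after three, so four inner rows are examined rather than the six a full scan would require.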
The prior art also includes a number of teachings regarding optimization of query plans in heterogeneous database systems. For instance, Hsu et al. in "Reformulating Query Plans for Multi-Database Systems", Second International Conference on Information and Knowledge Management, Nov. 1-5, 1993, pages 423-432, describe a query optimization technique that is particularly directed to heterogeneous database systems. A query plan is reformulated, using database abstractions and knowledge about the contents of the databases, to arrive at a query plan that is less expensive but semantically equivalent. The object of the reformulation algorithm presented by Hsu et al. is to reduce the cost of data retrieval in response to a received query. Hsu et al. do not generate queries each time a controlling rule is executed. Instead, all applicable rules are executed at once and candidate constraints are collected into a list set. Thereafter, only those constraints that will contribute to the cost reduction are selected. More precisely, the Hsu et al. approach is a "delayed-commitment strategy" because the system delays the reformulation of the query until it has enough information to make a decision.
A further suggestion regarding query optimization is presented by Arens et al. in "Intelligent Caching: Selecting, Representing and Reusing Data in an Information Server", Third International Conference on Information and Knowledge Management, Nov. 29-Dec. 2, 1994, pages 433-438. Arens et al. set out certain rules which are to be followed when determining what data to cache in a multiple heterogeneous and distributed database network. Arens et al. suggest that data retrieved in response to a user's query may be cached for future reuse and various broadly stated rationales are provided to assist in determining when the caching should occur.
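The general notion of caching retrieved data for future reuse can be sketched as below. This is a deliberately naive illustration that caches every result keyed by the query text; it does not implement the selection rules of Arens et al., which are not detailed here. The class name `RemoteQueryCache`, the `fetch_remote` callable, and the `remote_calls` counter are all hypothetical.

```python
class RemoteQueryCache:
    """Cache remote query results so repeated queries avoid a remote access."""

    def __init__(self, fetch_remote):
        # fetch_remote is an assumed callable that takes the query text
        # and returns the result set from the remote database.
        self._fetch_remote = fetch_remote
        self._cache = {}
        self.remote_calls = 0  # counts actual remote accesses

    def query(self, query_text):
        if query_text not in self._cache:
            # Cache miss: go to the remote database and keep the result.
            self.remote_calls += 1
            self._cache[query_text] = self._fetch_remote(query_text)
        return self._cache[query_text]


# Stand-in for a remote database access, used only for this example.
def fake_remote(query_text):
    return [("row for", query_text)]


cache = RemoteQueryCache(fake_remote)
first = cache.query("SELECT name FROM customers")
second = cache.query("SELECT name FROM customers")
```

Here the second call is answered from the cache, so only one remote access occurs. A practical scheme would also need rules for deciding what to cache and when cached data has become stale, which this sketch omits.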
It is an object of this invention to provide a method and apparatus for improving the efficiency of handling of queries to remote database tables through caching.
It is another object of this invention to provide a system and method for improving the efficiency of remote query operations in a heterogeneous system.