A continuing demand exists to couple multiple databases so as to enable transparent access to data stored therein. "Transparent" implies that an application program at a specific site is able to access data from all connected databases, without being aware of the origin of the data, nor of any incompatibilities which might exist between a local database and any of the plurality of coupled remote databases. It is to be understood that the terms "remote" and "local", as used herein, not only refer to physical locations, but also to databases that are located at a single site (e.g. on one or more computers), but are controlled by different operating systems or database protocols.
In order to provide a transparent interface for heterogeneous databases, the prior art has employed one database as an interface and has enabled that interface, under control of a database management system (DBMS), to access data from other databases in accordance with data entries contained in an interface table.
Upon receiving a query, the receiving DBMS performs a query optimization procedure to decide upon an efficient method for accessing the requested data. During such an optimization action, the specific data manipulation required to respond to the query is examined and various alternatives methods for deriving a query response are evaluated. During such an optimization action, various types of join methods may be considered. A join method is used when rows from an "outer" table are concatenated to rows of one or more other tables (i.e., "inner" tables), in accord with a determined criteria. A "table" as used herein is any tabular presentation of data. An outer table is one from which a search name is retrieved from a "joining column". An inner table is one from which data is retrieved, based on the search name retrieved from the joining column. The joining column is the column in the outer table which includes the data or search names that are utilized when accessing data in the inner table. The data retrieved from the inner table and/or the outer table, in answer to a received query, is termed the "result set".
Relational DBMS's use SQL (structured query language) as a standard language for enabling database manipulations. The SQL language allows users to formulate relational operations on the database tables. For example, each SQL operator operates on either one or two tables and produces a new table as a result. SQL enables the linking together of information from multiple tables or views to perform complex sets of procedures. One of those procedures is a join of columns of data from two or more tables.
Typically, a relational database stores a single data value entirely within a single storage unit whose size is prescribed by the database architecture. To provide greater limits on the data values stored, some relational database management systems include a specially defined datatype called a large object, generally referred to as a LOB. Recently, databases permit LOBs to have data sizes on the order of plural gigabytes (10.sup.6 KB). Because of their size, transfers of LOB data values can require potentially many disk drive access operations that can significantly impede data manipulations and incur a severe performance penalty. Such a penalty exists even if a relatively minor use is made of a LOB. Currently, LOBs are experienced when full color image data is included as part of a database table.
The prior art has handled and manipulated LOBs using a technique called "deferred evaluation" that links data structures together. In deferred evaluation, the evaluation of predetermined string operators in an assignment statement is deferred until the entire assignment statement is received, rather than the more typical action of immediately executing string operators as they are encountered. Typically, a data structure is created for each operand of an assignment statement and includes a specification of what string operations are to be performed. The database management system analyzes the data structures and the string operations and delays actually retrieving any data values from the data base until string operations have been simplified. That is, intermediate results are not written back to the database disk if they can be used for a next string operation. In such manner, disk access operations are reduced.
In U.S. Pat. No. 5,566,323 to Gainer et al., assigned to the same Assignee as this application, a relational DBMS is described which stores and retrieves LOBs. The Gainer et al. disclosed DBMS "mutates" selected string operations on LOBs into equivalent functions requiring reduced storage access operations. When a LOB is encountered, the DBMS first checks to determine if the statement can be "mutated". A "mutateable" statement is an assignment statement having at least one string operation and operands, such that the string operator can be changed or mutated into an equivalent function on a LOB operand that can be left in the database, thereby eliminating associated storage access operations. That is, the mutated function does not require that the LOB be read from the database into intermediate storage, but instead provides an equivalent string result by operating on the LOB, in place, in the database storage. The mutated statement is then processed according to the above-described deferred evaluation scheme.
When LOBs are encountered in heterogeneous database systems, transfers of LOBs from remote database tables are often required to respond to a query from an application program. Such LOB transfers can impose a significant communication cost on the heterogenous database system and are to be minimized, if at all possible.
Accordingly, it is an object of this invention to provide a method and apparatus for improving the efficiency of handling of LOB transfers in a heterogenous database system.
It is another object of this invention to provide a system and method for deferring LOB transfers when preparing a response to a query from an application program.