A continuing demand exists to couple multiple databases so as to enable transparent access to data stored therein. "Transparent" implies that an application program at a specific site is able to access data from all connected databases, without being aware of the origin of the data, nor of any incompatibilities which might exist between a local database and any of the plurality of coupled remote databases. It is to be understood that the terms "remote" and "local", as used herein, not only refer to physical locations, but also to databases that are located at a single site, but are controlled by different operating systems or database protocols.
In order to provide a transparent interface for heterogeneous databases, the prior art has employed one database as an interface and has enabled that interface, under control of a database management system (DBMS), to access data from other databases in accordance with data entries contained in an interface table.
Upon receiving a query from a query requester (e.g., an application program), the receiving DBMS performs a query optimization procedure to decide upon an efficient method for accessing the requested data. During such an optimization action, the specific data manipulation required to respond to the query is examined and various alternative methods for deriving a query response are evaluated. Various types of data manipulations are considered, with the one which is chosen being a perceived "best" in terms of processing efficiency and cost. Thereafter, a database query is prepared and dispatched to the remote database which contains a table or tables that contain the data which is necessary to respond to the application query. The data retrieved from the table(s), in answer to the database query, is termed the "result set". To prepare the result set, the required data manipulations may be performed at the remote database. Otherwise, the data manipulation(s) are performed by the local DBMS, based upon the received result set.
Relational DBMS's use SQL (structured query language) as a standard language for enabling database manipulations. The SQL language allows users to formulate relational operations on the database tables. For example, each SQL operator operates on either one or two tables and produces a new table as a result. SQL enables the linking together of information from multiple tables or views to perform complex sets of procedures.
Typically, a relational database stores a single data value entirely within a single storage unit whose size is prescribed by the database architecture. To provide greater limits on the data values stored, some relational database management systems include a specially defined datatype called a large object, generally referred to as a LOB. Recently, databases permit LOBs to have data sizes on the order of plural gigabytes (10.sup.6 KB). Because of their size, transfers of LOB data values can require potentially many disk drive access operations that can significantly impede data manipulations and incur a severe performance penalty. Such a penalty exists even if a relatively minor change is made to a LOB. Currently, LOBs are experienced when full color image data is included as part of a database table.
The prior art has handled and manipulated LOBs using a technique called "deferred evaluation" that links data structures together. In deferred evaluation, the evaluation of predetermined string operators in an assignment statement is deferred until the entire assignment statement is received, rather than the more typical action of immediately executing string operators as they are encountered. Typically, a data structure is created for each operand of an assignment statement and includes a specification of what string operations are to be performed. The DBMS analyzes the data structures and the string operations and delays actually retrieving any data values from the data base until string operations have been simplified. That is, intermediate results are not written back to the database disk if they can be used for a next string operation. In such manner, disk access operations are reduced.
In U.S. Pat. No. 5,566,323 to Gainer et al., assigned to the same Assignee as this application, a relational DBMS is described which stores and retrieves LOBs. The Gainer et al. disclosed DBMS "mutates" selected string operations on LOBs into equivalent functions requiring reduced storage access operations. When a LOB is encountered, the DBMS first checks to determine if the statement can be "mutated". A "mutateable" statement is an assignment statement having at least one string operation and operands, such that the string operator can be changed or mutated into an equivalent function on a LOB operand that can be left in the database, thereby eliminating associated storage access operations. That is, the mutated function does not require that the LOB be read from the database into intermediate storage, but instead provides an equivalent string result by operating on the LOB, in place, in the database storage. The mutated statement is then processed according to the above-described deferred evaluation scheme.
When LOBs are encountered in heterogeneous database systems, transfers of LOBs from remote database tables are often required to respond to a query from an application program. Local storage of such LOB data can impose a significant processing cost on the heterogenous database system and is to be minimized, if at all possible.
Accordingly, it is an object of this invention to provide a method and apparatus for improving the efficiency of handling of LOB transfers in a heterogenous database system.