The database management system has become an important category of computer software based system. Database management systems are available for all kinds of computers and computer systems, from single-user personal computers and workstations to multi-user, distributed mainframe systems. Among the purposes and benefits of a database management system is that it gives users access to data in a data-independent way, facilitates the collection and organization of data in databases, and provides tools allowing a user readily to create, update, modify and delete such databases. In a database management system, the databases themselves are collections of physical data that are stored with relative permanence in a machine-readable way.
Perhaps the most important class of database management system, both theoretically and commercially, is the relational database management system. In a relational database management system, the user sees the data as a collection of tables, each table being seen as a set of rows organized into columns. Each row may be thought of as a record, and each column, as a field in a record. (The term "field" may be used to refer either to a column as a whole, or to a particular row-and-column position.)
An important characteristic of a relational table is whether it is "normalized". In a properly normalized table, that is, in a table that is in at least first normal form, each row has unique identifying values in one or more fields, which fields are known as the primary key. Commercial products, however, will not necessarily enforce this limitation and may allow tables to be created that have duplicate rows. A more important characteristic of normalized tables is that they have an atomic value or a NULL (indicating "no value") in every field of the table. That is, normalized tables do not have other tables as field values. A table that does have (or is permitted to have) tables as field values is said to be in non-first normal form ("NFNF" or "NF.sup.2 "). This extension to the relational model will be considered in more detail below.
Although commercial products will deviate from the strict relational model, many commercial products are available from a variety of vendors that we may consider to be relational database management systems, including DB2 from International Business Machines Corporation of Armonk, N.Y. (DB2, DB2I and IBM are registered trademarks of International Business Machines Corporation ("IBM")); Ingres from ASK Computer Systems, Inc. of Mountain View, Calif. (Ingres is a registered trademark of ASK Corporation); and Oracle from Oracle Corporation of Redwood City, Calif. (Oracle is a registered trademark of Oracle Corporation). Most relational database management systems support the relational language SQL ("Structured Query Language", pronounced sequel). SQL is both a query language and a database programming language. In many systems, SQL may be accessed through an interactive interface (DB2I in the case of DB2) and an application programming interface. In addition, SQL may often be used as an embedded language ("embedded or dynamic SQL") in a general purpose programming language, such as PL/I in the case of DB2.
Not all database management systems are based on a relational model. An important alternative model is implemented in hierarchical database management systems. In such a system, the user may see the data organized into trees, each of which may have its own structure depending on the nature of the data it contains. A tree will have a single "root" record type, together with a set (possibly empty) of subtrees. Each subtree similarly will be seen as having a root and its own set of subtrees, and so on. Examples of commercial hierarchical database management systems include the original IMS from IBM (IMS is a registered trademark of IBM); and the MACCS.TM. and REACCS.TM. databases and systems from Molecular Design Limited of San Leandro, Calif. (MACCS.TM. and REACCS.TM. are trademarks of Molecular Design Limited.)
One non-first normal form extension to the relational model in which row-and-column values may be tables (including NFNF tables) we will call a "hierarchic relational database". The corresponding local external schema we will call a "hierarchic relational view". In a hierarchic relational database and view, a relation can be a tree that contains several hierarchic levels that define a one-to-many dependency between data. At any level, a parent table can include values that represent a set of records in a child table, which must be at an immediately lower level. Each table at a subordinate level can have many records for each record in the parent table. Records in the child table are joined to records in the parent table. This may occur either (i) when the values in certain fields of the parent and child tables (called "linking fields") are equal, or (ii) through the use of internal pointers, which are commonly used in non-relational database management systems. In the hierarchic relation, the parent-child relationship thus creates a link similar to an SQL join between master and detail tables. More correctly, the link in the hierarchic relation is similar to an outer join, in that it includes even master records that have no detail records.
Because it has minimal constraints and can be used in a natural way to express interrelationships among sets of databases that include hierarchical, relational and other kinds of databases, the hierarchic relational view may be used to define the structure of a multi-database in multi-database system. (A multi-database system is one that integrates the operation of multiple and possibly disparate databases managed by their native database management systems.) One multi-database system that uses a hierarchic relational view as a local external schema is the ISIS/Host multi-database system available from Molecular Design Limited of San Leandro, Calif. (ISIS.TM. is a trademark of Molecular Design Limited.) In ISIS/Host, the integrated database is defined by creating what is called in the system an "Hview" (a hierarchical view). An Hview is defined by an Hview definition file, which is created and maintained by a database administrator. In a simple system, the Hview definition file will point to one or more databases (relational or otherwise) on one host computer. In a network environment, the Hview definition file may point to network connections and databases located on various network nodes. Using Hviews, the database administrator can hide the network and database complexity from the data user and ISIS/Host can present the integrated data as if it came from a single database.
In a multi-database system, such as one just described, and in particular where databases are distributed across a computer network, the multi-database system must have methods for search and retrieval operations that reference fields in multiple databases in such a way that good system performance and user responsiveness are achieved. Obtaining good performance for such operations is particularly challenging when a query requires that searches in one database be performed in terms of intermediate results obtained from searches on different databases, which perhaps are resident on different computers.