This invention relates to information storage and management systems, and in particular to a multi-database information management system permitting the storage and retrieval of records located in a different information management system.
In a conventional information (or database) management system (IMS or DBMS), information is stored only internally to the system, so the system has access only to its own data. Such an IMS provides all of the mechanisms necessary to store, locate and retrieve the information in the databases under its control. For instance, in order to retrieve a record stored in one of the databases of the IMS, a user or a program will generate a query, asking for information which matches certain criteria. The IMS consults internal indexes and data structures to locate all matching data items. In addition, it schedules operations such as memory reads and writes, network access and disk activity, to actually retrieve the matching data items, and transmits the matching data back to the querying agent.
It is becoming more common to integrate IMS's, and a new level of information system has been developed, which may be referred to as an "index server" (or "service"). In order to provide access from a given IMS to another IMS, an index is created, which identifies where in the integrated system the requested information is stored; but the index does not store the information itself. An index server thus provides an organizational structure that can be used to locate and access all of the information stored anywhere in the integrated system, without duplicating the information itself, in a manner analogous to a library's card catalog or the index of a multivolume reference like an encyclopedia.
An index includes multiple "nodes", which are the index's entries. Each node contains a "key" and some contents. The key is the means by which the nodes of the index are sorted, e.g. alphabetically, chronologically, numerically, etc. Each node's contents may be a "leaf", i.e. a reference to the location of a data record (analogous to a page number in a table of contents), or may even be a reference to another index, which itself includes nodes with different keys and contents. Thus, by accessing a node, a user may be led to the actual location of the desired data item, or may instead be led to another index which can provide that location. The chain of indexes may be any length.
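The node structure described above can be illustrated with a minimal sketch. The names `Leaf`, `Index` and `resolve` are hypothetical and are not taken from any particular system; the sketch assumes that one key is supplied per index level in the chain.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Leaf:
    """Reference to where a data record is stored, not the record itself."""
    location: str


@dataclass
class Index:
    """Maps each key to a node's contents: a Leaf or another Index."""
    nodes: Dict[str, Union[Leaf, "Index"]] = field(default_factory=dict)


def resolve(index: Index, keys: List[str]) -> str:
    """Follow one key per index level until a leaf is reached."""
    contents: Union[Leaf, Index] = index
    for key in keys:
        if not isinstance(contents, Index):
            raise KeyError(f"leaf reached before key {key!r} was used")
        contents = contents.nodes[key]
    if isinstance(contents, Index):
        raise KeyError("keys exhausted before a leaf was reached")
    return contents.location


# A two-level chain: the root index refers to a second index keyed by author.
by_author = Index({"Sussman": Leaf("database 2, record 17")})
root = Index({"authors": by_author, "glossary": Leaf("database 1, record 3")})
```

Here `resolve(root, ["authors", "Sussman"])` passes through two indexes before reaching a leaf, while `resolve(root, ["glossary"])` reaches a leaf directly.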
A conventional, stand-alone IMS includes its own index, which may be made visible to the user, so that the references at the nodes can be directly viewed. A primary distinction between such an IMS and an index server is that the latter provides only data location capability, whereas an IMS is used both to store and to locate data. Such an IMS 10 is shown in FIG. 1, and includes a user interface/processor 20 and a database 30, which includes conventional cache memory and logic necessary for managing the information in the database. The database 30 includes an index.
The database 30 and the user interface/processor 20 may share a single processor, as when the database resides in a given computer or workstation. Alternatively, the database 30 may be in one computer and the user interface/processor 20 in another.
The user interface/processor 20, as well as the other user interfaces discussed below, may include conventional devices (not separately shown) such as a keyboard, a mouse, and other input devices by which the user can enter data queries, as well as output devices such as a video display terminal (VDT) and a printer. Each such user interface/processor may in fact be any apparatus which can generate such queries, including a microprocessor running an application or a device which is event-driven to request data. In the latter cases, the "user" may be taken to mean the application or hardware generating the request, rather than a person.
An index server system 40 is shown in FIG. 2, and includes a user interface/processor 50, an indexer (or index server) 60 (which conventionally has its own database), a first database 70, and a second database 80. When a request is made to the indexer 60 for a data item which is contained in database 1 (70) or database 2 (80), the indexer provides the user interface 50 with the correct source (i.e. database) information. The user then reenters the request, using the correct database as the information source.
The indexer 60, databases 70 and 80 and user interface/processor 50 may reside on one to four computers; thus, it is possible to have two or more databases (and/or the indexer) associated with the same processor in any combination.
Examples of index systems currently in use are the WAIS and GOPHER servers available on the Internet. A user sends an information query to such an index server, which then locates a server which can provide the information. However, the index server does not provide the information itself; rather, the user must then post a new query to the identified information source to retrieve the actual information desired. (This is analogous to references retrieved manually from the Reader's Guide to Periodical Literature.)
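The two-step access pattern of an index server may be sketched as follows. This is an illustration only: the dictionaries and the functions `ask_indexer` and `ask_database` are hypothetical stand-ins, and do not represent the interface of WAIS, GOPHER or any other real server.

```python
from typing import Dict

# Hypothetical in-memory stand-ins for database 1 (70) and database 2 (80).
DATABASES: Dict[str, Dict[str, str]] = {
    "database 1": {"alpha": "record A"},
    "database 2": {"beta": "record B"},
}

# The indexer records only which source holds each key, never the data.
INDEX: Dict[str, str] = {"alpha": "database 1", "beta": "database 2"}


def ask_indexer(key: str) -> str:
    """First query: returns the name of the source holding the key."""
    return INDEX[key]


def ask_database(source: str, key: str) -> str:
    """Second query: the user reenters the request at the named source."""
    return DATABASES[source][key]


source = ask_indexer("beta")            # the indexer answers with a location
record = ask_database(source, "beta")   # the user must query again for data
```

Two separate requests are needed in this sketch because the indexer never returns the record itself.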
It is possible to merge such IMS's and index servers to form an "information broker". An information broker has an index which can be viewed by the user, like a standard index server. However, instead of merely including identification of the source where a given data item can be retrieved, as with a standard index server, the information broker includes a reference or pointer directly to that data item, so that when a query for the information is received, the information broker retrieves the data itself and not merely information about its location. In this sense, the information broker acts like the IMS, since the data is actually provided to the user. Some hypertext systems perform in this manner, and are thus information brokers.
The index of an information broker includes nodes containing two items:
(1) identification of an information source storing the requested data item; and
(2) a data identifier which is particular to that information source, and is used to actually access the data.
(A node containing this information may be referred to as an "external reference leaf". A node may alternatively list other nodes, or include the data itself. In the latter case, it may be referred to as a "data leaf".)
The data identifier will be different for different information sources; for example, for a query-based information system, the data identifier will be a query, while for object- or record-based information systems, the data identifier is a unique identifier of the data item. The term "data identifier" or DID will be used herein to refer generally to any such identifier of a given information system. In, by way of example, a bibliographic reference such as "Sussman 1983, page 3", the wording "Sussman 1983" corresponds to the identification of the data source, while "page 3" corresponds roughly to the data identifier. (No direct analog to the query form of the data identifier exists in book form.)
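The two kinds of leaf, and the two forms a DID may take, can be sketched as follows. All class, function and source names here are hypothetical, and the sources are assumed to be simple in-memory functions; the `field=value` query syntax is likewise an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class ExternalReferenceLeaf:
    source: str  # identification of the information source
    did: str     # data identifier, meaningful only to that source


@dataclass
class DataLeaf:
    data: str    # the node holds the data itself


def record_source(did: str) -> str:
    """Record-based source: the DID is a unique identifier of the item."""
    pages = {"page 3": "text of page 3"}
    return pages[did]


def query_source(did: str) -> str:
    """Query-based source: the DID is itself a query (assumed 'field=value')."""
    rows = [{"author": "Sussman", "year": "1983", "title": "example title"}]
    field_name, value = did.split("=", 1)
    return next(r["title"] for r in rows if r[field_name] == value)


SOURCES: Dict[str, Callable[[str], str]] = {
    "Sussman 1983": record_source,
    "catalog": query_source,
}


def fetch(node: Union[ExternalReferenceLeaf, DataLeaf]) -> str:
    """Return the data for a node, contacting its source when necessary."""
    if isinstance(node, DataLeaf):
        return node.data
    return SOURCES[node.source](node.did)
```

In this sketch, `fetch(ExternalReferenceLeaf("Sussman 1983", "page 3"))` hands the DID to the named source, while a `DataLeaf` is answered without contacting any source.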
An information broker system 90 is shown in FIG. 3, and is similar to the index server system 40 shown in FIG. 2, except that the indexer/database 1 (reference numeral 110) is coupled via a bus 140 to the databases 2 and 3 (designated 120 and 130, respectively). Thus, when a misdirected query is sent by the user interface/processor 100 to the indexer 110 (i.e. where the requested information is not contained in the indexer's database), the indexer is able, using the correct routing information, to retrieve the requested data and return it to the user interface/processor 100.
Thus, in use today are all of these systems: IMS's, index servers and information brokers. The IMS is stand-alone, and provides both references and actual access to requested data items; the index server provides source information about data items in multiple databases, but no direct access; and the information broker provides multiple-database direct information access in a hyperlink-like system. A system can further be constructed which includes multiple information brokers including references to one another. In such a system, a query to a first broker (e.g. a company's in-house library information database) may result in the discovery that the requested information can be accessed by propagating a query to a different information broker (such as a university's library system), which in turn actually retrieves the information by sending a query to a conventional IMS (e.g. a CD ROM driver including a CD ROM of Books In Print). Thus, the query posted by an employee on the company's in-house network leads first to its own library, then to the university library, and finally to the CD ROM driver.
If both the company's server and the university's server are information brokers, the company employee need post only a single query to retrieve the information. However, this circuitous route is inefficient, occupying resources and taking time to execute. It would be far preferable if the employee could directly post a query to the CD ROM driver; but he or she will typically have no idea where the information is located. Even the company's library information database does not know where the information is stored; it is not until the university's library is accessed that it is learned that the information is stored on a given CD ROM. In a standard system, this route will be followed every time that requested data is accessed by the user.
A similar inefficiency can take place in a local system. Thus, a data query which is sent to a first database may be unsuccessful. If that database is set up as an information broker, however, the query is propagated to a second information broker, which itself may include a reference to an IMS containing the requested data item. Thus, the user ultimately retrieves the data, but only after going through one or several other servers. There is no provision in current systems for eliminating this circuitous query path.
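The circuitous query path can be made concrete with a small sketch. It assumes, as a simplification, that each server either holds the requested item or refers the query to exactly one other server; the `Server` class and `query` function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Server:
    name: str
    data: Dict[str, str] = field(default_factory=dict)  # items held locally
    refers_to: Optional["Server"] = None                # next server to try


def query(server: Server, key: str, path: List[str]) -> str:
    """Propagate the query along the chain, recording each server visited."""
    path.append(server.name)
    if key in server.data:
        return server.data[key]
    if server.refers_to is None:
        raise KeyError(key)
    return query(server.refers_to, key, path)


cd_rom = Server("CD ROM driver", {"item": "requested entry"})
university = Server("university library", refers_to=cd_rom)
company = Server("company library", refers_to=university)

first_path: List[str] = []
second_path: List[str] = []
query(company, "item", first_path)
query(company, "item", second_path)  # the same route is followed again
```

Both `first_path` and `second_path` list all three servers: nothing learned during the first query shortens the second.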