1. Field of the Invention
The present invention generally relates to DataBase Management Systems (DBMS) dispersed over a long haul network, and more specifically to a federated (global) DataBase controller that provides a user access to a virtual database by integrating heterogeneous DBMSs (data sites) dispersed over a long haul network.
2. Description of the Related Art
Large scale information and DataBase Management Systems have been developed independently and without consideration that one day they may need to be integrated. DBMSs exist in fields such as earth science, computer integrated manufacturing and medicine. As a result, the systems have become more and more complex, are often incompatible and are characterized by several types of heterogeneity.
Many different DataBase Management System (DBMS) models may be used to represent data, such as the hierarchical, network, and relational models. Aside from databases, many software systems (such as spreadsheets, multi-media databases and knowledge bases) store other types of data, each with its own data model. Furthermore, the same data may be seen by various users at different levels of abstraction.
In addition, the exact same data may be represented by as many different names and/or values as there are databases. For example, a patient in different medical databases may be represented by his or her name, social security number, driver's license number or a personal ID number. Furthermore, the patient's height may be represented in inches, feet, centimeters or meters.
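The unit mismatch described above can be illustrated with a minimal Python sketch. The unit labels, the conversion table and the sample records are assumptions made for illustration only; they are not taken from any particular database.

```python
# Hypothetical conversion factors to centimeters; the unit labels are assumptions.
CM_PER_UNIT = {"in": 2.54, "ft": 30.48, "cm": 1.0, "m": 100.0}


def height_in_cm(value, unit):
    """Convert a height recorded in a site-specific unit to centimeters."""
    try:
        return value * CM_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"unknown height unit: {unit!r}") from None


# One patient's height as four hypothetical data sites might record it.
records = [(72.0, "in"), (6.0, "ft"), (182.88, "cm"), (1.8288, "m")]

# After normalization every site reports the same quantity in the same unit.
normalized = [height_in_cm(value, unit) for value, unit in records]
```

Without such a mapping, a query comparing heights across sites would treat 72 and 182.88 as different values even though they describe the same patient.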
Because of such differences, users find it difficult to integrate and understand the meaning of all the types of data presented to them. Analysts, operators and current data processing technology are not able to organize, process and intelligently analyze these diverse and massive quantities of information. In order to access data, users may have to learn different operating and word processing systems. This increases training costs and reduces efficiency, which often results in late reports to decision makers, missed intelligence opportunities and unexploited data.
Another related issue is the inefficiency of searching each database for a piece of information. For example, a network of medical databases may interconnect over a hundred different databases, of which any one patient may be included in only two or three. As the number, size and complexity of databases grow, the inefficiency of searching each database has become a significant problem.
Past and current research and development in distributed databases allows integrated access by providing a homogenizing layer on top of the underlying information systems (UISs). Common approaches for supporting this layer focus on defining a single uniform database language and data model that can accommodate all features of the UISs. The two main approaches are known as view integration and multi-database language.
The view integration approach advocates the use of a relational, an object-oriented (OO), or a logic model both for defining views (virtual or snapshot) on the schemas of more than one target database and for formulating queries against the views. The view integration approach is one mechanism for homogenizing the schema incompatibilities of the UISs. In this framework, all UISs are converted to the equivalent schemas in the standard relational, OO, or logical data model. The choice of the uniform data model is based on its expressiveness, its representation power and its supported environment. This technique is very powerful from the user's point of view. It insulates the user from the design and changes of the underlying Information Management System (IMS). Thus, it allows the user to spend more time in an application environment. However, the view integration approach has limited applicability (low degree of heterogeneity) because there are many situations in which the semantics of the data are deeply dependent on the way in which the applications manipulate it, and are only partially expressed by the schema. Many recent applications in areas where traditional DBMSs are not usable fall into this category (multi-media applications involving text, graphics and images are typical examples). In addition, there are no available tools to semi-automate the building and the maintenance of the unified view, which is vital to the success of this technique.
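The idea of a unified relational view over differing schemas can be sketched as follows. This is an illustration under stated assumptions, not a description of any of the systems discussed here: the table names, column names and sample rows are hypothetical, and both source tables live in a single in-memory SQLite database purely to keep the example self-contained, whereas real UISs would be separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two hypothetical source schemas describing the same kind of entity.
    CREATE TABLE clinic_a (ssn TEXT, full_name TEXT, height_in REAL);
    CREATE TABLE clinic_b (patient_id TEXT, name TEXT, height_cm REAL);
    INSERT INTO clinic_a VALUES ('000-00-0001', 'Pat Doe', 72.0);
    INSERT INTO clinic_b VALUES ('B-0042', 'Sam Roe', 165.0);

    -- The unified (virtual) view maps both schemas onto one relational schema
    -- and converts inches to centimeters on the way.
    CREATE VIEW patients AS
        SELECT 'clinic_a' AS site, ssn AS local_key, full_name AS name,
               height_in * 2.54 AS height_cm
        FROM clinic_a
        UNION ALL
        SELECT 'clinic_b', patient_id, name, height_cm
        FROM clinic_b;
""")

# The user formulates one query against the view, not one per source schema.
rows = conn.execute(
    "SELECT site, name, height_cm FROM patients "
    "WHERE height_cm > 160 ORDER BY site"
).fetchall()
```

The view definition has to be written and maintained by hand whenever a source schema changes, which is the maintenance burden noted above.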
In the multi-database language approach, a user, or application, must understand the contents of each UIS in order to access the shared information and to resolve conflicts of facts in a manner particular to each application. There is no global schema to provide advice about the meta-data. Ease of maintenance and ability to deal with inconsistent databases make this approach very attractive. The major drawback of this approach is that the burden of understanding the underlying IMSs lies on the user. Accordingly, there is a tradeoff between this multi-database language approach and the view integration approach discussed above.
Chin-Wan Chung, "Design and Implementation of a Heterogeneous Distributed Database Management System," Proceeding of the IEEE Computer and Communications Societies, pp. 356-362, 1989 discloses a software program called DATAPLEX. DATAPLEX translates an SQL query into various other query formats so that it can query non-relational database systems. Marjorie Templeton et al., "Mermaid-A Front-End to Distributed Heterogeneous Databases," Proceedings of the IEEE, Vol. 75, No. 5, May 1987, pp. 695-707 discloses a program called "Mermaid" that allows the user of multiple databases stored under various relational DBMSs running on different machines to manipulate the data using a common language.
There remains an urgent need to integrate these dispersed heterogeneous databases to provide uniform access to the data, to maintain integrity of the data, to control its access and use, and to improve search efficiency. Rather than requiring users to learn a variety of interfaces in order to access different databases, it is preferable that a single interface be made available which provides access to each of the DBMSs and supports queries which reference data managed by more than one information system. The single interface should provide true integration of the heterogeneous databases so that the user can easily access and analyze data that is not represented uniformly across the databases. The interface should also restrict the search to only databases that potentially contain the data of interest to improve efficiency.
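One way to picture the last requirement is a lookup structure that records which data sites hold entries for a given key, so that a query visits only those sites. The sketch below is a generic illustration, not the method of the present invention; the site names, patient keys and record contents are hypothetical.

```python
from collections import defaultdict

# Hypothetical data sites: site name -> {patient key -> record}.
SITES = {
    "site_a": {"p1": {"height_cm": 182.88}},
    "site_b": {"p2": {"height_cm": 165.0}},
    "site_c": {"p1": {"blood_type": "O+"}},
}


def build_site_index(sites):
    """Map each patient key to the set of sites that hold a record for it."""
    index = defaultdict(set)
    for site, records in sites.items():
        for key in records:
            index[key].add(site)
    return index


def lookup(key, sites, index):
    """Query only the sites the index lists for this key."""
    candidates = sorted(index.get(key, ()))
    return {site: sites[site][key] for site in candidates}


INDEX = build_site_index(SITES)
result = lookup("p1", SITES, INDEX)  # visits site_a and site_c, skips site_b
```

With over a hundred sites and a patient present in only two or three, such an index replaces a scan of every site with a handful of targeted requests.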