A database management system (DBMS) controls the organisation, storage, retrieval, security and integrity of data in a database. Many organisations today use multiple, distributed, heterogeneous databases to support their processes (e.g. sales, marketing, purchases, employee databases), supplied by a variety of vendors (e.g. IBM, Oracle, Sybase). Each database may use a different data model, such as a relational, object-relational or object-oriented one. Additionally, each database may provide different mechanisms (interfaces) for accessing the data. Furthermore, data of interest may be distributed among more than one database. These and further complications stemming from the use of multiple, heterogeneous databases (see Sheth et al., Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases, ACM Computing Surveys 22(3), pp 183-236, 1990) leave users unable to obtain and collate the information they require easily and in a timely fashion.
The management of such multiple, distributed databases draws upon the area of database integration. In this field there are two approaches to integration: mediators and Federated Database Management Systems (FDBMS).
Mediators are components that explicitly implement the integration of component databases. Their job is to store and retrieve data by translating the client requests and results between a high-level data model and the native data model of each data source. Example systems that use mediators are TSIMMIS (see S. Chawathe et al., The TSIMMIS Project: Integration of Heterogeneous Information Sources, Proceedings of IPSJ Conference, pp. 7-18, Tokyo, Japan, October 1994), DISCO (see A. Tomasic et al., Scaling Heterogeneous Databases and the Design of Disco, Proceedings of the International Conference on Distributed Computer Systems, 1996) and Garlic (see M. J. Carey et al., Towards Heterogeneous Multimedia Information Systems: The Garlic Approach, Proceedings of the Fifth International Workshop on Research Issues in Data Engineering (RIDE): Distributed Object Management, 1995).
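The translation role described above can be illustrated with a minimal sketch. It is not taken from TSIMMIS, DISCO or Garlic; the class names (`SqlWrapper`, `RecordWrapper`, `Mediator`), the `customer` entity and the sample data are all assumptions made for illustration. A request expressed in a simple high-level form (an entity name plus equality filters) is translated into SQL for a relational source and into a plain filter for a record-based source, and the results come back in one common shape.

```python
import sqlite3


class SqlWrapper:
    """Hypothetical wrapper: translates a high-level request into SQL."""

    def __init__(self, conn):
        self.conn = conn

    def fetch(self, entity, filters):
        # Entity and column names are trusted identifiers in this sketch;
        # only the filter values are passed as bound parameters.
        where = " AND ".join(f"{col} = ?" for col in filters) or "1 = 1"
        cur = self.conn.execute(
            f"SELECT * FROM {entity} WHERE {where}", tuple(filters.values())
        )
        cols = [d[0] for d in cur.description]
        # Translate native rows back into the common model (plain dicts).
        return [dict(zip(cols, row)) for row in cur.fetchall()]


class RecordWrapper:
    """Hypothetical wrapper for a non-relational source held as lists of dicts."""

    def __init__(self, collections):
        self.collections = collections

    def fetch(self, entity, filters):
        return [
            dict(rec)
            for rec in self.collections.get(entity, [])
            if all(rec.get(key) == value for key, value in filters.items())
        ]


class Mediator:
    """Integrates a fixed set of sources behind one high-level interface."""

    def __init__(self, wrappers):
        self.wrappers = list(wrappers)

    def query(self, entity, filters):
        results = []
        for wrapper in self.wrappers:
            results.extend(wrapper.fetch(entity, filters))
        return results


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [("Ada", "London"), ("Bob", "Paris")],
    )
    records = {"customer": [{"name": "Cy", "city": "London"}]}
    mediator = Mediator([SqlWrapper(conn), RecordWrapper(records)])
    return mediator.query("customer", {"city": "London"})
```

Calling `demo()` queries both sources with a single request. Note that the set of wrappers is fixed when the `Mediator` is constructed, so each mediator reaches only the sources it was built with.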
In the FDBMS approach, data schemata of the component databases undergo several stages of transformation until a common integrated schema is produced. This schema is represented in a semantically rich data model, i.e. one that can describe all the specific features of the individual models adopted by each component source. This approach is applied by database vendors who, apart from their main commercial databases, develop so-called database gateways, i.e. solutions that provide transparent access to content stored in other vendors' databases. For instance, Sybase Inc. have developed the ‘Adaptive Server®’ that accesses data stored in Sybase as well as in DB2 (by IBM) and Oracle. Similar capabilities are offered by ‘ORACLE 8i’, the ‘Informix Enterprise Gateway’ and IBM's ‘DB2 DataJoiner’. Further examples are companies that develop only data integration solutions, such as Cohera with their ‘Content Integration System’ and Data Integration Inc with ‘InterViso’.
However, these approaches have two major deficiencies. The first is inflexibility to change. If a need for change occurs (including the addition of new databases/datastores to the system, the removal of existing ones or changes to the required behaviour of the system), the system must be statically reconfigured before the changes take effect. Such a reconfiguration process may involve the substitution of a new software part for an old one, changes to the configuration parameters of the system and a restart of the system. The second is the requirement that the target database/datastore be identified in each request to store or query data. Where mediators are used, this requirement manifests itself as the need to select the particular mediator for handling the request, since each mediator integrates only certain databases. Similarly, in the FDBMS approach, the query or storage request must explicitly identify the databases that are to be targeted in the data handling process.
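Both deficiencies can be made concrete with a small sketch. It models no particular product: the table `MEDIATOR_SOURCES`, the mediator and database names and the `route` function are hypothetical. The configuration is frozen once the system has started, and every request has to name the mediator that will handle it.

```python
from types import MappingProxyType

# Hypothetical static configuration, read-only once the system is running.
# Adding or removing a datastore means editing this table and restarting.
MEDIATOR_SOURCES = MappingProxyType({
    "sales_mediator": ("sales_db", "marketing_db"),
    "hr_mediator": ("employee_db",),
})


def route(request: dict) -> tuple:
    """Return the datastores a request reaches; the caller must name the mediator."""
    target = request.get("mediator")
    if target is None:
        raise ValueError("request must identify the mediator that handles it")
    if target not in MEDIATOR_SOURCES:
        raise LookupError(
            f"unknown mediator {target!r}; reconfigure and restart to add it"
        )
    return MEDIATOR_SOURCES[target]
```

A request such as `{"mediator": "hr_mediator", "query": "..."}` reaches only `employee_db`. A request that omits the target is rejected, and a datastore that is not in the table cannot be reached without a restart.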
An interface that allows flexible management of data in a distributed environment of heterogeneous data storage systems is therefore required.