This invention relates to databases, and more particularly to a plurality of databases that are logically combined to form a meta-database, such as a meta-directory.
A great deal of corporate data is buried in network devices, such as PBXs, messaging platforms, and email platforms. Typically, each of these devices holds only the information it needs for its specialized function, maintains that information in a database, and provides means for administering it. Those administration means typically deal either with a proprietary interface or with a standard protocol applied to a proprietary schema. That presents no problem as long as one does not want to use the data across platforms. Efforts to use, modify, and update such data across platforms, however, lead to many problems, including the need for data replication and difficult interoperation with diverse devices and applications.
Nevertheless, the emerging need to provide organization-wide access to data is creating a demand to interconnect previously isolated systems. As a result, integrating information from multiple heterogeneous data sources has become a central issue in modern information systems. A data integration system provides uniform and transparent access to multiple data sources, making information more readily accessible and allowing users to pose queries without having to interact with each specific source through that source's own interface.
Although an integrated system offers many advantages, as indicated above, integrating information from multiple sources raises difficult problems, most notably autonomy and heterogeneity. Autonomy refers to the fact that some systems operate under separate and independent control, using their own data model and Application Programming Interface (API). Heterogeneity can arise at different levels; for instance, different systems may use different APIs, different vocabularies (e.g., the same term for different concepts, or different terms for the same concept), different schemas, etc.
Building custom applications that assemble data from appropriate locations is not always a practical solution. It can be prohibitively expensive, inflexible, and hard to maintain.
Several research projects have developed mediator systems to address these problems. See, for example, G. Wiederhold, “Mediators in the Architecture of Future Information Systems,” IEEE Computer, pp. 38-49, March 1992. A mediator system provides an intermediate layer between the user and the data sources. Each data source is wrapped by software that translates local terms, values, and concepts into global concepts shared by some or all sources, thereby smoothing the semantic heterogeneity among the various integrated sources. The mediator then obtains information from one or more wrapped components and exports the information to other components. Queries to the mediator are posed in a uniform language, independent of how the data is distributed over the sources and of the sources' APIs. Mediators also tend to concentrate on read-only queries. With mediators, queries posed against the unified system are executed dynamically at the various data sources, rather than by materializing subsets of the data from those sources in an integrated directory.
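The wrapper/mediator arrangement can be illustrated with a minimal Python sketch. The two sources, their local attribute names, the global names (`sn`, `telephoneNumber`, `mail`), and the classes `Wrapper` and `Mediator` are all hypothetical, and the merge step simply assumes that rows matching the same query describe the same person. Nothing is copied into a central store: each query calls the wrapped sources at query time, and the mediator only reads.

```python
from typing import Callable, Dict, List

# Hypothetical local records held by two autonomous sources that use
# different vocabularies ("owner" vs "surname") for the same concept.
PBX_ROWS = [{"ext": "4711", "owner": "Smith"}]
EMAIL_ROWS = [{"mailbox": "jsmith", "surname": "Smith"}]


class Wrapper:
    """Translates one source's local attribute names into global names."""

    def __init__(self, fetch: Callable[[], List[dict]], mapping: Dict[str, str]):
        self.fetch = fetch      # called on every query; results are not cached
        self.mapping = mapping  # local attribute name -> global attribute name

    def query(self, global_attr: str, value: str) -> List[dict]:
        results = []
        for row in self.fetch():
            translated = {self.mapping[k]: v for k, v in row.items() if k in self.mapping}
            if translated.get(global_attr) == value:
                results.append(translated)
        return results


class Mediator:
    """Answers read-only queries in global terms by dispatching to wrappers."""

    def __init__(self, wrappers: List[Wrapper]):
        self.wrappers = wrappers

    def query(self, global_attr: str, value: str) -> dict:
        merged: dict = {}
        for w in self.wrappers:
            for row in w.query(global_attr, value):
                merged.update(row)  # assumption: all matches describe one person
        return merged


mediator = Mediator([
    Wrapper(lambda: PBX_ROWS, {"ext": "telephoneNumber", "owner": "sn"}),
    Wrapper(lambda: EMAIL_ROWS, {"mailbox": "mail", "surname": "sn"}),
])
print(mediator.query("sn", "Smith"))
# {'telephoneNumber': '4711', 'sn': 'Smith', 'mail': 'jsmith'}
```

Because the wrappers fetch at query time, a change made directly in one source is visible to the next mediator query without any replication step.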
In an effort to employ the data that is available on different platforms, a widely deployed directory access protocol has been developed, known as Lightweight Directory Access Protocol, or LDAP. See, for example, S. Cluet et al., “Using LDAP Directory Caches,” Proceedings of PODS, 1999, and R. Arlein et al., “Making LDAP Active With the LTAP Gateway: Case Study in Providing Telecom Integration and Enhanced Services,” Proceedings Workshop on Databases in Telecommunications, September 1999. To supply all the functionality that users expect, middleware is needed to integrate the LDAP directories with network and telecommunication devices. This integration makes data that has traditionally been buried in network/telecommunication devices like routers, PBXs, and messaging platforms available to new applications that can add value to the data. In addition, since much of this data is replicated in multiple devices, corporate directories, and provisioning systems, integration reduces the need to manually re-enter such data and consequently reduces data inconsistencies across repositories.
From a database perspective, LDAP can be thought of as a very simple query and update protocol. Directory entries are stored hierarchically in a tree, which makes the arrangement easily scalable. Each entry in the tree is identified by a Distinguished Name (DN), which is a path from the root of the tree to the entry itself. The DN is produced by concatenating the Relative Distinguished Names (RDNs) of the entries along that path. The RDN of an entry is set at creation time and consists of an attribute name/value pair or, in more complicated cases, a collection of such pairs. The RDN of an entry must be unique among the children (i.e., the immediate subordinates) of its parent entry.
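The following minimal Python sketch shows how a DN is assembled from RDNs and how sibling uniqueness constrains RDNs. The `Entry` class is hypothetical and supports only single-pair RDNs. Following the usual LDAP string convention (RFC 4514), the DN string lists the entry's own RDN first and the root-most RDN last, even though conceptually the DN is the path from the root; escaping of special characters is omitted.

```python
from typing import Dict, List, Optional, Tuple

RDN = Tuple[str, str]  # (attribute name, value); single-pair RDNs only


class Entry:
    """Hypothetical directory entry; the tree root has no RDN."""

    def __init__(self, rdn: Optional[RDN], parent: Optional["Entry"] = None):
        self.rdn = rdn
        self.parent = parent
        self.children: Dict[RDN, "Entry"] = {}

    def add_child(self, rdn: RDN) -> "Entry":
        # An RDN must be unique among the children of one parent.
        if rdn in self.children:
            raise ValueError(f"RDN {rdn[0]}={rdn[1]} already exists under this parent")
        child = Entry(rdn, self)
        self.children[rdn] = child
        return child

    def dn(self) -> str:
        # Walk from this entry up toward the root, collecting each RDN,
        # which yields the leaf-first order used in DN strings.
        parts: List[str] = []
        node: Optional[Entry] = self
        while node is not None and node.rdn is not None:
            parts.append(f"{node.rdn[0]}={node.rdn[1]}")
            node = node.parent
        return ",".join(parts)


root = Entry(None)
org = root.add_child(("o", "Example"))
people = org.add_child(("ou", "People"))
smith = people.add_child(("cn", "John Smith"))
print(smith.dn())  # cn=John Smith,ou=People,o=Example
```

The same RDN may reappear under a different parent, since uniqueness is required only among siblings; the full DN stays unique because the parent portions differ.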
One limitation of LDAP is that each of its update operations acts on a single entry: an entry can be created or deleted only as a single leaf node, and an existing node can be changed only one at a time, using either the Modify command or the ModifyRDN command. The Modify command changes any field of an entry except the RDN field, and ModifyRDN changes the RDN field. Another limitation is that, while individual update commands are atomic, several update commands cannot be grouped into a transaction. For example, one cannot atomically change a person's name and telephone number if the name is part of the person's RDN but the telephone number is not.
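To make the missing-transaction limitation concrete, the sketch below models a directory as a Python dictionary from DN to attributes, with one function per update request. The function names and the simulated connection failure are hypothetical, and `modify_rdn` simply overwrites the old RDN value rather than modeling LDAP's option to keep it. Because the rename and the telephone change are two independent requests, a failure between them leaves the entry half updated.

```python
from typing import Dict

# Hypothetical in-memory directory: DN -> attributes. Each function below
# stands in for one LDAP update request and is atomic on its own.
Directory = Dict[str, Dict[str, str]]


def modify_rdn(d: Directory, dn: str, new_rdn: str) -> str:
    """Rename an entry (leaf-first DN, no escaping) and return the new DN."""
    _, _, parent = dn.partition(",")
    new_dn = f"{new_rdn},{parent}" if parent else new_rdn
    if new_dn in d:
        raise ValueError("an entry with that DN already exists")
    attrs = d.pop(dn)
    attr, _, value = new_rdn.partition("=")
    attrs[attr] = value  # simplification: old RDN value is overwritten
    d[new_dn] = attrs
    return new_dn


def modify(d: Directory, dn: str, changes: Dict[str, str]) -> None:
    """Change non-RDN attributes of a single entry."""
    d[dn].update(changes)


def rename_and_renumber(d: Directory, dn: str, new_rdn: str, phone: str,
                        fail_between: bool = False) -> None:
    # Two separate requests; nothing groups them into one transaction.
    new_dn = modify_rdn(d, dn, new_rdn)
    if fail_between:
        raise ConnectionError("client lost its connection between requests")
    modify(d, new_dn, {"telephoneNumber": phone})


d: Directory = {
    "cn=John Smith,ou=People,o=Example": {"cn": "John Smith", "telephoneNumber": "4711"}
}
try:
    rename_and_renumber(d, "cn=John Smith,ou=People,o=Example",
                        "cn=John Smyth", "4712", fail_between=True)
except ConnectionError:
    pass
print(d)
```

After the simulated failure, the entry carries the new name but the old telephone number, and the directory itself offers no way to roll back the rename.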