The problem of accessing heterogeneous data sources in an integrated manner has typically been approached by representing the data in the participating heterogeneous data sources in a common data model. This involves mapping the native schemas of each data source to schemas in the common data model. In this specification, the term “native” means a language or schema peculiar to the specific data source. The integration of the participating heterogeneous data sources is enabled by having a common method to query the data sources and by allowing a single query to retrieve an integrated result from multiple data sources.
The process of enabling a heterogeneous data source for integrated access typically involves the following two steps.

1. At setup time, the native schema of a data source is mapped to a schema in the common data model. Typically, this step is performed manually by a person with expert knowledge of the data source using a software tool.

2. At runtime the data in a data source is mapped from its native schema to the mapped schema in the common data model by a data mapper process. This process often involves translating the query in the common data model to one or more queries or application program interface (API) calls that are processed by the native data source.
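By way of illustration only, the two steps above can be sketched as follows. The sketch assumes a relational native data source reached through Python's sqlite3 module; the table name EMP_TBL, the column names, and the helper functions are hypothetical and do not come from any of the cited art.

```python
import sqlite3

# Step 1 (setup time): a hand-written mapping from common-model names
# to native names. All names here are hypothetical.
SCHEMA_MAP = {
    "entity": "EMP_TBL",
    "fields": {"name": "EMP_NM", "department": "DEPT_CD"},
}


def translate_query(fields, where):
    """Translate a common-model query into native SQL plus parameters."""
    native_cols = [f'{SCHEMA_MAP["fields"][f]} AS {f}' for f in fields]
    conditions = [f'{SCHEMA_MAP["fields"][k]} = ?' for k in where]
    sql = f'SELECT {", ".join(native_cols)} FROM {SCHEMA_MAP["entity"]}'
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, list(where.values())


def run_common_query(conn, fields, where):
    """Step 2 (runtime): run the translated query and return the rows
    expressed in the common-model field names."""
    sql, params = translate_query(fields, where)
    rows = conn.execute(sql, params).fetchall()
    return [dict(zip(fields, row)) for row in rows]
```

In this sketch the client only ever names the common-model fields (name, department); the native identifiers appear solely in the mapping created at setup time.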
A data mapper useful in such a process may take the form of:

(i) a wrapper, which is a software layer that encapsulates the data source and exposes a common API for accessing the data; or

(ii) a data server, which is a network-addressable server application that accepts requests from client applications and then retrieves and processes the data according to the requests.
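The wrapper form (i) can be sketched as below, purely as an illustration. The class names, the fetch method, and the two example data sources (CSV text and an in-memory key-value store) are assumptions made for this sketch; a data server of form (ii) would expose a comparable interface over a network and is not shown.

```python
from abc import ABC, abstractmethod
import csv
import io


class DataWrapper(ABC):
    """Hypothetical common API exposed by every wrapper."""

    @abstractmethod
    def fetch(self, **criteria):
        """Return records (as dicts) whose fields equal the criteria."""


def _matches(record, criteria):
    return all(record.get(k) == v for k, v in criteria.items())


class CsvWrapper(DataWrapper):
    """Encapsulates a CSV data source supplied as text."""

    def __init__(self, text):
        self._rows = [dict(r) for r in csv.DictReader(io.StringIO(text))]

    def fetch(self, **criteria):
        return [r for r in self._rows if _matches(r, criteria)]


class KeyValueWrapper(DataWrapper):
    """Encapsulates a key-value store of the form {key: {field: value}}."""

    def __init__(self, store):
        self._store = store

    def fetch(self, **criteria):
        return [dict(rec) for rec in self._store.values()
                if _matches(rec, criteria)]


def integrated_fetch(wrappers, **criteria):
    """Issue one request against several wrappers and merge the results."""
    results = []
    for wrapper in wrappers:
        results.extend(wrapper.fetch(**criteria))
    return results
```

Because both wrappers expose the same fetch method, a single call to integrated_fetch can return an integrated result from sources whose native formats differ.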
Typically, data mappers are programmed by a software engineer and deployed by a system administrator. The task of constructing and deploying a data mapper requires much effort and expert knowledge and is typically far beyond the ability of the actual users of the data. The construction of data mappers can be further complicated by the following issues.
First, it is often desirable to allow multiple possible representations of the same data in the common data model. This is often required when the data model of the native schema differs from that of the common schema (e.g., relational and hierarchical). Second, it is often desirable to be able to specify the schema to be used for a data source in the common data model. This is sometimes necessary when the user requires that the schema of a data source enabled for integrated access have a predetermined structure (e.g., when integrating data from a legacy data source into a system that already has clients accessing other data according to an existing schema).
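The first issue can be illustrated with a small sketch in which the same relational rows admit two different hierarchical representations. The rows, field names, and the nest helper are hypothetical and serve only as an example.

```python
from collections import defaultdict

# Hypothetical relational rows: one row per employee assignment.
ROWS = [
    {"employee": "Ada", "department": "R&D", "project": "Apollo"},
    {"employee": "Bob", "department": "R&D", "project": "Zeus"},
    {"employee": "Cy", "department": "Ops", "project": "Apollo"},
]


def nest(rows, parent, child):
    """Group the child values under each distinct parent value."""
    tree = defaultdict(list)
    for row in rows:
        tree[row[parent]].append(row[child])
    return dict(tree)


# Two equally valid hierarchical views of the same relational data.
by_department = nest(ROWS, "department", "employee")
by_project = nest(ROWS, "project", "employee")
```

Here by_department places employees under departments while by_project places them under projects; neither hierarchy is implied by the relational schema alone, so the choice must be made when the mapping is defined.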
There are a number of existing methods of providing a common data model, which enable querying of data stored in heterogeneous data sources. U.S. Pat. No. 6,263,342 issued to Chang et al on Jul. 17, 2001, U.S. Pat. No. 6,233,586 issued to Chang et al on May 15, 2001 and U.S. Pat. No. 6,272,488 issued to Chang et al on Aug. 7, 2001 disclose a method where a federated virtual view of the heterogeneous data sources is provided using an object-oriented model. Federated query objects are translated into appropriate queries for individual data sources using Java objects. The method of constructing the Java objects for query translation is not disclosed.
U.S. Pat. No. 5,596,744 issued to Dao et al on Jan. 21, 1997 and U.S. Pat. No. 5,634,053 issued to Noble et al on May 27, 1997 provide a similar federated architecture. In these patents, a data dictionary is used to contain information such as schemas (of native data sources), data distribution, domain knowledge and inter-site relationships. This data dictionary is used to translate queries appropriately for individual data sources. However, no method is described for the addition of information for a new data source to the data dictionary.
U.S. Pat. No. 6,282,537 issued to Madnick et al on Aug. 28, 2001 discloses a method of querying heterogeneous data sources containing structured and semi-structured data. An export schema is used to define the data and its format for each individual data source. However, as in the previously mentioned patents, a method of generating the export schema is not described. In each of the above cases, the objective of the disclosure has been to describe the common data model that enabled querying of the heterogeneous data sources. The method by which new data sources are integrated into the common data model is not addressed.
U.S. Patent Application Publication No. 20020133504 by Vlahos et al published Sep. 19, 2002 discloses a method of using a data wrapper to publish data in a data source as virtual tables. An information server is then used to aggregate the virtual tables into a single universal data representation (common data model). Queries directed to the common data model can be translated appropriately and re-directed to individual data wrappers. Although the information server performs some integration functions (e.g., accumulation, removal of duplication), the publication does not describe how the virtual tables of the data wrapper are constructed for a data source. In other words, the mapping of the data from individual data sources to the common data model is not fully described.
U.S. Patent Application Publication No. 20010034733 by Prompt et al published Oct. 25, 2001 discloses a method which uses a common hierarchical data model and Lightweight Directory Access Protocol (LDAP) as the protocol for accessing the data. It describes the mapping of a relational schema to a hierarchical schema (in the common data model) based on a relationship-driven and an ad-hoc approach. However, the relationship-driven approach does not address the issue of multiple possible mappings which often exist when a relational schema is mapped to a hierarchical schema. Moreover, the method does not address the mapping of other non-relational data sources to the common hierarchical data model.
More recently, Extensible Markup Language (XML) has been increasingly used as a common data model (for heterogeneous data sources), with XML query languages being able to query XML data. IBM's Java-based tool “XML for Tables” enables application programmers to use a provided set of Java classes to obtain an XML view of relational tables. Specifically, XML for Tables is designed to operate with IBM's DB2 database technology. Although XML for Tables enables user-defined views (schemas) for the XML data, it remains a tool for programmers. Multiple, different views of the data must be individually programmed with a knowledge of the data source. Furthermore, the tool is designed to be used with relational databases and therefore is limited in its application to heterogeneous data sources.
The arrangements disclosed in the above noted art are non-instructive of a tool by which a user, not necessarily skilled in the art of database management, may create and deploy a data mapper for a selected data source.