Businesses, enterprises, and other organizations use databases to store and manage information such as inventory, clients, accounts, products, and the like. Moreover, businesses often need to manage and merge data from many different sources, including business partners, data feeds, legacy systems, and the like. The dramatic growth of the Internet and electronic commerce has increased businesses' reliance on the ability to capture, use, and integrate data from multiple sources encoded using different data schemas. Transforming data from one data schema to another requires data mappings between the data source(s) and a data target.
An important issue in modern information systems and electronic commerce applications is providing support for interoperability of independent data sources. A broad variety of data is available on the Internet in distinct heterogeneous sources, stored in different formats such as database formats (e.g., the relational model), semi-structured formats (e.g., document type definitions, Standard Generalized Markup Language, Extensible Markup Language schemas), scientific formats, etc. Integration of such data is an increasingly important issue, and the effort involved is considerable: translating data from one format or schema to another requires writing and managing complex data transformation programs or queries.
The issue of schema mapping involves translating data from one independently created schema (e.g., a source schema) to another independently created schema (e.g., a target schema). The schemas may have different semantics, and this may be reflected in differences in their logical structures and constraints. Moreover, the source and target schemas may not represent the same data. There may be source data that is not represented in the target, and that should thus be omitted in the translation or mapping process. Conversely, there may be a need in the target schema for data not represented in the source schema. In certain cases, values must be produced for undetermined elements or attributes in the target schema, e.g., target elements for which there is no corresponding source element. Values may be needed if the target element cannot be null, such as an element in a key, and no default is given. More importantly, the creation of new values for such target elements is essential for ensuring the consistency of the target data.
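By way of illustration only, the two situations described above (source data omitted from the target, and a non-nullable target key with no corresponding source element) can be sketched in Python. The field names, the `FIELD_MAP` correspondence, and the helper functions below are hypothetical and are not drawn from any particular mapping tool. The sketch assumes that one source field identifies the entity, so that the same generated key can be reused whenever that value recurs, which is what keeps the generated target data consistent.

```python
from itertools import count

# Hypothetical correspondence: source field -> target field.
# The source field "fax" has no target counterpart and is omitted.
FIELD_MAP = {"cust_name": "name", "cust_city": "city"}


def make_key_generator():
    """Return a function that assigns one stable integer key per distinct value."""
    assigned = {}
    counter = count(1)

    def key_for(value):
        # Reuse the key already generated for this value, if any.
        if value not in assigned:
            assigned[value] = next(counter)
        return assigned[value]

    return key_for


def translate(source_rows, field_map, identity_field, target_key):
    """Translate source rows to target rows, generating the target key."""
    key_for = make_key_generator()
    target_rows = []
    for row in source_rows:
        # Copy only the fields that have a target counterpart.
        target = {field_map[f]: v for f, v in row.items() if f in field_map}
        # The target key has no source element, so a value is created for it.
        target[target_key] = key_for(row[identity_field])
        target_rows.append(target)
    return target_rows


source_rows = [
    {"cust_name": "Acme", "cust_city": "Oslo", "fax": "555-0100"},
    {"cust_name": "Bolt", "cust_city": "Lima", "fax": "555-0101"},
    {"cust_name": "Acme", "cust_city": "Oslo", "fax": "555-0102"},
]
target_rows = translate(source_rows, FIELD_MAP, "cust_name", "customer_id")
```

In this sketch the first and third source rows receive the same generated `customer_id`, because both refer to the same customer name, while the `fax` values are dropped because the hypothetical target schema has no place for them.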
Presently, tools that facilitate the task of transforming data work at a technical level. For example, there exist mapping tools that provide a user interface (UI) showing the structure of a data schema as a tree of the elements used to encode the data. There also exist mapping tools that display trees of elements side-by-side for the two data schemas to be matched, in which the user is able to manually create links between matching elements of the source and target schemas. However, the approaches taken by existing mapping tools are time-consuming for users and prone to error. Furthermore, these approaches are generally not suitable for a non-technical user for several reasons. For example, the existing approaches do not work with actual data values. Without data values to provide a better understanding of the schema elements, the user is often unsure of how to interpret them. Also, the existing approaches require a debugging cycle: the user first creates schema mappings, then invokes a tool to transform an example instance of the schema based on the schema mappings, and must then manually check that the results of the transformation are correct.
Therefore, there is a need for systems and methods that provide a comprehensive yet straightforward solution to building, refining and managing mappings between heterogeneous schemas.