1. Field of Invention
This invention relates in general to information systems, and more particularly to data exchange and data storage among information systems.
2. Description of Background
Modern information systems rely heavily on both data exchange and data storage. Data exchange enables interaction between different components in an information system. Additionally, data exchange makes it possible for an information system to interact with other information systems. Data exchange between information systems is a key feature of current enterprise systems.
Data storage is used extensively to handle the various data used by information systems. Information systems are increasingly attempting to share common data storage pools across organizations. In some cases data stores are being shared between organizations to support joint enterprise systems. Data storage is commonly used to integrate data from disparate systems to present a unified view of data that may originate from varying sources.
In order for data exchange and data storage to function, all parties involved must agree on a common format and structure before direct data exchange or sharing via a data store can be accomplished. This format and structure information is known as the data schema. With both data exchange technology and data storage technology, all data to be exchanged or stored must conform to a well-defined data schema in order for the information system to interpret the data.
In practice, data schemas are defined by the target data store, by the integrated data view, or as a requirement on the data exchange process. The key requirement in all cases is that the data to be stored, integrated, or exchanged must conform to a shared data schema. That is, interaction between information systems relies upon both data producers and data consumers agreeing upon the data schema to be used.
When these data interactions cross organizational and administrative boundaries, problems arise. These problems are based on the difficulty of managing a common definition and ensuring data compliance with the agreed-upon data schema across the organizational and administrative boundaries. It is common for each party involved in a data interaction to have its own internal data schema. This internal schema is often influenced by factors that are completely unrelated to, and likely to take precedence over, any data interaction requirements. Some factors that commonly influence internal schema designs include: the organization's existing internal data stores, internal application structures and behavior, business processes and needs, political and administrative structure of the organization, and software development constraints.
It is often possible to align an organization's internal data schemas with the schemas necessary to allow data interaction with other organizations. Organizations that need to perform data interactions with other parties generally invest significant development and maintenance effort to ensure that information systems conform to the agreed-upon common data schemas. When these schemas evolve, further effort to update, test, and deploy schema-dependent portions of the information systems is necessary. As organizations increase the types of data interactions they are party to, the effort required to maintain translation from the internal data schemas to the common data schema increases in direct proportion to the breadth of the interactions.
To address these issues, the concept known as schema mapping has been investigated within the following disclosure. For example, given two schemas, A and B, it is possible to define a mapping specification that captures the correspondences between elements in schema A and elements in schema B. With this mapping information and an input document that conforms to schema A, it is possible to automatically produce an output document that corresponds to the input document data and conforms to schema B. Throughout this application, this process is referred to as executing the mapping. One skilled in the art should know that a mapping may involve a single source and a single schema, or alternatively a mapping may involve multiple sources and multiple schemas.
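The idea of executing a mapping can be illustrated with a minimal sketch. It is not the disclosed tool; it assumes, purely for illustration, that documents are nested Python dictionaries and that a mapping specification is a table from target paths to source paths. The names MAPPING_A_TO_B, get_path, set_path, and execute_mapping, as well as the sample field names, are hypothetical.

```python
# Hypothetical mapping specification from schema A to schema B.
# Each entry maps a target path (in schema B) to a source path (in schema A).
# Paths are tuples of keys into nested dictionaries standing in for documents.
MAPPING_A_TO_B = {
    ("customer", "full_name"): ("person", "name"),
    ("customer", "contact", "email"): ("person", "email"),
}


def get_path(document, path):
    """Follow a tuple of keys into a nested dict and return the value found."""
    value = document
    for key in path:
        value = value[key]
    return value


def set_path(document, path, value):
    """Store value at the given path, creating intermediate dicts as needed."""
    node = document
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = value


def execute_mapping(mapping, source_document):
    """Build a schema-B document from a schema-A document using the mapping."""
    target_document = {}
    for target_path, source_path in mapping.items():
        set_path(target_document, target_path, get_path(source_document, source_path))
    return target_document


# An input document that conforms to the (assumed) schema A.
input_document = {"person": {"name": "Ada Lovelace", "email": "ada@example.org"}}
output_document = execute_mapping(MAPPING_A_TO_B, input_document)
print(output_document)
```

In this sketch the mapping specification is consulted while the document is being transformed, so the same execute_mapping function serves any pair of schemas expressible in this simplified form.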
The disclosure pertains to a software tool that automatically generates the source code for a custom application that executes a given mapping between schemas. That is, given a set of source schemas and a set of target schemas, together with a mapping specification that maps from the source schemas to the target schemas, the disclosed tool will generate the source code for a mapping application. This mapping application is able to read input data documents that conform to the source schemas and produce output data documents that comprise the input document data in a form that corresponds to the target schemas, based on the mapping specification. The disclosed invention may also be utilized to generate software artifacts other than applications. For example, and not meant to be limiting, the disclosed invention may be utilized to generate software artifacts for a web service or a software component.
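The code generation idea can be illustrated with a small sketch under simplifying assumptions. It is not the disclosed tool's actual output or algorithm: documents are assumed to be nested Python dictionaries, the mapping specification is assumed to be a table from target paths to source paths, and the names MAPPING, generate_mapper_source, and map_document are hypothetical. The generator emits the source text of a function specialized to one mapping, which is then compiled and run.

```python
# Hypothetical mapping specification: target path -> source path.
MAPPING = {
    ("customer", "full_name"): ("person", "name"),
    ("customer", "contact", "email"): ("person", "email"),
}


def generate_mapper_source(mapping, function_name="map_document"):
    """Return Python source text for a function specialized to this mapping."""
    lines = [f"def {function_name}(source):", "    target = {}"]
    for target_path, source_path in mapping.items():
        # Emit code that creates the parent containers of the target field.
        parents = "".join(f".setdefault({key!r}, {{}})" for key in target_path[:-1])
        # Emit code that reads the source field directly, with no lookup table.
        read = "".join(f"[{key!r}]" for key in source_path)
        lines.append(f"    target{parents}[{target_path[-1]!r}] = source{read}")
    lines.append("    return target")
    return "\n".join(lines) + "\n"


# Generate the source code, then compile it into a callable function.
source_code = generate_mapper_source(MAPPING)
namespace = {}
exec(source_code, namespace)
map_document = namespace["map_document"]

result = map_document({"person": {"name": "Ada Lovelace", "email": "ada@example.org"}})
print(source_code)
print(result)
```

The generated function contains one plain assignment per mapped field, so the mapping specification is no longer consulted when a document is transformed, and the emitted source can be read and edited like hand-written code.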
XML-to-XML mappings can be expressed as transforms over XML documents using query-based or script-based techniques. For example, the mapping can be expressed as an XQuery or XSLT script that performs the specified mapping. Earlier work with the disclosed mapping tool automatically produced XQuery and XSLT transformation scripts based on an XML-to-XML map specification. These scripts are executed over an XML data document by passing the transformation script, along with the input data document, into a script execution engine. That is, the XQuery script is passed into an XQuery execution engine along with the data document, or the XSLT script is passed into an XSLT execution engine along with the data document.
A generic mapping engine could be used to address the problem described above. The generic mapping engine takes as input the source and target schemas, the map specification, and the data document to be transformed. Effectively a generic engine interprets the schemas and map specification at runtime to transform the input data document. Although practical, this kind of generic approach has two disadvantages when compared to the disclosed invention:
1. Increased complexity of the engine implementation, and
2. Longer execution times as a result of the indirection required to interpret the map specification at runtime.
Preliminary testing of the code generation approach against a generic mapping engine shows that the generated mapping application runs 45% to 65% faster than a generic mapping engine over the same map specification and input document.
The generated applications are implemented in a human-readable coding style, making it easy for developers to understand, review, and extend the generated code.