1. Field of the Invention
This invention relates to data transformation and, more particularly, to an improved schema mapping language and graphical mapping tool, which is suitable for use within an event-driven data transformation system.
2. Description of the Related Art
The following descriptions and examples are given as background only.
Databases play an integral role in the information systems of most major organizations, and may take many forms, such as mailing lists, accounting spreadsheets, and statistical sales projections. After using several generations of technology, an organization having data stored in many different systems and formats may wish to combine or integrate its disparate data into a common format, or one that meets its current needs. To achieve data integration between disparate data sources, it is often necessary to transform the content, format, and/or structure of the data in the data sources.
E-commerce is another area that relies upon data integration and transformation. For example, the explosive growth and widespread use of the Internet has encouraged more and more businesses to develop new ways to facilitate efficient and automated interactions among their own internal departments, customers, suppliers, and business partners. However, building successful e-commerce systems presents many challenges to the system architect, as each company may store data and documents in different formats. To be successful, a business may need to transform the content, format, and/or structure of the data into a form used or preferred by its internal departments, customers, suppliers, and business partners.
Data transformation generally refers to a sequence of operations that transforms a set of input data (i.e., a “data source”) into a set of output data (i.e., a “data target”). More specifically, data transformation refers to the changing of the content, format, and/or structure of the data within a data source. Though the term “data conversion” has a slightly different technical connotation, it is often used synonymously with the term “data transformation.”
Common content changes typically include adding, deleting, aggregating, concatenating, and otherwise modifying existing data. Common data formats include, but are not limited to, binary files, sequential files, embedded binary data, EBCDIC data from mainframes, ISAMs (Indexed Sequential Access Methods) and other record managers, PC-based databases, accounting applications (including Enterprise Resource Planning (ERP) and Customer Relationship Management (CRM) systems), business-to-business applications (including those using EDI, XML, HIPAA, HL7, FIX, or SWIFT), and Web-based data. Common data source structures include, but are not limited to, spreadsheets, contact managers, mail list software, and statistical packages. A data transformation, as used herein, may involve changing one or more of the above-mentioned characteristics of a data source.
Current data transformation techniques include custom-coded solutions, customized tools and off-the-shelf products. Custom-coded solutions and customized tools are generally expensive to implement, are not portable, and are difficult to adapt to new or changing circumstances. Off-the-shelf products also tend to be costly, have steep learning curves and usually require system developers to create code for mapping input data to output data. Regardless of the particular solution chosen, the process of transforming data becomes increasingly complicated with each increase in the number of data sources, the number of data targets, the content of the data sources/targets, the format of the data sources/targets, and the complexity of the corresponding data structures.
At the heart of every data transformation technique is the requirement to map data from a data source to a data target. This is commonly achieved, in both database management and e-commerce systems, by mapping the elements of a source schema to those of a target schema. In general, a “schema” defines the structure and, to some extent, the semantics of a source or target.
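By way of illustration only, the mapping of source schema elements to target schema elements may be sketched in a few lines of Python. The field names, the conversion functions, and the `apply_mapping` helper are hypothetical and are chosen solely for this example; they do not describe any particular product or the tool disclosed herein.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical mapping: each target element names the source element it is
# drawn from and the conversion applied along the way.
Mapping = Dict[str, Tuple[str, Callable[[Any], Any]]]

CUSTOMER_MAPPING: Mapping = {
    "full_name": ("name", str.strip),                           # content change
    "postal_code": ("zip", str),                                # format change
    "balance_cents": ("balance", lambda v: round(float(v) * 100)),
}


def apply_mapping(source: Dict[str, Any], mapping: Mapping) -> Dict[str, Any]:
    """Build a target record by applying every element mapping in turn."""
    target: Dict[str, Any] = {}
    for target_field, (source_field, convert) in mapping.items():
        target[target_field] = convert(source[source_field])
    return target


# Illustrative source record (assumed data, not taken from the text).
source_record = {"name": "  Ada Lovelace ", "zip": 12345, "balance": "19.99"}
target_record = apply_mapping(source_record, CUSTOMER_MAPPING)
print(target_record)
```

In this sketch the mapping is plain data, separate from the code that applies it, which is roughly the separation that a schema mapping is intended to provide.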
The eXtensible Markup Language (XML) provides a standardized format for document and data structure definition, and was developed by the World Wide Web Consortium (W3C) to provide an easier way to integrate data between applications and organizations over the Internet. In the past, when two organizations wanted to exchange or integrate information over the Internet, they would create schemas for their documents in XML. An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. Many schema languages for XML exist, including but not limited to, the Document Type Definition (DTD) language, XML Schema (XSD), and RELAX NG.
Once source and target schemas were defined, system developers would create code for mapping the source schema elements to target schema elements using an XML-based transformation language. Although many transformation languages exist, the eXtensible Stylesheet Language (XSL) was often used to create mapping code by defining an XSL transformation style sheet. In general, the XSL style sheet is a form of mapping which includes a template of the desired target structure, and identifies data in the source document to insert into this template. This model of merging data and templates is referred to as the template-driven model and works well on regular and repetitive data. XSL also provides capabilities for handling highly irregular and recursive data, as is typical in documents.
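The template-driven model may be illustrated, under stated assumptions, with an analogue written in Python rather than XSL. The element names (`orders`, `order`, `invoice`, and so on), the `ROW_TEMPLATE` constant, and the `transform` function are hypothetical; the sketch only shows the idea of a target-shaped template into which values selected from the source document are inserted.

```python
import xml.etree.ElementTree as ET
from string import Template
from xml.sax.saxutils import escape

# Hypothetical source document with regular, repetitive data.
SOURCE_XML = """
<orders>
  <order id="1"><customer>Acme</customer><total>10.50</total></order>
  <order id="2"><customer>Globex</customer><total>7.25</total></order>
</orders>
"""

# Template of the desired target structure; $-placeholders mark where
# source data is inserted.
ROW_TEMPLATE = Template(
    '<invoice number="$id"><payer>$customer</payer><amount>$total</amount></invoice>'
)


def transform(source_xml: str) -> str:
    """Fill one copy of the template for every <order> in the source."""
    root = ET.fromstring(source_xml)
    rows = []
    for order in root.findall("order"):
        rows.append(
            ROW_TEMPLATE.substitute(
                id=escape(order.get("id", ""), {'"': "&quot;"}),
                customer=escape(order.findtext("customer", "")),
                total=escape(order.findtext("total", "")),
            )
        )
    return "<invoices>" + "".join(rows) + "</invoices>"


result = transform(SOURCE_XML)
print(result)
```

The sketch handles only flat, repeating records. It does not attempt the recursive, irregular content that XSL templates can also process.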
However, defining an XSL style sheet can be difficult, especially for complex schemas and mappings. In addition, code must be written for mapping the relationships between each source and target set. Whether the mapping code is written in XSL or another transformation language, writing this code is difficult, time-consuming, and typically beyond the capabilities of ordinary business personnel.
One way to resolve this problem is to build a graphical representation of the mappings between the source and target schemas. There are currently several graphical mapping tools on the market, which allow a user to quickly create mappings between source and target schemas by drawing lines or “links” between elements of the source and target schemas. In one example, a graphical mapping tool may present a source schema on the left side, and a target schema on the right side of a user interface window. A mapping region arranged between the source and target schema representations may be used to graphically and functionally link the elements of the source and target schemas. Once the mappings are graphically defined, the mapping tool automatically generates an executable transformation program, e.g., by compiling an XSL transformation style sheet, to transform the data.
To control the transformation between source and target schemas, many graphical mapping tools allow “functoids,” “function components,” or “function blocks” (referred to collectively herein as “functoids”) to be inserted within the links connecting the source and target schema elements. A “functoid” is a graphical representation of a functional operation used in the transformation of a source schema element to a target schema element. Each functoid has pre-defined script or program code associated with a single functional operation, such as a mathematical or logical operation. The graphical mapping tool provides a plurality of functoids for user selection and insertion into the graphical mapping area. The transformation between a source schema element and a target schema element is dictated by the combination of all links and functoids connecting the source and target elements together.
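A functoid-style mapping may be modeled, purely as a sketch, as a small graph in which each node performs a single operation on one or more inputs and yields one output. The `SourceField` and `Functoid` classes, the `evaluate` function, and the field names below are assumptions made for illustration; they do not reproduce the internals of any commercial mapping tool.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple, Union


@dataclass(frozen=True)
class SourceField:
    """A leaf node: a link that starts at a source schema element."""
    name: str


@dataclass(frozen=True)
class Functoid:
    """A single functional operation with many inputs and one output."""
    operation: Callable[..., Any]
    inputs: Tuple["Node", ...]


Node = Union[SourceField, Functoid]


def evaluate(node: Node, record: Dict[str, Any]) -> Any:
    """Follow the links back from a target element and compute its value."""
    if isinstance(node, SourceField):
        return record[node.name]
    args = [evaluate(child, record) for child in node.inputs]
    return node.operation(*args)


def concat(a: str, b: str) -> str:
    return f"{a} {b}"


def multiply(a: float, b: float) -> float:
    return a * b


# Each target element is determined by the chain of links and functoids
# that connects it to the source elements.
links: Dict[str, Node] = {
    "FullName": Functoid(concat, (SourceField("First"), SourceField("Last"))),
    "LineTotal": Functoid(multiply, (SourceField("Qty"), SourceField("UnitPrice"))),
    "Label": Functoid(
        str.upper,
        (Functoid(concat, (SourceField("First"), SourceField("Last"))),),
    ),
}

record = {"First": "Ada", "Last": "Lovelace", "Qty": 3, "UnitPrice": 4}
output = {target: evaluate(node, record) for target, node in links.items()}
print(output)
```

In this sketch, the `Label` target repeats the concatenation node already used for `FullName`, since each node is treated as having exactly one output.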
Graphical mapping tools, such as those described above, reduce the time and effort involved in the mapping process by providing a relatively simple, graphical method for defining schema mappings. These graphical mapping tools allow business personnel to generate mappings without extensive knowledge of programming languages or the assistance of programmers. However, graphical mapping tools of this sort have many limitations.
First of all, large schema mappings are difficult to represent graphically in functoid-based graphical mapping tools, as the use of functoids creates congestion in the mapping area. Furthermore, while functoids can have multiple inputs, they typically have only one output, which must be copied to link the functoid output to multiple target elements. This may further increase congestion by requiring various links and functoids to be replicated if a source element is to be mapped to multiple target elements, as is often the case when mapping hierarchical data structures. Moreover, the mapping region in functoid-based mapping tools does not specify relationships between source and target elements, making it difficult to relate any given target element to its corresponding source element. Finally, while functoids graphically represent the rules used to transform a source schema into a target schema, they cannot be used to initiate or control transformations that depend on the occurrence of a particular event (i.e., event-driven transformations).
Therefore, a need exists for an improved schema mapping language and graphical mapping tool that overcome the disadvantages of conventional mapping languages and tools. An ideal schema mapping language and graphical mapping tool would require a minimum amount of custom programming on the part of the user, would be as simple as possible, and would be as visual as possible. In addition, an ideal schema mapping language and graphical mapping tool would support schema mappings between multiple sources/targets, intermediate targets, multi-mode targets, and event-driven transformations. Furthermore, an ideal schema mapping language and graphical mapping tool would be able to generate and control mappings between hierarchical data structures (such as found, for example, in business-to-business applications) with the same efficiency and ease with which mappings are generated and controlled between relational data structures (e.g., flat file data structures).