The present invention relates generally to the field of data transformation and integration and, more specifically, to providing scalable extensible markup language (XML) transformation based on schema mappings.
Transforming data from one format to another is frequently required in modern information systems and Web applications that need to exchange or integrate data. As XML becomes more widely adopted as a standard for data exchange among applications, especially over the Web, transforming XML data into other XML data (XML-to-XML transformation) may become increasingly important. XML-to-RDB (relational database) transformation (known as XML shredding) and RDB-to-XML transformation (known as XML publishing) are special cases of XML-to-XML transformation.
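To make XML publishing and XML shredding concrete, the following minimal Python sketch converts rows of a hypothetical relational table into a nested XML document and then flattens that document back into rows. It uses only the standard library's `xml.etree.ElementTree`. The table, its columns, and the element names (`company`, `dept`, `emp`) are illustrative assumptions, not part of any particular system. Even this small case requires grouping, the kind of restructuring operation that makes hand-written transformations grow.

```python
import xml.etree.ElementTree as ET

# Hypothetical relational table: each row is (dept, emp_name, salary).
ROWS = [
    ("Sales", "Alice", 5000),
    ("Sales", "Bob", 4200),
    ("R&D", "Carol", 6100),
]

def publish(rows):
    """XML publishing (RDB-to-XML): nest flat rows, grouping employees by department."""
    root = ET.Element("company")
    depts = {}  # department name -> its <dept> element, so each department appears once
    for dept, name, salary in rows:
        if dept not in depts:
            depts[dept] = ET.SubElement(root, "dept", name=dept)
        emp = ET.SubElement(depts[dept], "emp")
        emp.set("name", name)
        emp.set("salary", str(salary))
    return root

def shred(root):
    """XML shredding (XML-to-RDB): flatten the nested document back into rows."""
    return [
        (dept.get("name"), emp.get("name"), int(emp.get("salary")))
        for dept in root.findall("dept")
        for emp in dept.findall("emp")
    ]

doc = publish(ROWS)
print(ET.tostring(doc, encoding="unicode"))
```

Because the grouping preserves the order in which departments first appear, shredding the published document here reproduces the original rows.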
Writing data transformation programs by hand is often time consuming and error-prone, even in high-level languages such as XQuery (XML Query), XSLT (Extensible Stylesheet Language Transformations), or SQL/XML, an extension of SQL (Structured Query Language) for publishing tables as XML. A typical data transformation task may involve restructuring, cleansing, and grouping of data, and implementing such operations can easily produce large programs (queries) that are hard to comprehend and that often obscure the semantics of the transformation. Maintaining such transformations correctly, for example as database schemas evolve, raises similar problems. It is therefore desirable to have tools that assist with such data transformation tasks.
Clio is an existing schema-mapping tool that provides user-friendly means to manage and facilitate the complex task of transforming and integrating heterogeneous data, such as XML exchanged over the Web or stored in XML databases. By means of mappings from source schemas to target schemas, Clio can help users conveniently establish the precise semantics of data transformation and integration. One of Clio's aims is to provide high-level mapping languages and more intuitive graphical user interfaces (GUIs) with which users can specify transformation semantics conveniently. For example, Clio can be used to create mappings from a source schema to a target schema for data migration, including mappings between relational schemas and XML schemas. The user is presented with the structure and constraints of the two schemas and asked to draw correspondences between the parts of the schemas that represent the same real-world entity; Clio can also infer correspondences for the user to verify. Given the two schemas and the set of correspondences between them, Clio can generate SQL/XML, XSLT, or XQuery queries that translate data conforming to the first (source) schema into data conforming to the second (target) schema. This process has two phases. In the first, schema-matching phase, Clio semi-automatically establishes matchings between source XML-schema elements and target XML-schema elements. In the second, schema-mapping phase, Clio semi-automatically generates a set of logical constraints (logical mappings) that capture the precise relationship between an instance (or document) conforming to the source schema, which is the input to the transformation, and an instance (or document) conforming to the target schema, which is the output of the transformation.
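The role of a logical mapping as a constraint relating a source instance to a target instance can be illustrated with a deliberately simplified sketch. The `LogicalMapping` class, the `satisfies` and `execute` functions, and the example schemas below are all hypothetical; Clio's actual mappings are considerably more expressive (for example, over nested XML sets). Each mapping here states that for every record in a source set there must exist a target record whose fields agree according to a list of correspondences. `satisfies` checks that relationship, and `execute` naively materializes a target instance that meets it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LogicalMapping:
    """Hypothetical, simplified source-to-target constraint: for every record in
    source[src_set] there must exist a record in target[tgt_set] whose fields
    agree with it according to `correspondences` (target field, source field)."""
    src_set: str
    tgt_set: str
    correspondences: tuple[tuple[str, str], ...]

def project(record, mapping):
    """Target field values required by one source record under the mapping."""
    return {tgt: record[src] for tgt, src in mapping.correspondences}

def satisfies(source, target, mapping):
    """Check the constraint: every source record has a matching target record."""
    for rec in source[mapping.src_set]:
        required = project(rec, mapping)
        if not any(all(t.get(k) == v for k, v in required.items())
                   for t in target.get(mapping.tgt_set, [])):
            return False
    return True

def execute(source, mappings):
    """Naive transformation: emit one target record per source record and mapping,
    skipping exact duplicates (a linear scan, so quadratic overall)."""
    target = {}
    for m in mappings:
        out = target.setdefault(m.tgt_set, [])
        for rec in source[m.src_set]:
            row = project(rec, m)
            if row not in out:
                out.append(row)
    return target

# Hypothetical source instance and two mappings into a target schema.
SOURCE = {"emps": [
    {"ename": "Alice", "dname": "Sales", "sal": 5000},
    {"ename": "Bob", "dname": "Sales", "sal": 4200},
]}
M1 = LogicalMapping("emps", "employees", (("name", "ename"), ("dept", "dname")))
M2 = LogicalMapping("emps", "departments", (("name", "dname"),))
TARGET = execute(SOURCE, [M1, M2])
print(TARGET)
```

Separating the declarative constraint (`satisfies`) from one possible way of producing data that meets it (`execute`) mirrors the distinction between specifying transformation semantics and executing the transformation efficiently.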
Schema-mapping tools such as Clio thus help users establish the semantics of data transformation and integration through mappings from source to target schemas. Other systems focused on the high-level specification and generation of data transformations and data integration applications include Rondo, a generic platform for managing and manipulating models, such as schemas, together with the mappings between them; as in Clio, Rondo mappings may be specified using logical constraints. Piazza and HePToX (HEterogeneous Peer TO peer Xml database system) are also based on mappings, but they focus on query rewriting for data integration rather than on data transformation. In addition, many industry tools support the development of mappings, such as Microsoft ADO.NET v3 (an Entity Relationship-to-SQL, or ER-to-SQL, mapping system), IBM WebSphere DataStage TX, Stylus Studio's XML Mapper, and IBM Rational Data Architect (which uses Clio).
The schema-mapping tools described above solve many of the problems of specifying transformation semantics. However, the problems of efficiently implementing such mapping-driven transformations, and of executing them correctly and efficiently, remain. Current practice is to perform such transformations with XSLT or XQuery code generated by the mapping tools. Using these general-purpose query languages directly for transformation, however, often leads to performance problems.