1. Technical Field
This invention relates to information conversion and in particular to the conversion of information data sets comprising data elements structured according to a first predefined structure into information data sets comprising data elements structured according to a second predefined structure, by way of an intermediate structure.
2. Related Art
It is known to define information in a structured manner using a structured mark-up language such as the Extensible Markup Language (XML) Version 1.0, as defined for example in a specification published on the Internet by the World Wide Web Consortium (W3C). XML Version 1.0 provides an open and flexible specification for annotating information with a predefined set of meta-information, while permitting an unlimited number of possible meta-information structures. In this way, information may be personalised and customised for individuals or for groups of individuals.
Attempts are being made to standardise the use of meta-information for annotating information in particular domains by means of agreed schemata. For XML Version 1.0 documents in particular, a document description language called the XML Schema language has been agreed for creating XML schemata, which define the structure and vocabulary to be used in XML documents in a given domain. The XML Schema language is defined in two parts, in documents published on the Internet in May 2001 by the World Wide Web Consortium. This approach is intended to standardise the meta-information associated with different pieces of information content, so that data exchange and other forms of content analysis can be more generic, and hence to enable more effective access to information content annotated by meta-information common across a given domain. Unfortunately, although XML encourages standardisation, it has led to several hundred competing XML schemata, sometimes within similar industries. Many of these schemata are in the adoption phase, e.g. for the financial, manufacturing, education and other industries (a list of XML schemata can be found at www.xml.org), but it is considered unlikely that XML documents generated in a given domain will ever comply uniformly with published “standard” schemata.
It is therefore desirable for information content providers to be able to supply information content described using a number of different schemata and to convert easily between them. Different information content providers may use variations on a given schema, or similar organisations may use the same schema but define their own vocabularies for certain elements, producing schema variants specific to their own organisations. Conversion between XML documents compliant with such schemata may take various forms, as illustrated in the diagram of FIG. 1.
Referring to FIG. 1, three different types of XML document conversion are shown, in which XML documents compliant with a Schema A 100 need to be converted into XML documents compliant with a Schema B 105. Where the vocabularies and data types used in the two schemata A 100 and B 105 are identical, conversion between respectively compliant XML documents involves a direct mapping 110, with no transformation required. Where the vocabularies and data types used in the two schemata differ but the schemata are structurally identical, conversion involves only vocabulary and data type transformation 115. Where the two schemata A 100 and B 105 differ structurally, conversion involves both structural transformation 120 and vocabulary/data type transformation 115.
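The vocabulary and data type transformation 115 case can be sketched in Python with the standard library's `xml.etree.ElementTree` module. The element names, the name mapping and the date formats below are hypothetical assumptions chosen for illustration; they are not taken from any real schema. Because the two schemata are assumed to be structurally identical, the sketch walks the tree once, renaming elements and converting typed values without changing nesting or order.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical vocabulary map from Schema A element names to Schema B names.
VOCABULARY_A_TO_B = {
    "customer": "client",
    "surname": "familyName",
    "dob": "dateOfBirth",
}


def _date_a_to_b(text):
    # Assumption: Schema A writes dates as DD/MM/YYYY, Schema B as YYYY-MM-DD.
    return datetime.strptime(text, "%d/%m/%Y").strftime("%Y-%m-%d")


# Hypothetical data type converters, keyed by the Schema B element name.
TYPE_CONVERTERS = {"dateOfBirth": _date_a_to_b}


def transform_vocabulary(element):
    """Return a copy of element with names and typed values mapped A -> B.

    Nesting and child order are preserved unchanged, so this covers only the
    vocabulary/data type case, not structural transformation. Attributes are
    copied as-is in this sketch.
    """
    new_tag = VOCABULARY_A_TO_B.get(element.tag, element.tag)
    new_elem = ET.Element(new_tag, dict(element.attrib))
    text = element.text
    if text is not None and new_tag in TYPE_CONVERTERS:
        text = TYPE_CONVERTERS[new_tag](text.strip())
    new_elem.text = text
    new_elem.tail = element.tail
    for child in element:
        new_elem.append(transform_vocabulary(child))
    return new_elem


doc_a = ET.fromstring(
    "<customer><surname>Smith</surname><dob>07/03/1970</dob></customer>"
)
doc_b = transform_vocabulary(doc_a)
print(ET.tostring(doc_b, encoding="unicode"))
# prints <client><familyName>Smith</familyName><dateOfBirth>1970-03-07</dateOfBirth></client>
```

A structural transformation 120 could not be expressed this way, because it would have to move, merge or split elements rather than rename them in place.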
The data in an XML document is the information contained in the text nodes of its elements and in the attributes of those elements. The most commonly used technology for XML document conversion is XSL Transformations (XSLT), adopted as a standard by the World Wide Web Consortium and defined in a document published on the Internet. XSLT can be used to manipulate XML documents by reordering or reformatting information, and can perform simple restructuring of the meta-information contained in an XML document. A number of commercially available tools can be used to produce and test XSLT stylesheets, including Marrowsoft Xselerator™ and TIBCO™ XML Transform. It is also known from n-tier hub-based technology (ACM, SIGMOD, March 2002) to build a schema translation engine that uses a database table to list elements common to XML documents, and hence to translate between them.
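The distinction between data (text nodes and attribute values) and meta-information (element and attribute names) can be made concrete with a short Python sketch using `xml.etree.ElementTree`. The function name `extract_data` and the sample document are hypothetical. The paths it produces are XPath-like labels for readability, not full XPath expressions, and text following a child element (tail text in mixed content) is ignored for simplicity.

```python
import xml.etree.ElementTree as ET


def extract_data(root):
    """Collect the data items of an XML document as (path, value) pairs.

    Data is taken to be the text content of elements plus the values of their
    attributes; element and attribute names are treated as meta-information
    and appear only in the path labels.
    """
    items = []

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        for name, value in elem.attrib.items():
            items.append((f"{here}/@{name}", value))
        # Skip whitespace-only text; tail text is ignored in this sketch.
        if elem.text is not None and elem.text.strip():
            items.append((here, elem.text.strip()))
        for child in elem:
            walk(child, here)

    walk(root, "")
    return items


doc = ET.fromstring('<order id="42"><item sku="A1">Widget</item><qty>3</qty></order>')
for path, value in extract_data(doc):
    print(path, "=", value)
# prints:
# /order/@id = 42
# /order/item/@sku = A1
# /order/item = Widget
# /order/qty = 3
```

A conversion, whether written in XSLT or otherwise, must carry these values across while replacing the names and paths on the left with those of the target schema.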
While XSLT can convert at least the less complex structures in XML documents, a different XSLT stylesheet is usually required for each conversion from one schema-compliant format to another. If n different schemata are in use, n(n−1) different XSLT stylesheets are therefore required to achieve all the conversions likely to be needed between XML documents compliant with those n schemata, one for each ordered pair of source and target schemata. Besides the need for many different XSLT stylesheets, there are also complications in using XSLT to convert certain types of structure, in particular recursive structures.
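The n(n−1) figure follows from counting ordered (source, target) pairs of distinct schemata, which the Python sketch below enumerates with `itertools.permutations`. For comparison only, it also counts the converters needed if every schema is converted to and from a single intermediate structure, as named in the Technical Field: one converter in each direction per schema, or 2n in total. This is simple counting under that assumption and does not describe how any intermediate structure is built. The function names are hypothetical.

```python
from itertools import permutations


def pairwise_stylesheets(schemata):
    """One stylesheet per ordered (source, target) pair of distinct schemata."""
    return list(permutations(schemata, 2))


def hub_converters(n):
    """Converters needed if each schema maps to and from one intermediate form."""
    return 2 * n


schemata = ["A", "B", "C", "D"]
pairs = pairwise_stylesheets(schemata)
n = len(schemata)
assert len(pairs) == n * (n - 1)
print(len(pairs), hub_converters(n))  # prints 12 8

# Growth of the two counts for a few values of n.
for k in range(2, 7):
    print(k, len(pairwise_stylesheets(range(k))), hub_converters(k))
```

The pairwise count exceeds 2n for every n greater than 3, since n(n−1) > 2n reduces to n > 3, and the gap grows quadratically as further schemata are added.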