1. Field of the Invention
The field of the invention is data processing, or, more specifically, methods, systems, and products for tree construction in support of XML to XML document transformation by use of graphically-specified transformation rules.
2. Description of Related Art
XML is the Extensible Markup Language. XML is designed to provide flexible and adaptable information formatting and identification. XML is called extensible because it has no fixed format like HTML, the Hypertext Markup Language, which is a set of predefined markups. Instead, XML is actually a ‘metalanguage’—a language for describing other languages—which allows users to design customized markup languages for many different types of documents.
XML's principal purpose is structuring data. Structured data includes things like spreadsheets, address books, configuration parameters, financial transactions, and technical drawings. XML includes a set of rules for designing text formats that support structuring data.
Like HTML, XML makes use of elements, tags, and attributes. Elements are content segments identified by tags. Elements have possibly empty values, the value of an instance of an element being the string between the start-tag and end-tag for the instance of the element. ‘Tags’ are words bracketed by ‘<’ and ‘>,’ and attributes are defined characteristics of elements having for example the form: AttributeName=“value”. While HTML specifies what each tag and attribute means, and often how the text between them will look in a browser, XML uses the tags only to delimit pieces of data, and leaves the interpretation of the data completely to the application that reads it. In other words, although in the predefined syntax of HTML, “<p>” means ‘paragraph,’ “<p>” in an XML file means whatever the reading application says it means. Depending on the context, it may be a price, a parameter, a person, or in many cases it represents an entity having nothing to do with Ps.
The formal relations among elements and attributes in XML documents are governed by declarations set forth in Document Type Definitions or ‘DTDs.’ A DTD is a formal description in XML Declaration Syntax of a particular type of XML document. The governing DTD for a particular XML document sets out what names are to be used for the different types of element, where they may occur, and how they all fit together. For example, in a document type describing Lists which contain Items, the relevant part of the governing DTD may contain:
        <!ELEMENT List (Item)+>
        <!ELEMENT Item (#PCDATA)>
These declarations define a List as an element containing one or more Items (the plus sign means one or more). These declarations also define Items as elements containing plain text (Parsed Character Data or ‘PCDATA’). Validating parsers read the DTD before reading documents governed by the DTD so that the parsers can identify where every element ought to occur and how each relates to the others, so that applications which need to know this in advance (for example, editors, search engines, navigators, and databases) can set themselves up correctly. The example declarations above support creation of lists in XML like this example snippet:
        <List>
        <Item>Chocolate</Item>
        <Item>Music</Item>
        <Item>Surfing</Item>
        </List>
There are three instances of the element named ‘Item’ in the above example snippet of XML, having values respectively of “Chocolate,” “Music,” and “Surfing.”
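For illustration only, the element values described above can be read back with a short Python sketch. It assumes the standard-library xml.etree.ElementTree parser, which is non-validating and does not consult the DTD; the names LIST_XML and item_values are hypothetical and are not part of the described art.

```python
import xml.etree.ElementTree as ET

# The example List snippet from the text above.
LIST_XML = """<List>
<Item>Chocolate</Item>
<Item>Music</Item>
<Item>Surfing</Item>
</List>"""


def item_values(xml_text):
    """Return the value of each Item element, in document order.

    The value of an element instance is the string between its
    start-tag and end-tag; an empty element yields an empty string.
    """
    root = ET.fromstring(xml_text)
    return [item.text or "" for item in root.findall("Item")]


values = item_values(LIST_XML)
print(values)
```

Each entry in the returned list corresponds to one instance of the element named ‘Item.’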
How such a list appears in print or on a computer screen depends on an additional document or file called a stylesheet. Unlike HTML, there are no display formatting elements in XML itself. Placing all display formatting in separate files means that display appearance can be changed for all compliant XML documents with no need to edit the XML documents themselves.
There are thousands of DTDs already in existence for many subjects. Many of them can be downloaded and used freely, or users can develop their own DTDs using the XML Declaration Syntax. In fact, it is the growing ubiquity and power of XML and its governing DTDs that creates challenges for users.
XML is not a programming language as such; it is a markup standard for structuring data. There is no need for users to be programmers in order to use XML. On the other hand, DTDs are becoming more widespread, and many DTDs are becoming large and complex. In addition, as more and more data structures, databases, and document types are implemented in XML documents whose structures are governed by DTDs, there is more and more demand for conversion among different structures.
Consider an example of an Internet sales company that purchases a large vendor database from a supplier. The sales company wishes to integrate the vendor database into its sales database. Many of the fields in the two databases map one-to-one: vendor name, vendor street address, city, state, zip code, and so on. Many of them, however, do not. And even the ones that do map one-to-one have different field names in the two databases. The sales database's name field is called CustomerName; the vendor database's name field is called VendorName; and so on.
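A one-to-one field mapping of this kind might be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a description of any actual conversion routine: only the VendorName to CustomerName pairing comes from the example above, while the Vendor and Customer record elements, the VendorCity, CustomerCity, and VendorRating field names, and the rename_fields helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# One-to-one field name mapping from source to target.
# Only VendorName -> CustomerName is taken from the text;
# the VendorCity entry is a hypothetical addition.
FIELD_MAP = {
    "VendorName": "CustomerName",
    "VendorCity": "CustomerCity",
}


def rename_fields(source_record, target_tag="Customer"):
    """Build a target record holding only the fields that map one-to-one."""
    target = ET.Element(target_tag)
    for child in source_record:
        new_tag = FIELD_MAP.get(child.tag)
        if new_tag is None:
            continue  # no one-to-one mapping exists for this field
        ET.SubElement(target, new_tag).text = child.text
    return target


source = ET.fromstring(
    "<Vendor>"
    "<VendorName>Acme</VendorName>"
    "<VendorCity>Austin</VendorCity>"
    "<VendorRating>5</VendorRating>"
    "</Vendor>"
)
result = ET.tostring(rename_fields(source), encoding="unicode")
print(result)
```

Fields without an entry in the mapping, such as the hypothetical VendorRating, are simply dropped here; handling them requires the more complex conversions discussed next.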
In addition, many desirable conversions are extremely complex: The sales company maintains statistical totals of customers in several categories or even in several different types of categories. The vendor database contains fields that can be mapped into the sales database's category fields, but in order to effect this mapping, running totals must be created for many vendor database fields to map into a single sales database category field at conversion time.
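The many-to-one case can be sketched in the same way. The Python fragment below is an assumption-laden illustration: the field names RetailOrders, WebOrders, and BulkOrders, the category names SmallAccounts and LargeAccounts, the Vendors and Vendor elements, and the category_totals helper are all hypothetical. It shows only the general idea of accumulating running totals from several source fields into a single target category field at conversion time.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical mapping: several source fields feed one target category field.
CATEGORY_MAP = {
    "RetailOrders": "SmallAccounts",
    "WebOrders": "SmallAccounts",
    "BulkOrders": "LargeAccounts",
}


def category_totals(source_root):
    """Accumulate running totals per target category over all source records."""
    totals = defaultdict(int)
    for vendor in source_root.findall("Vendor"):
        for field in vendor:
            category = CATEGORY_MAP.get(field.tag)
            if category is not None:
                totals[category] += int(field.text)
    return dict(totals)


source_root = ET.fromstring(
    "<Vendors>"
    "<Vendor><RetailOrders>3</RetailOrders><WebOrders>4</WebOrders>"
    "<BulkOrders>1</BulkOrders></Vendor>"
    "<Vendor><RetailOrders>2</RetailOrders><BulkOrders>5</BulkOrders></Vendor>"
    "</Vendors>"
)
totals = category_totals(source_root)
print(totals)
```

Even this small case requires hand-written procedural logic, which is the kind of programming burden the following paragraphs describe.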
In this example, the vendor database, the source, is expressed in a source XML document governed by a source DTD, and the sales database, the target, is capable of importing data expressed in a target XML document governed by a target DTD. The target DTD exists, defining the data structures recognizable by the import function of the target database. The challenge is how to create the target XML document from the source XML document. That is, the challenge is how to convert the data expressed in a source data format conforming to a source DTD into a target format that conforms to a different DTD, the target DTD.
In prior art, although the personnel who developed the XML documents were not required to be programmers, the personnel who write the translation routines, the translation rules for mapping or converting data from the source database to the target database, must not only be programmers, they must be programmers skilled in XML, XML Declaration Syntax, and some special purpose transformation language such as XSL. And they must be numerous programmers. It would be very advantageous, therefore, if there were means and methods to enable non-programmers, less skilled programmers, or fewer programmers to establish translation rules for converting a source XML document to a target XML document when the two XML documents have data structures defined and governed by two different DTDs.