Many computer users have experienced the problem of data stored in an inaccessible format. For example, a document stored in Portable Document Format (“.pdf”) made popular by ADOBE® may not be accessible from the popular MICROSOFT® WORD word-processing application. To view a document stored in .pdf format, a user must first acquire an appropriate application, such as the ADOBE® ACROBAT READER. Another familiar situation is that of accessible but scrambled or improperly formatted data. Some applications may attempt to make sense of documents designed for other applications, resulting in display of a document that does not look like the original, but is a scrambled combination of the original text and a set of strange characters.
One solution to the problem of inaccessible data is to transform the data into an accessible format. For example, one could write a program that transforms .pdf files into familiar .doc files, and then access the files using MICROSOFT® WORD. Many applications in use today offer a selection of file formats when saving data. This feature allows users to effectively transform their data into a format that will be useful for them when they access the files from other applications.
Transforming files from one format to another is commonplace in the context of data stored in an Extensible Markup Language (“XML”) format. This is in part because the XML data format has experienced wide use. XML is quickly becoming the de facto standard for exchanging corporate data via structured documents, whether internally, with business partners, or via public applications across the Internet. The World Wide Web Consortium (W3C) has endorsed XML as the standard for document and data representation. With the proliferation of data stored in XML formats, the transformation of such data into formats that are recognized by diverse applications has also become widespread.
Another reason for the widespread necessity of transforming XML data arises from the extensible nature of the XML syntax. This syntax can be used to describe any arbitrary concept or tangible item. In other words, an architect can create an XML element called “skyscraper,” a chemist can create a “Bunsen burner” element, and a shipping company can create a “tractor trailer” element. Each of these professionals can then write programs that identify their own purpose-designed elements, and manipulate those elements according to their unique needs.
What happens when the architect joins a large architecture firm that uses an XML element identified as “big building” instead of “skyscraper”? The architect's programs will not work with the “big building” data generated by the large firm. Likewise, programs created by the large firm to identify “big building” elements will not recognize the architect's “skyscraper” elements. This is so even though the underlying data describing skyscrapers is represented in XML syntax, and it may be quite similar to the underlying data describing the firm's “big buildings.” To use the architect's “skyscraper” data, the “skyscraper” identifiers must be changed to “big building.” The underlying data tree, or structure of data, may also require transformation. If the architecture firm used some other program that did not support XML, there would be a need to transform the architect's data into some other format, e.g., .pdf, plain text, or Hyper-Text Markup Language (“HTML”) format.
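The renaming step described above can be sketched as a short transform written in XSLT, the transformation language discussed later in this background. This is only an illustrative sketch under stated assumptions: the element name big-building is a hypothetical stand-in for the firm's “big building” identifier (XML names cannot contain spaces), and the sketch assumes that only the identifier changes, not the structure of the underlying data tree.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Default rule: copy every node and attribute through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override: rename each skyscraper element (hypothetical target name),
       keeping its attributes and child content -->
  <xsl:template match="skyscraper">
    <big-building>
      <xsl:apply-templates select="@*|node()"/>
    </big-building>
  </xsl:template>

</xsl:stylesheet>
```

A transform of this kind would have to be extended with further templates wherever the firm's data tree differs in structure from the architect's, which is part of why each such transform tends to be uniquely tailored.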
Because XML is extensible, the generation of unique element identifiers is widespread, leading to a correspondingly widespread need to generate uniquely tailored transforms for XML data. Moreover, writing a single transform that works for all XML data is impossible, because XML data is identified using an unlimited variety of element identifiers and is organized in a correspondingly unlimited variety of data trees. New element identifiers and new tree structures are constantly created.
The widespread need to generate unique transforms to convert XML data into another format presents a problem that is exacerbated by the difficulty and tedium of the task. As one might imagine, writing transforms involves writing code that identifies every element in an XML file, and then re-identifies and reconfigures those elements into a format that is recognizable to a different schema or application. Such code must be properly written according to the need and the confines of the transformation language. Transforms of one XML document into another XML document, or some other data format, are generally performed using Extensible Stylesheet Language Transformations (“XSLT”).
XSLT is an XML-based language for specifying the rules by which one XML document is transformed into another. An XSLT document is called a stylesheet or transform (the terms are considered equivalent). An XSLT stylesheet contains templates. An XSLT processor compares the elements in an input XML document to the templates in a stylesheet. When it finds a matching template, it writes the template's contents into an output or resultant tree. When it completes this step, it may serialize the output tree into an XML document or another format, such as plain text or HTML.
FIG. 1 displays the complete process of transforming a source file 101 into a new file 104. First, a developer 100 identifies the elements of the source file 101, and how those elements should be converted into elements of a new file 104. Next, the developer 100 writes a transform 102, in the form of an XSLT stylesheet (discussed below). The transform 102 contains many general features that will be the same or similar for the vast majority of XSLT transforms. It also contains identifiers for information, such as namespaces, that will be used with the source file 101, the transform 102, or the new file 104. Finally, it contains descriptions of particular types of source file data, and instructions for how to modify source file data to fit into a new file 104 format. A completed transform 102 can be used in an XSLT processor 103 to convert a source file 101 into a new file 104.
An XSLT processor 103 is a software program that reads an XSLT stylesheet such as transform 102, reads an input XML document such as source file 101, and converts the input document into an output document such as new file 104 according to the instructions given in the stylesheet. An XSLT processor 103 can be built into a web browser, built into a web application or server, or provided as a standalone program run from the command line or from an application's graphical user interface.
The process of creating an XSLT stylesheet will usually involve writing a header section, consisting of an XML declaration and a root stylesheet element, that identifies the document as a transform and specifies some required information, such as namespaces to be referenced in the transform. It will also involve writing a number of transform templates. Each template has a “match” attribute that contains a pattern identifying input that the template matches. The simplest such pattern is an element name. Thus, the following template says that every time a person element is seen, the stylesheet processor should emit the text “A person”:
<xsl:template match="person">A person</xsl:template>
The following is a complete stylesheet that uses the above template:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="person">A person</xsl:template>
</xsl:stylesheet>
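To make the behavior of this stylesheet concrete, a hypothetical input document such as the following could be supplied to an XSLT processor along with it. The people root element and the name child elements are assumptions introduced here for illustration; only the person element comes from the example above.

```xml
<?xml version="1.0"?>
<!-- Hypothetical input; element names other than "person" are illustrative -->
<people>
  <person><name>Ada</name></person>
  <person><name>Grace</name></person>
</people>
```

For each of the two person elements, the processor would match the template and emit the text “A person” in place of the element and its contents. Because the stylesheet defines no template for the people element, XSLT's built-in rules would apply to it; those rules process its children and copy through the whitespace text between them.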
As can be appreciated from the very simple stylesheet above, there may be considerable difficulty and tedium in manually writing XSLT transforms, especially when the transforms are for longer and more complex XML files. Moreover, many of the operations in generating such a stylesheet are repetitive: this is true of much of the information in the header, and of the syntax within and surrounding the templates. In light of the pervasive need to transform XML data, the tedium and repetition involved in the task of writing XML transforms, and the lack of availability of a single transform that can be used with all XML data, there is an as-yet unaddressed need in the industry to improve techniques for creating XML transforms.