As XML (Extensible Markup Language) has become more widely accepted, increasing numbers of XML documents have been generated to store an ever-increasing variety of data. With such a variety of data being generated, a correspondingly wide variety of presentation formats have been employed to view the XML data and a correspondingly wide variety of uses have been found for such XML data. XML is a W3C (World Wide Web Consortium) endorsed standard for document marking that provides a generic syntax to mark up data with human-readable tags. Since XML does not have a fixed set of tags and elements, but rather allows users to define such tags (so long as they conform to XML syntax), XML can be considered a meta-markup language for text documents.
Data is stored in XML documents as strings of text that are surrounded by text markup. A particular unit of data and markup is conventionally referred to as an element. XML defines the syntax for the markup. A simple XML document appears below:
<?xml version="1.0"?>
<programmer grade="G7">
  <firstname> ashton </firstname>
  <lastname> annie </lastname>
  <language> C </language>
  <language> C# </language>
</programmer>
In this document, the name “ashton” is data (a.k.a. content), and the tags <firstname> and </firstname> are markup associated with that content. The example document is text and can be edited by conventional text editors and stored in locations including, but not limited to, a text file, a collection of text files, a database record and in memory.
XML documents can be treated as trees comprising a root node and one or more leaf nodes. In the example document, the root element is the programmer element. Furthermore, elements can have parent elements and child elements. In the example document, the programmer element is a parent element that has four child elements: a firstname element, a lastname element, and two language elements. In the example document, the programmer element also has an attribute “grade”. An attribute is a name/value pair that is associated with the start tag of an element. XML documents can contain XML constructs including elements, tags, character data, attributes, entity references, CDATA sections, comments, processing instructions, and so on.
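The tree view described above can be illustrated with a minimal sketch. It assumes Python's standard xml.etree.ElementTree module (not something the text itself prescribes) and the example document written with straight quotes so that it parses; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The example document from the text (assumption: straight quotes around attribute values).
DOC = (
    '<?xml version="1.0"?>'
    '<programmer grade="G7">'
    '<firstname> ashton </firstname>'
    '<lastname> annie </lastname>'
    '<language> C </language>'
    '<language> C# </language>'
    '</programmer>'
)

# Parsing yields the root element of the tree.
root = ET.fromstring(DOC)

# Iterating over an element visits its child elements in document order.
child_names = [child.tag for child in root]

# An attribute is a name/value pair attached to the element's start tag.
grade = root.get("grade")

# Character data (content) of the two language elements, with padding removed.
languages = [el.text.strip() for el in root.findall("language")]

print(root.tag)
print(child_names)
print(grade)
print(languages)
```

Here the programmer element is the root, its four children appear in document order, and the grade attribute is read separately from the element content.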
The W3C has codified XML's abstract data model in a specification called the XML Information Set (Infoset). The Infoset describes the logical structure of an XML document in terms of nodes (a.k.a. “information items”) that have properties. Nodes in an XML tree have well-defined sets of properties that can be exposed. For example, an element node has properties including, but not limited to, a namespace name, a local name, a prefix, an unordered set of attributes, and an ordered list of children. The abstract description of an XML document standardizes information that is made available concerning XML documents. Thus, in addition to data that may be stored in an XML node, metadata concerning the node and the tree in which the node resides is available.
Programs that try to understand the contents of a document like the sample XML document employ an XML parser to separate the document into individual XML tokens, elements, attributes and so on. As the document is parsed, it can be checked to determine whether it is well-formed (conforms to the XML specification) and to determine whether it is valid (conforms to a desired DTD (Document Type Definition) and/or schema). A DTD includes a list of elements, attributes and entities that an XML document can employ and the contexts in which they may be employed. XML schemas are scheduled to replace DTDs as an approved W3C standard and thus, in this document, when reference is made to a DTD, an XML schema should also be considered. Thus, a DTD (and/or XML schema) facilitates limiting the form of an XML document. A DTD (and/or XML schema) can be located within an XML document, or an external reference to the DTD (and/or XML schema) can be employed to locate the DTD (and/or XML schema) with which an XML document is related. External references are common since it may be desirable to have more than one XML document conform to one DTD (and/or XML schema).
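The well-formedness check mentioned above can be sketched as follows. This is an illustration only, assuming Python's standard xml.etree.ElementTree parser; the helper name is_well_formed is hypothetical. The sketch checks well-formedness but not validity, since that parser does not validate a document against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # The parser rejects documents that break XML syntax rules.
        return False
    return True


# Every start tag has a matching end tag and the elements nest properly.
good = '<programmer grade="G7"><firstname>ashton</firstname></programmer>'

# The firstname element is never closed, so the tags do not nest properly.
bad = '<programmer grade="G7"><firstname>ashton</programmer>'

print(is_well_formed(good))
print(is_well_formed(bad))
```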
With XML being employed to store data for such a variety of applications, transforming XML from one format to another format is common. While the markup in an XML document can describe the structure of the document, the XML markup typically does not describe how the document is to be presented. Thus the Extensible Stylesheet Language (XSL) was developed. XSL has subsequently been divided into XSL Transformations (Xslt) and other components.
Xslt is a general-purpose language employed to facilitate transforming an XML document from one form to another form (e.g., from XML to XHTML, XSL-FO, PostScript, RTF, etc.). Xslt employs the XPath syntax to identify matching elements. XPath is a query language for XML that facilitates selecting XML nodes from an XML tree. Conventionally, data is not stored in a manner that facilitates XPath querying. XPath can be employed to locate nodes by identifiers including position, relative position, type, content and the like. Thus, XPath can be employed to pick nodes and/or sets of nodes out of an XML node tree. There are at least seven types of nodes in an XML document that XPath addresses. These node types include a root node type, an element node type, an attribute node type, a text node type, a comment node type, a processing instruction node type and a namespace node type.
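Selecting nodes by attribute, content and position can be sketched as follows. This assumes Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath; the team wrapper element and the second programmer are hypothetical additions made so that the queries have something to distinguish.

```python
import xml.etree.ElementTree as ET

# Assumption: a hypothetical <team> wrapper holding the example programmer plus an invented one.
TEAM = (
    '<team>'
    '<programmer grade="G7"><firstname>ashton</firstname><lastname>annie</lastname>'
    '<language>C</language><language>C#</language></programmer>'
    '<programmer grade="G5"><firstname>sam</firstname><lastname>lee</lastname>'
    '<language>Python</language></programmer>'
    '</team>'
)

team = ET.fromstring(TEAM)

# Select by attribute value: last names of programmers whose grade is G7.
by_attribute = [e.text for e in team.findall("programmer[@grade='G7']/lastname")]

# Select by content: the grade of the programmer whose firstname child is "ashton".
by_content = [e.get("grade") for e in team.findall("programmer[firstname='ashton']")]

# Select by position: the second language child of each programmer (1-based index).
by_position = [e.text for e in team.findall("programmer/language[2]")]

# Select by type at any depth: every language element in the tree.
all_languages = [e.text for e in team.findall(".//language")]

print(by_attribute, by_content, by_position, all_languages)
```

Each path expression picks a set of nodes out of the tree without the caller walking the tree by hand.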
Conventionally, transformers depended on an XML document being fully loaded into memory before transformation. Furthermore, conventional transformers typically converted and then wrote the entire transformed output before returning control to the requesting user. For example, transforming XML data from one format to another format has conventionally been achieved by copying an XML document into a node tree (e.g., DOM (Document Object Model)), pushing one hundred percent of the node tree into a transformer that transforms one hundred percent of the node tree and then pushes the entire transformed node tree to the output destination that desired the transformed file. Such all-or-nothing models suffer from several drawbacks, including, but not limited to, extra copy steps, the requirement to produce a node tree before transformation can be performed, transforming unneeded data, consuming excessive memory, consuming excessive processor cycles and limiting the flexibility with which the output destination can request transformations.
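The conventional all-or-nothing flow can be sketched as follows. This is an illustration under stated assumptions, not any particular transformer: it uses Python's standard xml.dom.minidom module, and the transform_all function and its "name: value" output format are hypothetical.

```python
from xml.dom import minidom

# The example document from the text (assumption: straight quotes around attribute values).
DOC = (
    '<?xml version="1.0"?>'
    '<programmer grade="G7">'
    '<firstname> ashton </firstname>'
    '<lastname> annie </lastname>'
    '<language> C </language>'
    '<language> C# </language>'
    '</programmer>'
)


def transform_all(xml_text: str) -> str:
    """Hypothetical all-or-nothing transformer: load everything, convert everything."""
    # Step 1: copy the whole document into a node tree (DOM) before doing anything else.
    dom = minidom.parseString(xml_text)

    # Step 2: transform every child element, whether or not the caller needs it.
    lines = []
    for node in dom.documentElement.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            text = "".join(
                c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE
            )
            lines.append(f"{node.tagName}: {text.strip()}")

    # Step 3: return control only after the entire output has been produced.
    return "\n".join(lines)


output = transform_all(DOC)
print(output)
```

A caller interested only in the first name still pays for building the full tree and converting every element, which is the drawback the paragraph above describes.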
Xslt is an XML application that determines, via a set of rules, how one XML document should be transformed into another XML document. An Xslt document (e.g., an Xslt style-sheet) contains a list of templates that are employed in node matching. An Xslt processor can be employed to read the Xslt document and the XML document, and when a pattern match occurs between the input data and the stored template, the output associated with the template is pushed out of the Xslt processor. The output can be, for example, written into an output tree (e.g., DOM). Thus, conventional Xslt processors typically interact with event-driven user programs that receive event notifications from the Xslt processor along with a set of data concerning the event. One drawback with such conventional systems is that such event notifications may require unnecessary processing by a user program that may only be interested in a subset of events. Furthermore, user programs that interact with such event-producing Xslt processors may be required to maintain complicated state machines in order to interact with the conventional Xslt processor.
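The burden that a push (event-driven) interface places on a user program can be sketched as follows. Python's standard library has no Xslt processor, so this sketch substitutes the SAX parser in xml.sax as an analogous event producer; the LanguageCollector class and its fields are hypothetical. The user program wants only the language elements, yet it receives every event and must track state to know where it is in the document.

```python
import xml.sax

# The example document from the text (assumption: straight quotes around attribute values).
DOC = (
    '<?xml version="1.0"?>'
    '<programmer grade="G7">'
    '<firstname> ashton </firstname>'
    '<lastname> annie </lastname>'
    '<language> C </language>'
    '<language> C# </language>'
    '</programmer>'
)


class LanguageCollector(xml.sax.ContentHandler):
    """Hypothetical user program that only cares about <language> content."""

    def __init__(self):
        super().__init__()
        self.in_language = False  # state: are we inside a <language> element?
        self.buffer = []          # text pieces collected for the current element
        self.languages = []       # the results the program actually wants
        self.events_seen = 0      # every notification pushed to us, wanted or not

    def startElement(self, name, attrs):
        self.events_seen += 1
        if name == "language":
            self.in_language = True
            self.buffer = []

    def characters(self, content):
        self.events_seen += 1
        if self.in_language:
            self.buffer.append(content)

    def endElement(self, name):
        self.events_seen += 1
        if name == "language":
            self.languages.append("".join(self.buffer).strip())
            self.in_language = False


handler = LanguageCollector()
xml.sax.parseString(DOC.encode("utf-8"), handler)
print(handler.languages, handler.events_seen)
```

Even for this small document the handler is notified about the programmer, firstname and lastname elements it has no use for, and the in_language flag is a one-bit state machine that grows quickly as the set of interesting patterns grows.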