In a conventional XML-based application, a typical first step in any XML processing is to read an XML document from disk (or the network) into memory. Most of the standards for conventional XML processing operate on an abstract model in which the document is represented as a set of nodes linked together by two fundamental, bidirectional relationships: parent/child and previous-sibling/next-sibling. Traversal of these linkages to locate specific nodes is accomplished by QName (e.g., get the next sibling named “foo”, or the first child named “bar”), as the conventional model is meant to be general enough for any XML vocabulary. Note that in most models, attributes are handled specially and are considered neither children nor siblings, because of their special, unordered semantics. The basic access pattern, however, remains the same. The W3C standard Document Object Model (DOM) provides a standard example of this model, both in the abstract and in concrete implementation.
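By way of illustration only, the traversal pattern described above can be sketched using the DOM implementation in the Python standard library (xml.dom.minidom). The sample document, the element names, and the helper functions first_child_named and next_sibling_named are hypothetical and are not part of any standard API. The sketch assumes a document without namespaces, so that the node name and the QName coincide.

```python
from xml.dom import minidom, Node

# Hypothetical sample document; element and attribute names are illustrative.
XML_TEXT = (
    '<root id="r1">'
    "<bar>one</bar><foo>two</foo><bar>three</bar><foo>four</foo>"
    "</root>"
)


def first_child_named(node, name):
    """Return the first element child of `node` named `name`, or None."""
    child = node.firstChild
    while child is not None:
        if child.nodeType == Node.ELEMENT_NODE and child.nodeName == name:
            return child
        child = child.nextSibling
    return None


def next_sibling_named(node, name):
    """Return the next element sibling of `node` named `name`, or None."""
    sibling = node.nextSibling
    while sibling is not None:
        if sibling.nodeType == Node.ELEMENT_NODE and sibling.nodeName == name:
            return sibling
        sibling = sibling.nextSibling
    return None


# Read the document into memory as a tree of linked nodes.
doc = minidom.parseString(XML_TEXT)
root = doc.documentElement

# Locate nodes by name along the parent/child and sibling linkages.
first_bar = first_child_named(root, "bar")
next_foo = next_sibling_named(first_bar, "foo")

# The linkages are bidirectional.
parent_of_foo = next_foo.parentNode
previous_of_foo = next_foo.previousSibling

# Attributes are reached through a separate accessor, not as child nodes.
root_id = root.getAttribute("id")
child_count = len(root.childNodes)
```

In this sketch the attribute on the root element is retrieved by name through getAttribute and does not appear in the root's list of child nodes, consistent with the special handling of attributes noted above.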
Conventional processing of the Extensible Markup Language (XML) is traditionally based on a set of fairly general-purpose, off-the-shelf software components: a parser, which understands XML syntax (and often applies basic data validation, following the rules for the document type as expressed in an XML Schema); an intermediate form for the XML dataset (such as a model accessed through the W3C DOM APIs, or a sequence of SAX events); and a serializer, which renders the intermediate representation back into its XML-syntax equivalent. The actual application code, as is apparent to those of skill in the art, operates on the intermediate representation between its production by the parser and its delivery to the serializer.
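As a minimal sketch of the parser, intermediate form, and serializer arrangement described above, the following example again uses xml.dom.minidom from the Python standard library. The function names, the sample input, and the application logic (upper-casing the text of each "name" element) are hypothetical and chosen only for illustration. Schema validation is omitted, since the Python standard library does not provide it.

```python
from xml.dom import minidom


def parse(xml_text):
    """Parser stage: turn XML syntax into a DOM intermediate form."""
    return minidom.parseString(xml_text)


def apply_application_logic(doc):
    """Application stage (hypothetical): upper-case text in <name> elements."""
    for element in doc.getElementsByTagName("name"):
        for child in element.childNodes:
            if child.nodeType == child.TEXT_NODE:
                child.data = child.data.upper()
    return doc


def serialize(doc):
    """Serializer stage: render the intermediate form back into XML syntax."""
    return doc.documentElement.toxml()


def process(xml_text):
    """Run the three stages in order: parse, apply logic, serialize."""
    return serialize(apply_application_logic(parse(xml_text)))


result = process("<people><name>ada</name><name>alan</name></people>")
```

The application code here touches only the DOM tree. It neither reads XML syntax nor writes it, which is the division of labor between the off-the-shelf components and the application described above.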
The generality of such tools facilitates development, but comes at a performance cost. For example, a parser designed for general-purpose use may spend a significant amount of time testing for input cases that are extremely unlikely to occur in a document conforming to a given schema.