1. Field of the Invention
This invention relates to XML parsers, and particularly to a method that treats validation engines as an integral part of parsing by allowing the validation engines to be written in a recursive-descent code-driven manner.
2. Description of Background
XML (Extensible Markup Language) has begun to work its way into the business computing infrastructure and underlying protocols such as the Simple Object Access Protocol (SOAP) and Web services. In the performance-critical setting of business computing, however, the flexibility of XML becomes a liability because of its potentially significant performance penalty. XML processing is conceptually a multitiered task, an attribute it inherits from the multiple layers of specifications that govern its use, including XML, XML namespaces, XML Information Set (Infoset), and XML Schema. Traditional XML processor implementations reflect these specification layers directly. Bytes, read off the “wire” or from disk, are converted to some known form. Attribute values and end-of-line sequences are normalized. Namespace declarations and prefixes are resolved, and the tokens are then transformed into some representation of the document Infoset. The Infoset is optionally checked against an XML Schema grammar (XML schema, schema) for validity and rendered to the user through some application programming interface (API), such as Simple API for XML (SAX) or Document Object Model (DOM).
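By way of illustration only, the layered processing described above can be sketched with the SAX interface of the Python standard library. The document bytes, the namespace URI "urn:example:po", and the names CollectingHandler and sax_events are assumptions introduced for this sketch and are not part of any particular processor. The sketch shows bytes being decoded, namespace prefixes being resolved, and the result being rendered to the user as a stream of SAX events, with no validation performed.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class CollectingHandler(ContentHandler):
    """Hypothetical handler that records namespace-resolved element events."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElementNS(self, name, qname, attrs):
        # name is a (namespace URI, local name) pair once prefixes are resolved
        self.events.append(("start", name))

    def endElementNS(self, name, qname):
        self.events.append(("end", name))


def sax_events(data: bytes):
    """Decode bytes, resolve namespaces, and return the SAX event list."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)  # enable prefix resolution
    handler = CollectingHandler()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.events


# Assumed sample document; the prefix "p" is bound to an example URI.
DOC = (b'<?xml version="1.0" encoding="UTF-8"?>'
       b'<p:order xmlns:p="urn:example:po"><p:id>7</p:id></p:order>')
events = sax_events(DOC)
```

In this arrangement any schema check would be a separate pass over the collected events, which is the layering that the traditional implementations mirror.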
With the widespread adoption of SOAP and Web services, XML-based processing, and the parsing of XML documents in particular, is becoming a performance-critical aspect of business computing. In such scenarios, performance is invariably constrained by XML parsing and validation, which current technologies perform by having the tokenizer drive the validation engine. In fact, most processors tokenize the entire XML document into a DOM or a SAX event stream and then run the validation engine over the DOM or the stream of tokens. Technologies that do treat validation as an integral part of parsing have not reached their full potential. Regardless of the manner in which the tokens are pushed, none of the current technologies allows the validation engine to be written in a recursive-descent code-driven manner. As a result, the validation engine requires large tables, which increase the memory footprint and thus reduce processing efficiency. This approach also makes the validation code slower and obscures the control flow of the whole parsing and validation process.
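For illustration, a validation engine written in a recursive-descent code-driven manner might resemble the following minimal sketch, in which the validator pulls tokens from the tokenizer rather than having tokens pushed to it. The toy tokenizer (no attributes, comments, or entities), the class name OrderValidator, and the content model "order := id item+" are all assumptions made for this sketch; it is not the method of any existing processor.

```python
import re
from typing import Iterator, Optional, Tuple

Token = Tuple[str, str]  # (kind, value); kind is "start", "end", or "text"

_TOKEN_RE = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*>|([^<]+)")


def tokenize(xml: str) -> Iterator[Token]:
    """Toy tokenizer: yields start tags, end tags, and non-blank text."""
    pos = 0
    while pos < len(xml):
        m = _TOKEN_RE.match(xml, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        pos = m.end()
        if m.group(2):
            yield ("end" if m.group(1) else "start", m.group(2))
        elif m.group(3).strip():
            yield ("text", m.group(3).strip())


class OrderValidator:
    """Hypothetical content model: order := id item+ (text-only children)."""

    def __init__(self, xml: str) -> None:
        self._tokens = tokenize(xml)
        self._look: Optional[Token] = next(self._tokens, None)

    def _advance(self) -> None:
        # The validator pulls the next token on demand.
        self._look = next(self._tokens, None)

    def _expect(self, kind: str, name: str) -> None:
        if self._look != (kind, name):
            raise ValueError(f"expected {kind} {name!r}, got {self._look!r}")
        self._advance()

    def _text_element(self, name: str) -> None:
        self._expect("start", name)
        if self._look is None or self._look[0] != "text":
            raise ValueError(f"<{name}> must contain text")
        self._advance()
        self._expect("end", name)

    def validate(self) -> bool:
        # The control flow mirrors the grammar directly; no state table.
        self._expect("start", "order")
        self._text_element("id")
        self._text_element("item")  # at least one item is required
        while self._look == ("start", "item"):
            self._text_element("item")
        self._expect("end", "order")
        if self._look is not None:
            raise ValueError("trailing content after </order>")
        return True
```

Calling OrderValidator("<order><id>7</id><item>pen</item></order>").validate() succeeds, whereas a document lacking an item element raises ValueError at the point where the grammar is violated. Each grammar rule corresponds to one method, so the content model is expressed in code rather than in transition tables.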
Thus, existing technologies do not fully treat validation as an integral part of parsing. Therefore, it is desired to integrate validation and parsing, and to enable the validation engine to be written in a recursive-descent code-driven manner.