The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
XML Schema is one definition language that provides facilities for describing structure and constraining the contents of an XML document. A draft specification, referred to hereinafter as “XML Schema Specification”, for the XML Schema definition language is described in a set of three documents published by the World Wide Web Consortium (W3C). The first document in the set is “XML Schema Part 0: Primer Second Edition”, W3C Working Draft 28 Oct. 2004, the entire contents of which are hereby incorporated by reference for all purposes as if fully set forth herein. The second document in the set is “XML Schema Part 1: Structures Second Edition”, W3C Working Draft 28 Oct. 2004, the entire contents of which are hereby incorporated by reference for all purposes as if fully set forth herein. The third document in the set is “XML Schema Part 2: Datatypes Second Edition”, W3C Working Draft 28 Oct. 2004, the entire contents of which are hereby incorporated by reference for all purposes as if fully set forth herein.
As referred to herein, an XML schema is a set of schema components that conforms to a definition language, such as, for example, the above-identified XML Schema Specification, a Document Type Definition (DTD), or any other proprietary or open-source schema definition language. A schema component is a block of data that provides a definition of an XML element or a portion of an XML element. Examples of schema components include, but are not limited to, schema components for type definitions, schema components for element declarations, and schema components for attribute declarations.
XML schemas are typically used for validation of XML documents. As used herein, validation refers to the process of determining whether a portion of an XML document (such as, for example, an entire XML document, an XML element included in an XML document, a sub-element of an XML element, or an attribute of an XML element) conforms to the definitions and constraints specified in the relevant schema components of an XML schema. The validation of a specific portion of an XML document may return a validation result which, depending on the particular implementation, may comprise one or more values that indicate a successful or a failed validation outcome. In addition, the validation result may comprise an overall validation outcome for a particular portion of an XML document that includes one or more sub-portions (e.g., for an XML element that includes sub-elements).
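One possible shape for such a validation result is sketched below in Java, assuming (for illustration only) that the overall outcome of a portion is successful exactly when the portion itself and all of its sub-portions validate successfully. ValidationResultSketch, ValidationResult, add, and overallValid are hypothetical names and are not part of any standard API.

```java
import java.util.ArrayList;
import java.util.List;

public class ValidationResultSketch {
    // Hypothetical per-portion result: an element, sub-element, or attribute.
    static final class ValidationResult {
        final String portion;        // e.g. an element or attribute name
        final boolean locallyValid;  // outcome for this portion alone
        final List<ValidationResult> children = new ArrayList<>();

        ValidationResult(String portion, boolean locallyValid) {
            this.portion = portion;
            this.locallyValid = locallyValid;
        }

        // Attaches the result for a sub-portion and returns this result for chaining.
        ValidationResult add(ValidationResult child) {
            children.add(child);
            return this;
        }

        // Assumed aggregation rule: successful only if this portion and every
        // sub-portion, recursively, validated successfully.
        boolean overallValid() {
            if (!locallyValid) {
                return false;
            }
            for (ValidationResult child : children) {
                if (!child.overallValid()) {
                    return false;
                }
            }
            return true;
        }
    }
}
```

Under this assumed rule, an element that is itself valid but carries an attribute that fails validation has a failed overall outcome.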
Typically, the validation of an XML document against an XML schema is performed by one or more software components, collectively referred to as a schema validator. In one approach referred to as the Document Object Model (DOM) approach, a schema validator first builds in memory a DOM tree that represents the XML document. After building the DOM tree in memory, the schema validator traverses the DOM tree in a recursive descent fashion and validates the various portions of the XML document against the XML schema.
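For illustration only, the build-then-validate shape of the DOM approach can be sketched with the JDK's standard JAXP APIs (javax.xml.parsers and javax.xml.validation). The class and method names (DomValidationSketch, buildDomTree, compileSchema, isValid) are hypothetical, the document and schema are assumed to be supplied as strings, and the traversal of the tree is performed internally by the JDK validator rather than by code shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomValidationSketch {
    // Step 1: the whole document is read and materialized as a DOM tree
    // before any validation can begin.
    static Document buildDomTree(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // schema validation is namespace-sensitive
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("document could not be parsed", e);
        }
    }

    // Compiles a W3C XML Schema (XSD) supplied as text.
    static Schema compileSchema(String xsd) {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return sf.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema could not be compiled", e);
        }
    }

    // Step 2: validate the in-memory tree. With no ErrorHandler set, the
    // validator throws a SAXException on the first validation error.
    static boolean isValid(String xsd, String xml) {
        Document tree = buildDomTree(xml);
        Validator validator = compileSchema(xsd).newValidator();
        try {
            validator.validate(new DOMSource(tree));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, isValid cannot report any outcome until buildDomTree has consumed the entire input and held it in memory.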
The DOM approach for validating XML documents, however, has several disadvantages. One disadvantage of the DOM approach is that the entire XML document must be available before the DOM tree can be built and the validation of the document can begin. This makes the DOM approach unsuitable for use in conjunction with a StAX-based XML parser. Streaming API for XML (StAX) is an event-based, pull-style Application Programming Interface (API) through which an entity, such as an application, requests parsing events and other information one at a time as an XML document is parsed. The parsing events reported by a StAX parser may be any events that the parser encounters during the parsing of an XML document. Examples of such parsing events include, but are not limited to, start-element events, characters events, and end-element events. Thus, if a DOM approach is used to validate an XML document that is parsed by a StAX parser, validation cannot start until the last parsing event has been received and the full tree has been built, and the benefits of parsing the document in an event-based fashion are lost.
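The following sketch, which uses the JDK's standard javax.xml.stream API, shows how a StAX reader hands an application one parsing event at a time. The class and method names (StaxEventSketch, events) are hypothetical; the sketch only records start-element, characters, and end-element events as strings and performs no schema validation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxEventSketch {
    // Pulls parsing events one at a time; no in-memory tree of the
    // document is built.
    static List<String> events(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent character data into a single CHARACTERS event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        List<String> out = new ArrayList<>();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    switch (reader.next()) {
                        case XMLStreamConstants.START_ELEMENT:
                            out.add("start-element " + reader.getLocalName());
                            break;
                        case XMLStreamConstants.CHARACTERS:
                            out.add("characters " + reader.getText());
                            break;
                        case XMLStreamConstants.END_ELEMENT:
                            out.add("end-element " + reader.getLocalName());
                            break;
                        default:
                            break; // other events (e.g. END_DOCUMENT) are ignored
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("document could not be parsed", e);
        }
        return out;
    }
}
```

Because each event is available as soon as the parser reaches it, a consumer of this loop could act on the start-element event for the root before the rest of the document has been read.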
Another disadvantage of the DOM approach for validating XML documents is that an in-memory DOM tree does not scale well for large documents. The larger the XML document, the more memory the schema validator requires to validate it. In some cases, it may not even be possible to validate large XML documents because the memory requirements for the DOM tree would be prohibitive.
Another disadvantage of the DOM approach is that it adversely affects the performance of the computer system on which the approach is implemented. Because the memory used in validating an XML document grows in proportion to the size of the document, the cost of allocating memory for the DOM tree and the cost of traversing the tree significantly impede the performance of the schema validator in particular, and of the computer system in general. Further, during the validation of an XML document, a schema validator implementing the DOM approach typically traverses a DOM tree multiple times, which degrades performance even further.
Based on the foregoing, there is a clear need for techniques for validating XML documents that overcome the disadvantages of the DOM approach. In addition, there is a clear need for techniques that provide streaming validation of XML documents against XML schemas that allow for defining a wide variety of XML structures and constraints, such as, for example, XML schemas that conform to the XML Schema Specification.