XML is quickly becoming a core technology for interoperability and data manipulation, and parsing XML has become a standard function in most computing environments. Two main approaches presently exist: the Simple API for XML, or “SAX”, and the Document Object Model, or “DOM”. Each approach has certain benefits and drawbacks, although SAX presently has more momentum as an XML processing API. Efficient XML processing can be fundamental to a server. As more documents become XML based, more of the traffic on the server will be XML. The latest push into web services currently utilizes the Simple Object Access Protocol (SOAP) as a transport. SOAP is a lightweight, XML-based protocol for exchanging information in a decentralized, distributed environment. This push has highlighted the need for fast, solid XML processing. Web services can utilize XML over HTTP as the transport for remote procedure calls. If the XML parser is slow, these calls cannot be made in a timely manner.
To use SAX, one writes handlers, which are objects that implement the various handler APIs and receive callbacks during the processing of an XML document. The main benefits of this style of XML document processing include efficiency, flexibility, and the relatively low level at which it operates. It is possible to change handlers during the processing of an XML document, which allows one to use different handlers for different sections within the same document. One drawback to the SAX API is that the programmer must write code to keep track of the current state of the document for every kind of XML document that is processed. This can be an unacceptable amount of overhead for XML processing, and can lead to convoluted document processing code.
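The callback style and the state-tracking burden described above can be illustrated with a minimal sketch. It uses Python's standard `xml.sax` module as a stand-in for a SAX implementation; the `TitleCollector` class, the `collect_titles` helper, and the sample catalog document are hypothetical names introduced only for this illustration.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Hypothetical handler that collects the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        # State the programmer must maintain by hand between callbacks.
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is accumulated.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


# Illustrative document, not taken from the source text.
SAX_SAMPLE = (
    b"<catalog>"
    b"<book><title>First</title></book>"
    b"<book><title>Second</title></book>"
    b"</catalog>"
)


def collect_titles(data):
    """Parse `data` and return the title texts gathered by the handler."""
    handler = TitleCollector()
    xml.sax.parseString(data, handler)
    return handler.titles
```

Even for this small task, the handler needs a flag and a buffer to remember where it is in the document, which is the bookkeeping the paragraph above refers to.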
DOM, on the other hand, loads an entire XML document into memory and provides APIs to the programmer to manipulate the DOM tree. At first glance, this might seem like a win for the application developer, as the developer does not have to write specific parsing code. Unfortunately, this simplicity can come at a serious cost in performance. For very large documents, the entire document must be read into memory before any action can be taken based on the data. DOM is also restrictive in the way in which it loads data into memory. A programmer must use the DOM tree as the base for handling the XML in the document. This can be too restrictive for many application needs. For example, most application server deployment descriptors need to be bound to specific Java classes and not DOM trees.
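For comparison, the tree-based style can be sketched with Python's standard `xml.dom.minidom` module, used here only as a convenient DOM implementation. The `titles_via_dom` helper and the sample document are hypothetical. Note that the whole document is parsed into a tree before the first title can be inspected.

```python
from xml.dom import minidom

# Illustrative document, not taken from the source text.
DOM_SAMPLE = (
    "<catalog>"
    "<book><title>First</title></book>"
    "<book><title>Second</title></book>"
    "</catalog>"
)


def titles_via_dom(data):
    """Build the full DOM tree for `data`, then query it for <title> text."""
    doc = minidom.parseString(data)  # the entire document is now in memory
    try:
        titles = []
        for node in doc.getElementsByTagName("title"):
            text = "".join(
                child.data
                for child in node.childNodes
                if child.nodeType == child.TEXT_NODE
            )
            titles.append(text)
        return titles
    finally:
        doc.unlink()  # release the tree once it is no longer needed
```

No parsing state is tracked by hand here, but the application works with generic tree nodes and must copy the data out of them if it needs its own object types.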
A streaming API for XML parsing can be implemented on top of SAX, as is described in U.S. Provisional Patent Application No. 60/362,773 entitled “Streaming Parser API” by Chris Fry et al. The streaming parser takes SAX events and constructs an easily manipulated event stream that is available to the application programmer. The streaming API gives parsing control to the programmer by exposing a simple iterator-based API. This allows the programmer to ask for the next event, or pull the event, rather than handling the event in a callback. This can give the programmer more procedural control over the processing of the XML document. The streaming API can also allow the programmer to stop processing the document, skip ahead to sections of the document, and get subsections of the document as mini DOM trees.
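The pull style described above can be illustrated with a rough analogue. The sketch below does not use the Streaming Parser API of the referenced application; it uses Python's standard `xml.dom.pulldom` module, a separate pull-style layer that is also built over SAX events, purely as a stand-in. The `first_expensive_book` helper, the `price` attribute, and the sample document are hypothetical.

```python
from xml.dom import pulldom

# Illustrative document, not taken from the source text.
PULL_SAMPLE = (
    "<catalog>"
    '<book price="12.50"><title>First</title></book>'
    '<book price="45.00"><title>Second</title></book>'
    '<book price="99.00"><title>Third</title></book>'
    "</catalog>"
)


def first_expensive_book(data, threshold):
    """Return the XML of the first <book> priced above `threshold`, or None."""
    events = pulldom.parseString(data)
    # The application asks for each event in turn; no callbacks are involved.
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "book":
            if float(node.getAttribute("price")) > threshold:
                # Materialize only this subsection as a small DOM tree.
                events.expandNode(node)
                # Returning here stops processing the rest of the document.
                return node.toxml()
    return None
```

A call such as `first_expensive_book(PULL_SAMPLE, 40.0)` skips the first book without building a tree for it, expands only the matching element, and never reads the events for the third book.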