The binary code for application programs is typically memory mapped by the memory manager of an operating system, which uses the actual contents of the binary image as the program's reserved region of address space. This allows a computing system to transfer an application for execution from disk storage to RAM in units of a page size that is defined and implemented by the memory manager, which is optimized to perform I/O operations as fast as possible. Further, the paging file (e.g., the file on disk storage containing the virtual memory that is available to all of the processes running in the system) does not have to grow to hold the binary image, and therefore remains small.
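As an illustration of memory mapping applied to an ordinary data file rather than a binary image, the following minimal sketch uses the Python standard library `mmap` module. The function names `count_in_mapped_file` and `make_sample_file` are hypothetical and chosen only for this example; the sketch assumes a non-empty file, since an empty file cannot be mapped this way.

```python
import mmap
import os
import tempfile


def count_in_mapped_file(path: str, needle: bytes) -> int:
    """Count non-overlapping occurrences of `needle` in a memory-mapped file.

    The file is mapped read-only; the operating system pages its contents
    in on demand, so no explicit read() calls are issued by this code.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file (the file must not be empty).
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = 0
            pos = mm.find(needle)
            while pos != -1:
                count += 1
                pos = mm.find(needle, pos + len(needle))
            return count


def make_sample_file(data: bytes) -> str:
    """Write `data` to a temporary file and return its path (demo helper)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


if __name__ == "__main__":
    sample = make_sample_file(b"<a><b/><b/></a>")
    try:
        print(count_in_mapped_file(sample, b"<b/>"))
    finally:
        os.remove(sample)
```

In this sketch the mapped file itself backs the pages that are touched, which mirrors the way a mapped binary image is backed by its own file rather than by the paging file.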
Extensible Markup Language (XML) is a flexible way to create a common information format and share both the format and data with any individual or group that wants to communicate and receive information in a consistent way. XML describes a standard for how to create a document structure and contains markup symbols to describe the contents of a page or file of application data. XML is “extensible” because the markup symbols of a document structure are unlimited and self-defining. An XML data file can be processed as data by a program, stored with similar data on another computer, or displayed.
During processing, an XML data file can be parsed into components of the data which can be validated as accurate and conforming to a particular XML specification, such as a schema definition for a particular application. An XML parser is typically an application program implemented to receive an input of XML data and break up the data into component parts that can then be processed and/or managed by other programming modules.
Two programming interfaces commonly implemented to parse XML data are the Document Object Model (DOM) and the Simple API for XML (SAX) interfaces, both of which are widely used and incorporated into programming applications. DOM is a programming interface specification that defines a standard set of commands that an XML parser can utilize to access XML document content from a file. A DOM-based XML parser parses the XML data file into individual objects, such as the elements, attributes, and comments, and creates a representative tree structure of the document in memory so that each object is easily accessible and can be individually referenced and manipulated. Creating a tree structure for a large XML document, however, requires a significant amount of memory. For a large XML data file (e.g., 500 Mbytes or more), system performance may be degraded if there is not enough memory available to process the XML data file.
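The DOM approach described above can be sketched with the Python standard library `xml.dom.minidom` module. The sample document `SAMPLE` and the function names `order_ids` and `second_item` are assumptions made for illustration only. Note that the entire document is parsed into an in-memory tree before any node is accessed.

```python
from xml.dom import minidom

# Hypothetical sample document used only for this illustration.
SAMPLE = (
    "<orders>"
    "<order id='1'><item>pen</item></order>"
    "<order id='2'><item>ink</item></order>"
    "</orders>"
)


def order_ids(xml_text: str) -> list:
    """Build the full DOM tree, then read the id attribute of every order."""
    doc = minidom.parseString(xml_text)
    try:
        return [node.getAttribute("id")
                for node in doc.getElementsByTagName("order")]
    finally:
        doc.unlink()  # release the tree once it is no longer needed


def second_item(xml_text: str) -> str:
    """Random access: go straight to the second order without a linear scan
    in application code, because the whole tree is already in memory."""
    doc = minidom.parseString(xml_text)
    try:
        order = doc.getElementsByTagName("order")[1]
        item = order.getElementsByTagName("item")[0]
        return item.firstChild.data
    finally:
        doc.unlink()


if __name__ == "__main__":
    print(order_ids(SAMPLE))
    print(second_item(SAMPLE))
```

Because every element, attribute, and text node becomes an object in the tree, the memory used by this approach grows with the size of the document.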
SAX is an application programming interface (API) that can be utilized to parse an XML data file that describes a collection of data. SAX is a simpler interface and is an alternative to using DOM to interpret an XML file, but has fewer capabilities for manipulating the data content of an XML file. SAX is stream-based and event-based, and generates events when encountering specific components in an XML document. SAX performs a forward-only scan on a file to locate and handle an event associated with the data. As the parser evaluates the data, it communicates any elements or tags to the programming application that is interfacing with the parser.
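The event-based, forward-only behavior described above can be sketched with the Python standard library `xml.sax` module. The handler class `ItemCollector`, the function `collect_items`, and the `item` element name are hypothetical and serve only as an illustration.

```python
import xml.sax


class ItemCollector(xml.sax.ContentHandler):
    """Receives parser events in document order and collects <item> text."""

    def __init__(self):
        super().__init__()
        self.events = []      # (kind, element name) in the order received
        self.items = []       # text content of each <item> element
        self._in_item = False
        self._buffer = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))
        if name == "item":
            self._in_item = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_item:
            self._buffer.append(content)

    def endElement(self, name):
        self.events.append(("end", name))
        if name == "item":
            self.items.append("".join(self._buffer))
            self._in_item = False


def collect_items(xml_bytes: bytes) -> ItemCollector:
    """Parse the input once, front to back, and return the filled handler."""
    handler = ItemCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler


if __name__ == "__main__":
    result = collect_items(b"<orders><item>pen</item><item>ink</item></orders>")
    print(result.items)
```

The handler keeps only the data it chooses to retain, so no tree of the whole document is built. Revisiting an earlier element would require parsing the input again from the start.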
An advantage of SAX over DOM is that it processes documents serially: it reads a section of an XML document, generates an event, and then continues on to read the next section. This form of serial document processing uses less memory than DOM. However, SAX does not validate the XML data against a schema as it parses the data. Further, SAX has to generate file reads (e.g., I/O commands) to read an XML document file, which is then stored into a cache memory. Performing these I/O access reads requires significantly more processing time. Because SAX is a forward-only parser, it also does not provide random access to a document that is not loaded into memory. Rather, the data is handled in the order in which it is read.
Each of the DOM-based and SAX-based XML parsers has drawbacks in its operation that render it inefficient for parsing large XML data files. DOM provides faster access to the data, but for large XML data files, the tree structure that represents the XML data file takes up too much memory. SAX does not take up as much memory space, but also does not provide quick access to the XML data in a file. SAX also does not validate the XML data when parsing an XML data file.
Accordingly, there is a need for application data processing that does not take up available system memory, allows for quick access to data in a data file, and provides that structured data, such as XML data, can be validated while being read from disk memory storage.