Recently, communication between computer systems for data and information exchange has developed significantly thanks to the Internet, which has spread rapidly on a global level by virtue of being supported by public communication networks, both traditional and technologically advanced, such as ISDN, ADSL, GPRS, and others.
The success of this phenomenon is also due to the cheap, real-time availability of information and data stored on servers located all over the globe and connected through dedicated digital lines to computers reachable through the various last-mile network access services.
Most of the electronic texts available from the World Wide Web are formatted according to the Hyper Text Markup Language (HTML) standard. Unlike other electronic texts, HTML ‘source’ documents, from which content text is displayed, contain embedded textual tags. HTML is designed to display data and to focus on how data looks. However, since HTML presents several drawbacks, in particular the inevitable pre-definition of tags, the Extensible Markup Language (XML) has been created by the World Wide Web Consortium (W3C). XML is designed to describe data and to focus on what data is. Like HTML, XML is based on the Standard Generalized Markup Language (SGML). Although SGML has been used in the publishing industry for decades, its perceived complexity intimidates many people who otherwise might have used it.
Using XML, a meaning may be assigned to each tag of the document so that it is easy for a machine to process the information. For example, a postal code may be easily extracted from a document by simply locating the content surrounded by special tags that could be <postal-code> and </postal-code>, technically referred to as the <postal-code> element.
Three common terms are used to describe the parts of an XML document: tags, elements, and attributes:
    a tag is a string of characters enclosed between the left angle bracket ‘<’ and the right angle bracket ‘>’. There are starting tags and ending tags, an ending tag corresponding to the starting tag wherein a ‘/’ is inserted between the left angle bracket and the text.
    an element corresponds to the starting tag, the ending tag and everything in between.
    an attribute is a name with an associated value included inside the starting tag of an element.
For example, considering the following XML document,
<address>
  <name>
    <title>Mrs.</title>
    <first-name>Mary</first-name>
    <last-name>McGoon</last-name>
  </name>
  <street>1401 Main Street</street>
  <city state="NC">AnyTown</city>
  <postal-code>34829</postal-code>
</address>
tags <name> and </name> represent a starting tag and an ending tag, respectively; the <name> element contains three child elements, <title>, <first-name> and <last-name>; and state is an attribute of the <city> element.
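As an illustration only, the element and attribute relationships described above may be sketched in Python using the standard-library xml.etree.ElementTree module. The function name describe_address and the constant ADDRESS_XML are hypothetical names chosen for this sketch and do not appear in the original document.

```python
import xml.etree.ElementTree as ET

# The example address document, written with straight quotes around the
# attribute value so that it is well-formed XML.
ADDRESS_XML = (
    "<address><name><title>Mrs.</title><first-name>Mary</first-name>"
    "<last-name>McGoon</last-name></name><street>1401 Main Street</street>"
    '<city state="NC">AnyTown</city><postal-code>34829</postal-code></address>'
)


def describe_address(xml_text):
    """Return the child tags of <name>, the state attribute, and the postal code."""
    root = ET.fromstring(xml_text)          # root is the <address> element
    name = root.find("name")                # the <name> child element
    city = root.find("city")                # the <city> child element
    return {
        # Iterating over an element yields its direct child elements.
        "name_children": [child.tag for child in name],
        # Attributes are read by name from the element that carries them.
        "state": city.get("state"),
        # The content enclosed by <postal-code> and </postal-code>.
        "postal_code": root.findtext("postal-code"),
    }


if __name__ == "__main__":
    print(describe_address(ADDRESS_XML))
```

Because each tag carries a meaning, the postal code is located by element name rather than by its position in the text.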
Since XML is designed to describe data, it simplifies data interchange and enables smart code wherein important information may be easily identified.
U.S. Pat. No. 6,480,865 describes a method for annotating XML documents with dynamic functionality. The dynamic functionality comprises invocations of Java objects. These annotations belong to a different namespace, and thus a Dynamic XML-Java (DXMLJ) processor recognizes elements within the XML document that are tagged with DXMLJ prefix tags, processes each of these tags, and transforms the XML document accordingly.
For handling XML documents, software applications generally use an XML parser that may be considered as an interface between the document and the software. In such a case, the XML parser extracts data from the XML documents to build its internal tree representation so as to provide the software applications with the required data. A parser used with XML documents is generally either a Document Object Model (DOM) parser or a Simple API for XML (SAX) parser.
A DOM parser creates a DOM tree in memory for an XML document. It is usually used to manipulate the document and to traverse it back and forth. However, since the main drawback of a DOM parser is its memory consumption, it is generally reserved for small documents.
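By way of a minimal sketch, and assuming Python's standard-library xml.dom.minidom module as the DOM implementation, the in-memory tree may be modified and traversed in either direction as follows. The function name rename_city and the sample input are hypothetical.

```python
from xml.dom.minidom import parseString


def rename_city(xml_text, new_city):
    """Load the whole document into a DOM tree, edit it, and serialize it."""
    dom = parseString(xml_text)                 # entire tree is held in memory
    city = dom.getElementsByTagName("city")[0]  # first <city> element
    city.firstChild.data = new_city             # modify the text node in place
    # Traverse backward: the sibling preceding <city> is <street> here,
    # because the sample input contains no whitespace between elements.
    street = city.previousSibling
    return street.firstChild.data, dom.documentElement.toxml()


if __name__ == "__main__":
    sample = (
        "<address><street>1401 Main Street</street>"
        '<city state="NC">AnyTown</city></address>'
    )
    print(rename_city(sample, "Springfield"))
```

The convenience of random access and in-place modification comes at the cost of keeping every node of the document in memory at once.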
A SAX parser provides an event-driven interface and invokes callback methods when a tag is encountered. It is mainly used when no structural modification is planned, and it can handle huge documents.
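The callback style may be sketched as follows, assuming Python's standard-library xml.sax module. The class PostalCodeHandler and the function extract_postal_codes are hypothetical names introduced for this illustration.

```python
import xml.sax


class PostalCodeHandler(xml.sax.ContentHandler):
    """Collects the text of every <postal-code> element as it streams past."""

    def __init__(self):
        super().__init__()
        self.codes = []
        self._inside = False
        self._buffer = []

    def startElement(self, name, attrs):
        # Called by the parser each time a starting tag is encountered.
        if name == "postal-code":
            self._inside = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is accumulated.
        if self._inside:
            self._buffer.append(content)

    def endElement(self, name):
        # Called by the parser each time an ending tag is encountered.
        if name == "postal-code":
            self.codes.append("".join(self._buffer))
            self._inside = False


def extract_postal_codes(xml_text):
    handler = PostalCodeHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.codes


if __name__ == "__main__":
    print(extract_postal_codes(
        "<list><postal-code>34829</postal-code>"
        "<postal-code>10001</postal-code></list>"
    ))
```

No tree is built in this sketch: only the handler's own state is retained, which is why this approach is suited to very large documents.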
These parsers may be validating or non-validating parsers. A validating parser checks the XML file against the rules imposed by the Document Type Definition (DTD), while a non-validating parser does not validate the XML file against a DTD. Both validating and non-validating parsers check for the well-formedness of the XML document. A DTD specifies constraints on the valid tag sequences that can be in the document.
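The well-formedness check common to both kinds of parser may be illustrated with the following sketch, which assumes Python's standard-library xml.etree.ElementTree module. That module is non-validating, so the sketch does not check a document against a DTD; the function name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as well-formed XML, False otherwise.

    This is a non-validating check: tag nesting and syntax are verified,
    but no DTD constraints are applied.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<city state='NC'>AnyTown</city>"))   # properly closed
    print(is_well_formed("<city><name>AnyTown</city>"))        # <name> never closed
```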
Therefore, since the use of the XML document type is increasing dramatically over the Internet, as are the occurrences of large data objects in the contents to be transmitted, there is a need for optimizing the parsing of such documents so as to improve the processing time of software applications handling this electronic document format.