The Extensible Markup Language (XML) is a standard for encoding textual information that has been recommended by the World Wide Web Consortium (W3C). The Standard Generalized Markup Language (SGML) is an international standard (ISO 8879) meta-language for describing document structure; it predates XML and is an ancestor of XML. XML is a simplification of SGML that is easier to use. For a discussion of the XML and SGML standards, see, for example, Extensible Markup Language (XML) 1.0, W3C Recommendation, http://www.w3.org/TR/1998/REC-xml-19980210, and http://www.w3.org/markup/SGML/overview.html, respectively, each incorporated by reference herein.
The illustrative XML standard allows XML-enabled applications to interoperate with other compliant systems for the exchange of encoded information. XML documents store textual data in a hierarchical tree structure. Each XML document has exactly one root node, often referred to as the root element, with the other nodes in the hierarchical tree arranged as descendants of the root node. Each XML document contains two types of elements, namely, data elements and the corresponding tag elements that impose the hierarchical structure on the data elements.
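By way of illustration only, the hierarchical arrangement described above can be sketched in Python using the standard xml.etree.ElementTree module. The sample document, the tag names and the helper function outline are hypothetical and are introduced here solely for the example. The sketch parses a small document, identifies the single root element, and lists each tag with its depth in the tree and the textual data, if any, that the tag encloses.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; tag names and values are assumptions
# made for this illustration only.
SAMPLE = (
    "<order>"
    "<customer>Ada</customer>"
    "<item><name>pen</name><qty>2</qty></item>"
    "</order>"
)


def outline(node, depth=0):
    """Return (depth, tag, text) triples for node and its descendants,
    in document order. The tag supplies the structure; the text is the data."""
    text = (node.text or "").strip()
    rows = [(depth, node.tag, text)]
    for child in node:
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(SAMPLE)  # the single root element of the document
rows = outline(root)

for depth, tag, text in rows:
    # Indent each tag by its depth to make the tree shape visible.
    print("  " * depth + tag + (": " + text if text else ""))
```

In this sketch the tags (order, customer, item, name, qty) play the role of the tag elements, while the enclosed strings ("Ada", "pen", "2") play the role of the data elements.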
Because XML documents contain only textual information, they can be quite large. In order to reduce the size of XML documents for transmission and storage, standard compression algorithms suitable for textual information have been applied to entire XML documents. While applying such standard compression techniques to entire XML documents effectively reduces their overall size, this approach suffers from a number of limitations, which, if overcome, could greatly expand the efficiency and usefulness of the compressed XML documents. Specifically, the compressed XML documents generated by such conventional XML compression techniques must be fully decompressed before they can be processed. A need therefore exists for a method and apparatus that compresses XML documents in a manner that allows the documents to be processed in compressed form.
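The limitation of whole-document compression noted above can be illustrated with a minimal Python sketch. The sketch assumes zlib (DEFLATE) as a stand-in for a standard text compression algorithm, since no particular algorithm is named above; the sample document and the helper function can_parse are likewise hypothetical. It shows that the compressed bytes are smaller than the original but cannot be parsed as XML until the entire document has been decompressed.

```python
import zlib
import xml.etree.ElementTree as ET

# Hypothetical, repetitive sample document built for this illustration.
SAMPLE = (
    "<log>"
    + "".join(
        f"<entry><id>{i}</id><status>ok</status></entry>" for i in range(200)
    )
    + "</log>"
).encode("utf-8")

# zlib is an assumed stand-in for a "standard compression algorithm".
compressed = zlib.compress(SAMPLE)


def can_parse(data):
    """Report whether data can be parsed directly as an XML document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# The whole document must be restored before any element can be examined.
restored = zlib.decompress(compressed)
entry_count = len(ET.fromstring(restored).findall("entry"))

print(len(SAMPLE), len(compressed), can_parse(compressed), entry_count)
```

Even a query that touches a single element, such as counting the entry elements here, requires the full decompression step first, which is the overhead that motivates processing the document in compressed form.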