Structural information may be defined by a document written in a markup language, such as Extensible Markup Language (XML). XML specifies both the structure and the content of a document, at the cost of significant redundancy. The simplicity of XML and the ease of representing data in it have led to its increasing use across domains ranging from databases to web applications. However, XML documents tend to be large compared with other forms of data representation, which degrades performance, for example by increasing storage and transmission requirements. Therefore, it is desirable to develop an efficient XML document encoding technique that minimizes data storage requirements and improves processing performance.
One approach taken by existing techniques involves restructuring the document and then performing generic text compression on the restructured document. This approach generally yields better compression than applying text compression directly to the original XML document. Other techniques use the XML Schema Definition (XSD) to create a ‘map’ of the XML document, which allows element and attribute names to be replaced with short codes that significantly reduce the size of the document. Generic text compression may then be applied to the mapped document to reduce its size further.
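The schema-map approach described above can be illustrated with a minimal Python sketch. It assumes a hypothetical, hard-coded name map (SCHEMA_MAP) standing in for one derived from an XSD, maps element names only (not attributes), and uses zlib's DEFLATE as the generic text compressor. The helper names map_tags, unmap_tags and compressed_size are illustrative only and do not describe any particular existing system.

```python
import xml.etree.ElementTree as ET
import zlib

# HYPOTHETICAL element-name map. In schema-based techniques it would be
# derived from the XSD; here it is hard-coded for illustration.
SCHEMA_MAP = {"catalog": "a", "book": "b", "title": "c", "author": "d", "price": "e"}


def map_tags(xml_text: str, name_map: dict[str, str]) -> str:
    """Replace each element name found in name_map with its short code.

    Names missing from the map are left unchanged; attributes are not mapped.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():  # iter() visits the root and all descendants
        elem.tag = name_map.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")


def unmap_tags(xml_text: str, name_map: dict[str, str]) -> str:
    """Restore original names by inverting the same shared map.

    Assumes no unmapped element name coincides with one of the codes.
    """
    inverse = {code: name for name, code in name_map.items()}
    return map_tags(xml_text, inverse)


def compressed_size(text: str) -> int:
    """Size in bytes after generic DEFLATE compression (zlib, level 9)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


if __name__ == "__main__":
    records = "".join(
        f"<book><title>T{i}</title><author>A{i}</author><price>{i}</price></book>"
        for i in range(50)
    )
    doc = f"<catalog>{records}</catalog>"
    mapped = map_tags(doc, SCHEMA_MAP)
    print("raw:", len(doc), "compressed:", compressed_size(doc))
    print("mapped:", len(mapped), "mapped+compressed:", compressed_size(mapped))
```

Because the encoder and decoder share the map, the original names can be recovered by inverting it. Note that in this sketch the document structure is exploited only as a pre-processing step ahead of the generic compressor.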
A disadvantage of the above techniques is that they do not effectively exploit the inherently well-defined structure of the XML document to achieve better compression. Whilst some of these techniques do make use of structural details, they do so only as a pre-processing step before a subsequent, usually text-based, compression stage. Therefore, a need exists for an algorithm that overcomes these problems by exploiting the XML structure to improve the compression ratio.