An example of hierarchical text data is an XML document. Physically, an XML document is composed of storage units called entities, which contain either parsed or unparsed data. An entity may refer to other entities to cause their inclusion in the document, and every document begins in a “root” or document entity. Parsed data is made up of characters, some of which form character data and some of which form markup. Markup encodes a description of the document's storage layout and logical structure, and XML provides a mechanism to impose constraints on both. Logically, the document is composed of declarations, elements, comments, character references, and processing instructions, all of which are indicated in the document by explicit markup. Attributes attach additional information to elements, often information that is not part of the data itself.
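To make the distinction between elements, attributes, and character data concrete, the following minimal Python sketch parses a small, hypothetical XML document with the standard-library `xml.etree.ElementTree` module and walks its element hierarchy depth-first. The document contents and the `walk` helper are illustrative assumptions. Note that `ElementTree` discards comments and processing instructions by default and does not expose entity boundaries, so the sketch shows only the logical element structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: nested elements form the logical hierarchy,
# attributes annotate elements, and element text is character data.
DOC = """<?xml version="1.0"?>
<library>
  <!-- a comment is markup, not character data; ElementTree drops it by default -->
  <book id="b1" lang="en">
    <title>XML Basics</title>
    <year>2000</year>
  </book>
  <book id="b2" lang="de">
    <title>Data Compression</title>
    <year>2002</year>
  </book>
</library>"""


def walk(elem, depth=0):
    """Yield (depth, tag, attributes, text) for every element, depth-first."""
    yield depth, elem.tag, dict(elem.attrib), (elem.text or "").strip()
    for child in elem:
        yield from walk(child, depth + 1)


nodes = list(walk(ET.fromstring(DOC)))
for depth, tag, attrs, text in nodes:
    # Indent by depth so the printed listing mirrors the element tree.
    print("  " * depth + tag, attrs or "", text)
```

Each `book` element carries its `id` and `lang` attributes as metadata, while the text inside `title` and `year` is the character data; whitespace-only text between elements is stripped to an empty string.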
Work has previously been done on the compression of XML documents; see, for example, H. Liefke and D. Suciu, XMill: An efficient compressor for XML data, Proc. 2000 ACM SIGMOD Conference, pages 153-164, 2000. This work describes a tool, called XMill, for compressing XML data. XMill incorporates various existing and user-defined compressors, enabling data-type-specific compression of, e.g., numbers, strings, and enumerated types.
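The idea of routing values to data-type-specific compressors can be sketched as follows. This is not XMill's implementation; it is a minimal Python illustration under stated assumptions: leaf-element text is grouped into containers keyed by element path, a hypothetical `classify` heuristic labels each container as integer, enumerated, or free text, and each kind is encoded differently (fixed-width binary integers, a symbol table plus one-byte codes, or NUL-separated strings) before being compressed with `zlib`. The sample document, helper names, and heuristics are all hypothetical.

```python
import struct
import xml.etree.ElementTree as ET
import zlib
from collections import defaultdict

# Hypothetical sample document; not taken from the XMill paper.
DOC = """<orders>
  <order><id>1001</id><status>shipped</status><note>fragile, handle with care</note></order>
  <order><id>1002</id><status>pending</status><note>leave at front desk</note></order>
  <order><id>1003</id><status>shipped</status><note>fragile</note></order>
</orders>"""


def collect(elem, path="", out=None):
    """Group the text of leaf elements into containers keyed by element path."""
    if out is None:
        out = defaultdict(list)
    path = f"{path}/{elem.tag}"
    children = list(elem)
    if not children and elem.text is not None:
        out[path].append(elem.text)
    for child in children:
        collect(child, path, out)
    return out


def is_int(v):
    """True if v is the canonical text of a 64-bit signed integer."""
    try:
        n = int(v)
    except ValueError:
        return False
    return str(n) == v and -2**63 <= n < 2**63


def classify(values):
    """Assumed heuristic: integers, then repeated values (enum), else free text."""
    if all(is_int(v) for v in values):
        return "int"
    distinct = len(set(values))
    if distinct < len(values) and distinct <= 256:
        return "enum"
    return "str"


def compress_container(values):
    """Encode a container with a type-specific scheme, then zlib-compress it."""
    kind = classify(values)
    if kind == "int":
        payload = struct.pack(f"<{len(values)}q", *(int(v) for v in values))
    elif kind == "enum":
        symbols = sorted(set(values))
        index = {s: i for i, s in enumerate(symbols)}
        table = "\0".join(symbols).encode("utf-8")
        # Length-prefixed symbol table followed by one byte per value.
        payload = struct.pack("<I", len(table)) + table + bytes(index[v] for v in values)
    else:
        # XML 1.0 text cannot contain NUL, so it is a safe separator.
        payload = "\0".join(values).encode("utf-8")
    return kind, zlib.compress(payload)


def decompress_container(kind, blob, count):
    """Invert compress_container; count is the number of stored values."""
    payload = zlib.decompress(blob)
    if kind == "int":
        return [str(n) for n in struct.unpack(f"<{count}q", payload)]
    if kind == "enum":
        (n,) = struct.unpack_from("<I", payload)
        symbols = payload[4:4 + n].decode("utf-8").split("\0")
        return [symbols[c] for c in payload[4 + n:]]
    return payload.decode("utf-8").split("\0")


containers = collect(ET.fromstring(DOC))
compressed = {path: compress_container(vals) for path, vals in containers.items()}
for path, (kind, blob) in compressed.items():
    print(f"{path:22} {kind:5} {len(blob)} bytes")
```

Mixed content (text tails between child elements), empty elements, and the tag tree itself are ignored here for brevity; a complete compressor would also have to encode the document structure so that the original XML can be reconstructed.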
One known compression approach has also considered enabling queries on a subset of compressed documents: P. Tolani and J. R. Haritsa, XGRIND: A query-friendly XML compressor, Proc. 18th IEEE International Conference on Data Engineering, pages 225-234, 2002. However, while this approach enables querying of the compressed document, it has no notion of encryption (and thus no decryption either).
A recent work also addresses the problems of compression, navigation, and searching of XML documents: P. Ferragina, F. Luccio, G. Manzini and S. Muthukrishnan, Compressing and Searching XML Data Via Two Zips, Proc. of the World Wide Web Conference, 2006. Again, however, there is no notion of encryption (and thus no decryption either).