1. Field of the Invention
The present invention relates to a document transformation system for transforming a structured document.
2. Related Background Art
Structured documents, including XML (Extensible Markup Language) documents, must describe both structure information and character string information, and therefore generally have a large data size (high data volume). For this reason, attention has recently been focused on technologies for compressing structured documents, for the purposes of making effective use of the disk space that stores them and of network resources.
In particular, a technology that separates a structured document into structure information and character string information and compresses each of them focuses on the structure information alone, so that fewer distinct patterns occur than when the character string information is also included. Such a technology is therefore believed to achieve a higher compression rate than technologies such as LZ77, which simply compress the structured document as a whole (for details of LZ77, refer to “Jacob Ziv, Abraham Lempel: A Universal Algorithm for Sequential Data Compression. IEEE Transactions on Information Theory 23(3): 337-343 (1977)”).
Japanese Patent Application Laid-Open No. 2003-44459 (hereinafter referred to as Document 1) discloses a method of separating internal representation data of structured data into structure information and character string information by using preliminarily given structure specification information, and of compressing the two independently of each other. This compression method focuses on the structure information and thereby improves the compression rate. In addition, a structured document is subjected to a validation check and to parsing of its structure information as pre-processing. Document 1 describes that, because the structure information and the character string information are compressed independently, the validation check and parsing can be carried out by decompressing (restoring) only the structure information, without decompressing (restoring) the character string information.
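The general idea of separating structure information from character string information and compressing the two independently can be illustrated with a minimal sketch. This is not the method of Document 1: the function names (`separate`, `compress_separately`, `restore_structure`), the token format, and the use of Python's `zlib` module (DEFLATE, which combines LZ77-style matching with Huffman coding) are all assumptions made for illustration, and attributes are ignored for brevity.

```python
import zlib
import xml.etree.ElementTree as ET


def separate(xml_text):
    """Split an XML document into a structure stream and a text stream.

    Hypothetical token format: "<tag" opens an element, ">" closes it,
    and "#" marks a slot whose text lives in the separate string stream.
    """
    structure = []  # element open/close tokens and text-slot markers
    strings = []    # character data, in document order

    def walk(elem):
        structure.append("<" + elem.tag)
        if elem.text and elem.text.strip():
            structure.append("#")
            strings.append(elem.text.strip())
        for child in elem:
            walk(child)
            # Text that follows a child element inside the same parent.
            if child.tail and child.tail.strip():
                structure.append("#")
                strings.append(child.tail.strip())
        structure.append(">")

    walk(ET.fromstring(xml_text))
    return " ".join(structure), "\x00".join(strings)


def compress_separately(xml_text):
    """Compress the two streams independently of each other."""
    structure, strings = separate(xml_text)
    return (zlib.compress(structure.encode("utf-8")),
            zlib.compress(strings.encode("utf-8")))


def restore_structure(packed_structure):
    """Restore only the structure stream; the text stream stays compressed."""
    return zlib.decompress(packed_structure).decode("utf-8").split(" ")
```

In this sketch, a validation check or structural parse would call `restore_structure` on the first compressed stream only and would never need to decompress the second stream that holds the character strings.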
The document “Hartmut Liefke and Dan Suciu: XMill: An Efficient Compressor for XML Data, in Proceedings of the ACM SIGMOD International Conference on Management of Data, 2000” (hereinafter referred to as Document 2) discloses a method of compressing an XML document by reusing partial structures appearing in the XML document. In this compression method, a structured document is separated into three information items, namely structure information, node type information, and text information, and each of them is compressed by an ordinary compression algorithm such as LZ77. According to Document 2, since the compression is carried out with focus on the structure information, as in the technology described in Document 1, the compression rate can be raised. As described above, when a structured document is separated into structure information and character string information that are compressed independently, as in these conventional technologies, the compression rate is raised and the validation check and parsing can be carried out without decompressing the character string information.