1. Field of the Invention
The present invention relates to a technique for encoding and decoding a compressed data file, and more specifically, to a technique capable of identifying the type of compression employed with the data file after compression.
2. Description of the Related Art
The extensible markup language (XML) is a markup language for describing (marking up) the semantic structure of a document with simple marks. XML allows a user to make original extensions by defining a grammar and assigning logical meanings to the constituents of the document. Therefore, XML is expected to serve as a data format for data exchange on the Internet.
XML involves a concept called a document type definition (DTD), and an XML document can be judged to be valid or invalid with respect to a certain DTD. Specifically, for example, a grammatical rule is defined such that nodes <TITLE>, <AUTHOR> and <PUBLISHER> each appear once, in this order, after a node <BOOK>. Then, it is possible to judge whether a certain XML document is valid, in other words, whether the XML document conforms to the grammatical rule.
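The validity judgment described above can be illustrated with a minimal sketch. It assumes that the nodes appearing "after" <BOOK> are its child elements, which in DTD syntax would correspond to a declaration such as <!ELEMENT BOOK (TITLE, AUTHOR, PUBLISHER)>. The function name book_is_valid and the sample documents are hypothetical, and the check is hand-written for this one rule rather than a general DTD validator.

```python
import xml.etree.ElementTree as ET

# Assumed rule from the example: BOOK contains TITLE, AUTHOR and PUBLISHER,
# each exactly once, in this order.
EXPECTED_CHILDREN = ["TITLE", "AUTHOR", "PUBLISHER"]


def book_is_valid(xml_text: str) -> bool:
    """Return True when the document satisfies the assumed BOOK rule."""
    root = ET.fromstring(xml_text)
    if root.tag != "BOOK":
        return False
    # Compare the actual sequence of child tags with the required sequence.
    return [child.tag for child in root] == EXPECTED_CHILDREN


valid_doc = "<BOOK><TITLE>t</TITLE><AUTHOR>a</AUTHOR><PUBLISHER>p</PUBLISHER></BOOK>"
# Same nodes, but AUTHOR and TITLE are swapped, so the order rule is violated.
invalid_doc = "<BOOK><AUTHOR>a</AUTHOR><TITLE>t</TITLE><PUBLISHER>p</PUBLISHER></BOOK>"

print(book_is_valid(valid_doc), book_is_valid(invalid_doc))
```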
Incidentally, an XML document expresses a data structure universally by using certain marks (hereinafter referred to as “tags”) as described above. Accordingly, an XML document tends to have a larger file size than files in other formats that have dedicated data structures.
In this regard, it is possible to reduce a file size of an XML document by compressing the XML document with a universal data compression technology. Since an XML document is basically a text-based flat data file, a high compression effect can be anticipated.
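As a rough illustration of why a text-based document with repeated tags compresses well, the following sketch compresses a synthetic XML document with zlib from the Python standard library. The document content and the variable names are hypothetical, and zlib is only one example of a universal compression method, not one named in this description.

```python
import zlib

# Hypothetical, highly repetitive XML document built only for illustration.
record = (
    "<BOOK><TITLE>Sample Title</TITLE><AUTHOR>Sample Author</AUTHOR>"
    "<PUBLISHER>Sample Publisher</PUBLISHER></BOOK>"
)
xml_document = ("<LIBRARY>" + record * 200 + "</LIBRARY>").encode("utf-8")

# Compress with a general-purpose algorithm that knows nothing about XML.
compressed = zlib.compress(xml_document)
restored = zlib.decompress(compressed)

print(f"original: {len(xml_document)} bytes, compressed: {len(compressed)} bytes")
```

The actual ratio depends on the document; real documents with less repetition will generally compress less than this synthetic one.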
FIGS. 9A and 9B are views for describing a conventional procedure for processing an XML document with an XML parser. Here, the XML parser refers to software which converts the XML document into a format usable by an application program, and to a computer which executes the foregoing conversion processing.
FIG. 9A shows the procedure in a case where the original XML document is input to the XML parser. As shown in the drawing, the XML parser 910 includes a decoder 911 and a parser 912. When the XML document is input, the decoder 911 of the XML parser 910 first converts the character code used in the input XML document into another character code used by an application (such as UTF-8 or UTF-16 in the case of a Java application, for example). Thereafter, the parser 912 analyzes the XML document, converts the XML document into a data format used by the application, such as a document object model (DOM) tree, and then outputs the converted XML document.
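The two stages of FIG. 9A can be sketched as follows, assuming a UTF-16 input document and using xml.dom.minidom from the Python standard library to build the DOM tree. The function names decode_step, parse_step and xml_parser are hypothetical stand-ins for the decoder 911, the parser 912 and the XML parser 910; they are not taken from any actual parser implementation.

```python
from xml.dom import minidom


def decode_step(raw: bytes, source_encoding: str) -> str:
    """Stand-in for decoder 911: convert the document's character code."""
    return raw.decode(source_encoding)


def parse_step(text: str) -> minidom.Document:
    """Stand-in for parser 912: analyze the text and build a DOM tree."""
    return minidom.parseString(text)


def xml_parser(raw: bytes, source_encoding: str) -> minidom.Document:
    """Stand-in for XML parser 910: decoder followed by parser."""
    return parse_step(decode_step(raw, source_encoding))


# Hypothetical input document stored as UTF-16 bytes.
raw_document = "<BOOK><TITLE>XML Basics</TITLE></BOOK>".encode("utf-16")
dom = xml_parser(raw_document, "utf-16")
title = dom.getElementsByTagName("TITLE")[0].firstChild.data
print(title)
```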
Meanwhile, FIG. 9B shows a procedure in which a compressed XML document is decompressed (expanded) and then input to the XML parser. In this case, the operation of the XML parser 910 is the same as in the case of FIG. 9A. However, before the XML document is input to the XML parser 910, the XML document is decompressed by use of a decompressing tool 920 in accordance with the type of compression used to compress the XML document (compression type).
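A minimal sketch of the FIG. 9B flow is given below. It assumes that the caller already knows the compression type and passes it explicitly, since the compressed file itself does not indicate that it contains XML. The table DECOMPRESSORS and the function decompress_then_parse are hypothetical names, and gzip, bz2 and zlib are merely examples of compression types.

```python
import bz2
import gzip
import zlib
import xml.etree.ElementTree as ET

# Stand-in for decompressing tool 920: one routine per compression type.
DECOMPRESSORS = {
    "gzip": gzip.decompress,
    "bz2": bz2.decompress,
    "zlib": zlib.decompress,
}


def decompress_then_parse(data: bytes, compression_type: str) -> ET.Element:
    # Step 1 (pre-process): decompress according to the known compression type.
    xml_bytes = DECOMPRESSORS[compression_type](data)
    # Step 2: hand the restored XML document to the XML parser.
    return ET.fromstring(xml_bytes)


xml_document = b"<BOOK><TITLE>XML Basics</TITLE></BOOK>"
root = decompress_then_parse(bz2.compress(xml_document), "bz2")
print(root.tag, root.find("TITLE").text)
```

The caller must supply the correct compression type; passing a type that does not match the data causes the selected routine to fail.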
As described above, an XML document tends to have a larger data size than files in other formats that have dedicated data structures. Accordingly, data compression of the XML document is preferred.
In general, when data exchange takes place or when data are stored in a database, a data file is compressed in order to improve transmission efficiency or to reduce the size of the data file to be stored. For this reason, numerous data compression technologies universally applicable to various data formats have been disclosed to date. Accordingly, it is conceivable that any of those conventional data compression technologies may also be applied to the compression of an XML document.
However, when the above-described conventional universal data compression technology is used, the compression process is executed regardless of the data format of the XML document. Accordingly, it is impossible to identify whether or not a compressed data file is an XML document in a compressed state.
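The identification problem can be illustrated with gzip as an example of a universal compression format. In the sketch below, an XML document and a non-XML text are compressed in the same way; the leading bytes of both results are the gzip signature, which reveals the compression type but nothing about whether the content is XML. The payloads and variable names are hypothetical.

```python
import gzip

# Two hypothetical payloads: one XML document and one non-XML text file.
xml_payload = b"<BOOK><TITLE>XML Basics</TITLE></BOOK>"
other_payload = b"title,author\nXML Basics,Unknown\n"

compressed_xml = gzip.compress(xml_payload)
compressed_other = gzip.compress(other_payload)

# The gzip format begins with the two signature bytes 0x1f 0x8b.
GZIP_MAGIC = b"\x1f\x8b"
xml_header = compressed_xml[:2]
other_header = compressed_other[:2]

# Both headers are identical, so they cannot tell XML from non-XML content.
print(xml_header == GZIP_MAGIC, other_header == GZIP_MAGIC)
```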
Moreover, when the compressed XML document is used by an application, a two-step procedure is required as described above, namely, a pre-process of decompressing the XML document in accordance with its compression type and a process of inputting the decompressed XML document to the XML parser. Therefore, processing becomes complicated.
In addition, since the decompressing tool (a program) is located ahead of the XML parser, it is not easy to introduce the conventional compression technology into an existing system which analyzes the XML document with the XML parser and utilizes the XML document in a certain application.