Techniques are known in which data is searched for the longest matching character string, and the data is compressed according to appearance frequencies in the data. For example, ZIP is a compression technique of performing LZ77-type compression at a first stage and Huffman compression at a second stage.
In the LZ77-type compression performed at the first stage, a sliding window is applied to the data to be compressed, and the longest matching character string search is performed in the sliding window. Then, according to the result of the longest matching character string search, an identification bit indicating whether a compression code represents a character or a character string is added to the compression code, and compression coding is performed. In the following description, data obtained by the longest matching character string search will be expressed as longest matching data.
For example, if the longest matching character string search in the sliding window yields longest matching data of less than 3 bytes, a code in which an identification bit “0” is associated with a 1-byte character code is output as a compression code. If, in contrast, the longest matching data is 3 bytes or more, a code in which an identification bit “1” is associated with the position of the longest matching data in the sliding window and the length of the longest matching data is output as a compression code.
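The first-stage rule described above can be illustrated with a minimal Python sketch. It is not the implementation used by ZIP or by the cited publications. The names `lz77_tokenize` and `lz77_detokenize`, the window size of 4096 bytes, the maximum match length of 258 bytes, and the choice to express the position as a backward distance from the current byte are all assumptions made for illustration. Each token is kept as a tuple whose first element plays the role of the identification bit.

```python
from typing import List, Tuple, Union

Literal = Tuple[int, int]        # (0, byte value): identification bit "0"
Match = Tuple[int, int, int]     # (1, distance back, length): identification bit "1"
Token = Union[Literal, Match]

MIN_MATCH = 3  # matches shorter than 3 bytes are emitted as single characters


def lz77_tokenize(data: bytes, window_size: int = 4096,
                  max_length: int = 258) -> List[Token]:
    """Brute-force longest-match search in a sliding window (illustrative only)."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(data):
        best_len, best_dist = 0, 0
        # The window covers at most window_size bytes before the current position.
        for cand in range(max(0, pos - window_size), pos):
            length = 0
            while (length < max_length
                   and pos + length < len(data)
                   and data[cand + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, pos - cand
        if best_len >= MIN_MATCH:
            tokens.append((1, best_dist, best_len))   # position and length
            pos += best_len
        else:
            tokens.append((0, data[pos]))             # 1-byte character code
            pos += 1
    return tokens


def lz77_detokenize(tokens: List[Token]) -> bytes:
    """Rebuild the original bytes from the token list."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == 0:
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # byte-wise copy allows overlapping matches
    return bytes(out)
```

For the input `b"abcabcabcx"`, this sketch emits three character tokens for `a`, `b`, and `c`, one string token covering the following six bytes, and a final character token for `x`.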
In the Huffman compression performed at the second stage, a Huffman tree according to appearance frequencies of characters is generated, and a Huffman code is assigned to each 1-byte character code associated with the identification bit “0”. For each code associated with the identification bit “1”, the length of the longest matching character string is assigned as a compression code. These related-art examples are described, for example, in Japanese National Publication of International Patent Application No. 2004-514366 and Japanese Laid-open Patent Publication No. 08-288861.
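The assignment of Huffman codes according to appearance frequencies can be sketched as follows. This is a generic textbook construction under stated assumptions, not the exact code tables used by ZIP. The function name `huffman_codes` is hypothetical, and codes are returned as strings of "0" and "1" for readability rather than as packed bits.

```python
import heapq
from collections import Counter
from typing import Dict, Iterable


def huffman_codes(symbols: Iterable[int]) -> Dict[int, str]:
    """Build a prefix-free code in which frequent symbols get shorter codes."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a code of at least one bit.
        return {next(iter(freq)): "0"}

    # Each heap entry is (count, tie-breaker, partial code table).
    # The unique tie-breaker keeps the dictionaries from ever being compared.
    heap = [(count, idx, {sym: ""})
            for idx, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)

    while len(heap) > 1:
        # Merge the two least frequent subtrees, as in the Huffman tree.
        c1, _, codes1 = heapq.heappop(heap)
        c2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1

    return heap[0][2]
```

For the character codes of `b"aaaabbc"`, the most frequent character `a` receives a 1-bit code, while `b` and `c` each receive a 2-bit code, so the seven characters occupy 10 bits before any identification bits are counted.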
However, in the conventional technique described above, each compression code includes the identification bit, which lengthens every code by one bit and results in the problem that the compression rate decreases.