There is a technique that tallies the appearance frequencies of characters in data, then performs longest-match string searching, and compresses the data according to the results. For example, ZIP is a compression technique that tallies character appearance frequencies and generates a Huffman tree in a first stage, and performs LZ77 compression and Huffman compression in a second stage.
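As an illustration of the first stage, the following minimal sketch tallies byte frequencies and derives Huffman code lengths from them. It is not taken from any ZIP implementation; the function name `huffman_code_lengths` and the sample input are assumptions made for this example only.

```python
import heapq
from collections import Counter

def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a frequency table.

    Hypothetical helper for illustration; not part of any real library.
    """
    if len(freqs) == 1:
        # A single symbol still needs one bit to be representable.
        return {sym: 1 for sym in freqs}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol in them sinks one level.
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in a.items()}
        merged.update({s: d + 1 for s, d in b.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

data = b"abracadabra"                 # sample input (assumption)
freqs = Counter(data)                 # tally of appearance frequencies
lengths = huffman_code_lengths(freqs)
total_bits = sum(freqs[s] * lengths[s] for s in freqs)
```

In this sketch the most frequent byte receives the shortest code, which is the property the frequency tally is meant to provide.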
In LZ77-type compression, a longest-match search is performed over a sliding window, an identification bit is attached to the compression code according to the search result, and either a character or the position and length of a character string is Huffman coded. In the following description, the character string found by the longest-match string search is referred to as the longest matching data.
For example, when the longest-match string search on the sliding window yields longest matching data shorter than 3 bytes, a code that combines an identification bit “0” with the binary representation of a 1-byte character code is output as a variable-length compression code using the Huffman tree. On the other hand, when the longest matching data is 3 bytes or longer, a code that combines an identification bit “1” with the position and length of the longest matching data is output as a variable-length compression code in the same manner.
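The branching described above can be sketched as a naive tokenizer. This is a simplified illustration, not the actual encoder: the window size, the maximum match length, and the choice to express the position as a backward distance are assumptions, and the subsequent Huffman coding of the tokens is omitted. Only the 3-byte threshold and the identification bits "0" and "1" come from the text.

```python
MIN_MATCH = 3      # threshold stated in the text
WINDOW = 4096      # assumed sliding-window size
MAX_MATCH = 258    # assumed cap on match length

def longest_match(data, pos):
    """Brute-force search of the window for the longest match starting at pos."""
    best_dist, best_len = 0, 0
    for cand in range(max(0, pos - WINDOW), pos):
        length = 0
        while (pos + length < len(data) and length < MAX_MATCH
               and data[cand + length] == data[pos + length]):
            length += 1
        if length > best_len:
            best_dist, best_len = pos - cand, length
    return best_dist, best_len

def tokenize(data):
    """Emit (0, byte) for a literal or (1, distance, length) for a match."""
    tokens, pos = [], 0
    while pos < len(data):
        dist, length = longest_match(data, pos)
        if length >= MIN_MATCH:
            tokens.append((1, dist, length))   # identification bit "1"
            pos += length
        else:
            tokens.append((0, data[pos]))      # identification bit "0"
            pos += 1
    return tokens

def detokenize(tokens):
    """Rebuild the original bytes from the token list."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == 0:
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])         # byte-by-byte copy allows overlap
    return bytes(out)

sample = b"abcabcabcd"
tokens = tokenize(sample)
```

For the sample input, the first three bytes are emitted as literals, the repeated "abcabc" becomes a single match token, and the final byte is a literal again.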
Japanese Laid-open Patent Publication No. 05-241777 discloses a related technique, for example.
Conventionally, when character appearance frequencies are tallied using a sliding window and a Huffman tree is generated, the appearance frequency of the length of the longest-match string is not tallied. Accordingly, a compression code suited to the appearance frequency of each length is not assigned to the length of the longest-match string, which results in the problem that the compression rate decreases.
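To make the cost of an untallied length concrete, the following rough sketch counts match lengths in a token list and compares the entropy of that distribution with a fixed-width length field. The token list is hypothetical, and the 8-bit field is only an assumed baseline for comparison, not a description of any particular format.

```python
import math
from collections import Counter

# Hypothetical tokens: (0, byte) is a literal, (1, position, length) is a match.
tokens = [
    (0, 97), (1, 3, 4), (1, 7, 3), (1, 2, 3),
    (0, 98), (1, 5, 3), (1, 9, 4), (1, 4, 8),
]

# Tally the appearance frequency of each match length.
length_freq = Counter(tok[2] for tok in tokens if tok[0] == 1)
total = sum(length_freq.values())

# Average bits per length that a frequency-matched code could approach.
entropy_bits = -sum(
    (n / total) * math.log2(n / total) for n in length_freq.values()
)

FIXED_BITS = 8  # assumed fixed-width length field, for comparison only
```

With these sample counts, the entropy is well below the fixed width, which suggests how much a code that ignores the length frequencies can leave unused.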