Conventional encoding techniques for text data use both a static dictionary and a dynamic dictionary. The static dictionary associates each word with a static code whose code length depends on the occurrence frequency of the word. The dynamic dictionary associates a character string that occurs more than once in the text data with a dynamic code. In the conventional technique, a word that is found in the static dictionary is replaced with its static code. A character string that occurs more than once and is not found in the static dictionary is registered in the dynamic dictionary and assigned a dynamic code.
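The two-dictionary scheme described above can be illustrated with a minimal sketch. It is not taken from the cited publications: the function name `encode`, the sample static codes, the use of sequential integers as dynamic codes, and the treatment of strings that occur only once (emitted as literals) are all assumptions made for illustration.

```python
from collections import Counter

def encode(tokens, static_dict):
    """Replace tokens with static or dynamic codes (illustrative sketch only)."""
    # Count only tokens that miss the static dictionary; these are the
    # candidates for registration in the dynamic dictionary.
    counts = Counter(t for t in tokens if t not in static_dict)
    dynamic_dict = {}
    output = []
    for token in tokens:
        if token in static_dict:
            # Hit in the static dictionary: emit the static code.
            output.append(("static", static_dict[token]))
        elif counts[token] > 1:
            # Repeated string with no static hit: register it on first
            # sight and emit its dynamic code (assumed: a running index).
            if token not in dynamic_dict:
                dynamic_dict[token] = len(dynamic_dict)
            output.append(("dynamic", dynamic_dict[token]))
        else:
            # Assumption: a string that occurs only once is kept as-is.
            output.append(("literal", token))
    return output, dynamic_dict

# Hypothetical static dictionary: frequent words get shorter codes.
STATIC_DICT = {"the": "0", "of": "10", "and": "11"}
encoded, dynamic_dict = encode(
    ["the", "zebra", "and", "the", "zebra", "of", "quagga"], STATIC_DICT
)
```

In this sketch "zebra" occurs twice without a static hit, so it is the only entry registered in the dynamic dictionary, while "quagga" occurs once and is left as a literal.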
In the conventional encoding technique, the dictionary to be referenced when decoding encoded data differs between a static code and a dynamic code. Therefore, an identification bit that indicates whether the static dictionary or the dynamic dictionary was used for encoding is added at the head of each static code and each dynamic code (Japanese National Publication of International Patent Application No. 2004-514366 and Japanese Laid-open Patent Publication No. 08-288861).
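The role of the identification bit can be sketched as follows. The flag values, the 4-bit dynamic code width, the sample dictionaries, and all function names are hypothetical; the cited publications may lay out the bits differently.

```python
STATIC_FLAG = "0"       # assumed flag value for a static code
DYNAMIC_FLAG = "1"      # assumed flag value for a dynamic code
DYNAMIC_CODE_BITS = 4   # assumed fixed length of a dynamic code

def tag_static(code):
    """Prefix a static code (a bit string) with the identification bit."""
    return STATIC_FLAG + code

def tag_dynamic(index):
    """Prefix a fixed-length dynamic code with the identification bit."""
    return DYNAMIC_FLAG + format(index, f"0{DYNAMIC_CODE_BITS}b")

def decode(bits, static_codes, dynamic_entries):
    """Decode a bit string, choosing the dictionary from each identification bit."""
    words = []
    pos = 0
    while pos < len(bits):
        flag = bits[pos]
        pos += 1
        if flag == STATIC_FLAG:
            # Static codes are variable length and assumed prefix-free,
            # so extend the candidate until it matches an entry.
            end = pos + 1
            while bits[pos:end] not in static_codes:
                end += 1
                if end > len(bits):
                    raise ValueError("truncated static code")
            words.append(static_codes[bits[pos:end]])
            pos = end
        else:
            # Dynamic codes have a fixed length in this sketch.
            index = int(bits[pos:pos + DYNAMIC_CODE_BITS], 2)
            words.append(dynamic_entries[index])
            pos += DYNAMIC_CODE_BITS
    return words

# Hypothetical dictionaries for the example.
STATIC_CODES = {"0": "the", "10": "of", "11": "and"}
DYNAMIC_ENTRIES = ["zebra"]

stream = tag_static("0") + tag_dynamic(0) + tag_static("11")
decoded = decode(stream, STATIC_CODES, DYNAMIC_ENTRIES)
```

Every code in the stream carries one extra bit, regardless of which dictionary produced it.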
However, the conventional technique described above has a problem in that entropy encoding according to the occurrence frequency of character strings in the text data is not performed.
For example, the code length of a static code registered in the static dictionary is chosen according to the occurrence frequency of the corresponding word. Once the identification bit is added, however, the total code length no longer matches the length that the occurrence frequency calls for. Moreover, a dynamic code in the dynamic dictionary has a predetermined code length regardless of the occurrence frequency of the corresponding character string, so its code length does not reflect that occurrence frequency either.
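The mismatch can be made concrete with a small calculation. For a symbol with occurrence probability p, the ideal entropy-coded length is -log2(p) bits. The sketch below compares that ideal with the conventional code lengths; the counts, the 1-bit identification flag, and the 4-bit dynamic code width are hypothetical numbers chosen only for illustration.

```python
import math

def ideal_bits(count, total):
    """Information content, in bits, of a symbol seen `count` times out of `total`."""
    return -math.log2(count / total)

def excess_bits(actual_bits, count, total):
    """Bits spent beyond the ideal entropy-coded length."""
    return actual_bits - ideal_bits(count, total)

ID_BITS = 1            # assumed identification bit
DYNAMIC_CODE_BITS = 4  # assumed fixed dynamic code length
TOTAL = 16             # hypothetical number of symbols in the text

# A word seen 8 times out of 16 ideally needs 1 bit; its 1-bit static
# code plus the identification bit costs 2 bits.
static_excess = excess_bits(ID_BITS + 1, count=8, total=TOTAL)

# A string seen 4 times out of 16 ideally needs 2 bits; its fixed-length
# dynamic code plus the identification bit costs 5 bits.
dynamic_excess = excess_bits(ID_BITS + DYNAMIC_CODE_BITS, count=4, total=TOTAL)
```

Under these assumed numbers the static code spends one bit more than the ideal length and the dynamic code spends three bits more.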