Conventional techniques are known for encoding a compression target file using a static dictionary. The static dictionary specifies the appearance frequencies of words and characters appearing in documents, determined based on common English-language dictionaries, Japanese-language dictionaries, textbooks, and the like, and assigns shorter compression codes to words having higher appearance frequencies. In the conventional techniques, compression processing is performed by converting the text in the compression target file into the compression codes assigned to the words and characters in the static dictionary, thereby forming a compressed file.
Such conventional techniques are disclosed in, for example, Japanese Laid-open Patent Publication No. 08-288861, Japanese National Publication of International Patent Application No. 2004-514366, and Japanese Laid-open Patent Publication No. 06-222903.
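The static-dictionary encoding described above can be illustrated with a minimal sketch. It assumes Huffman coding as the way of assigning shorter codes to more frequent entries, which is only one possible scheme and is not stated in the cited publications. The frequency table `WORD_FREQS` and the function names `build_codes` and `encode` are hypothetical and chosen for illustration only.

```python
import heapq
import re
from itertools import count

# Hypothetical appearance frequencies; a real static dictionary would be
# derived from general dictionaries, textbooks, and the like.
WORD_FREQS = {" ": 100, "the": 50, "of": 30, "dictionary": 5, "compression": 3}


def build_codes(freqs):
    """Assign prefix-free bit strings so that more frequent entries
    never receive longer codes than less frequent ones (Huffman coding)."""
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode(text, codes):
    """Replace each registered word (and each space) with its code.
    This sketch assumes every token is registered in the dictionary."""
    tokens = re.findall(r"[a-z]+| ", text)
    return "".join(codes[t] for t in tokens)


codes = build_codes(WORD_FREQS)
bits = encode("the compression of the dictionary", codes)
```

In this sketch, the frequent entry "the" receives a shorter code than the rare entry "compression", so text made up mostly of frequent registered words compresses well.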
With the above-mentioned conventional techniques, however, words that are not registered in the static dictionary are represented by longer compression codes in the compressed file, which lowers the compression rate.
For example, words such as personal names, place names, and names of works are not registered in the static dictionary used in the conventional techniques. The compression processing therefore separates each such word into its constituent characters and converts each character into the compression code assigned to that character. Because the word is then represented by a sequence of character codes rather than by a single word code, the compression rate of the compression target file is lowered.
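The effect of this character-by-character fallback can be seen in a short sketch. The code tables `WORD_CODES` and `CHAR_CODES` below are hypothetical and are not taken from any of the cited publications. Word codes are assumed to begin with "0" and character codes with "1", so the combined set remains prefix-free.

```python
import string

# Hypothetical static dictionary: short codes for registered words
# and for the space character.
WORD_CODES = {"the": "00", "river": "0100", "through": "0101", " ": "011"}

# Hypothetical character codes: a "1" marker followed by a 5-bit index,
# giving 6 bits per lowercase letter.
CHAR_CODES = {
    ch: "1" + format(i, "05b") for i, ch in enumerate(string.ascii_lowercase)
}


def encode_token(token):
    """Use the word code if the token is registered; otherwise fall back
    to encoding the token one character at a time."""
    if token in WORD_CODES:
        return WORD_CODES[token]
    return "".join(CHAR_CODES[ch] for ch in token)


def encode(text):
    """Encode space-separated lowercase text, emitting the space code
    between consecutive tokens."""
    return WORD_CODES[" "].join(encode_token(t) for t in text.split(" "))


registered_bits = len(encode_token("river"))    # registered common word
unregistered_bits = len(encode_token("kyoto"))  # place name, not registered
```

Under these assumed tables, the registered word "river" costs 4 bits, whereas the unregistered place name "kyoto" costs 30 bits (five characters at 6 bits each), although both words have the same number of characters.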