In compression algorithms that use variable-length compressed codes, such as Huffman coding and arithmetic coding, compressed data is generated by using the compressed codes that correspond to the character codes in the data to be compressed. There is also a compression technique in which a compression algorithm such as Huffman coding assigns a compressed code to each word, that is, to each combination of character codes, and generates compressed data from those codes (see Patent Document 1, for example).
Patent Document 1: Japanese Laid-open Patent Publication No. 2010-93414
In certain languages (English and German, for example), the space symbols in the character strings that constitute a document mark the breaks between words, which are the units constituting the document. In the above-described compression algorithms, a single compressed code is allocated to each word made up of a plurality of characters, but a compressed code is also assigned to each space symbol. Because every space symbol receives its own compressed code just as every word does, the number of compressed codes used for compression increases, which lowers the compression ratio.
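The effect described above can be illustrated with a minimal sketch. It is not the method of Patent Document 1; it assumes a simplified tokenizer in which every space symbol and every run of non-space characters is one token, and it builds an ordinary Huffman code over those tokens. The names `tokenize`, `huffman_code_lengths`, and `summarize` are hypothetical helpers introduced only for this illustration.

```python
import heapq
import re
from collections import Counter
from itertools import count


def tokenize(text):
    """Split text into single-space tokens and word tokens.

    Assumption for illustration: a "word" is any run of non-space
    characters, so punctuation stays attached to the adjacent word.
    """
    return re.findall(r" |[^ ]+", text)


def huffman_code_lengths(freqs):
    """Return {token: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        # A lone symbol still needs one bit to be representable.
        return {sym: 1 for sym in freqs}
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(f, next(tie), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def summarize(text):
    """Count how many compressed codes and bits are spent on space symbols."""
    tokens = tokenize(text)
    freqs = Counter(tokens)
    lengths = huffman_code_lengths(freqs)
    return {
        "codes_emitted": len(tokens),
        "space_codes": freqs.get(" ", 0),
        "total_bits": sum(lengths[t] for t in tokens),
        "space_bits": lengths.get(" ", 0) * freqs.get(" ", 0),
    }


if __name__ == "__main__":
    print(summarize("the cat saw the dog"))
```

For the five-word sample sentence, four of the nine compressed codes emitted are codes for space symbols. Even though the space symbol receives a short code because it is so frequent, nearly half of the emitted codes carry no word content, which is the overhead the passage above refers to.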