Major compression technologies include LZ77-based compression, such as ZIP, and LZ78-based compression, such as LZW. LZ77-based compression uses a sliding window to search for the longest match for a character string to be compressed, and assigns the address and the character string length of the longest matching character string as the compression code.
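The longest match search described above can be illustrated with a minimal sketch. It is not the encoding used by ZIP or by any particular product: the function names, the token layout, the window size, and the minimum and maximum match lengths are all assumptions chosen for illustration, and the "address" is represented here as a backward distance from the current position.

```python
def longest_match(data: bytes, pos: int, window_size: int = 4096, max_len: int = 18):
    """Return (distance, length) of the longest match for data[pos:]
    found in the window of up to window_size bytes preceding pos."""
    best_dist, best_len = 0, 0
    start = max(0, pos - window_size)
    for cand in range(start, pos):
        length = 0
        # Extend the match byte by byte; it may overlap the current position.
        while (length < max_len
               and pos + length < len(data)
               and data[cand + length] == data[pos + length]):
            length += 1
        if length > best_len:
            best_dist, best_len = pos - cand, length
    return best_dist, best_len


def lz77_encode(data: bytes, window_size: int = 4096, min_len: int = 3):
    """Encode data as literal tokens and (distance, length) match tokens."""
    tokens = []
    pos = 0
    while pos < len(data):
        dist, length = longest_match(data, pos, window_size)
        if length >= min_len:
            tokens.append(("match", dist, length))
            pos += length
        else:
            tokens.append(("lit", data[pos]))
            pos += 1
    return tokens


def lz77_decode(tokens) -> bytes:
    """Rebuild the original bytes from the token list."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # copy one byte at a time so overlaps work
    return bytes(out)
```

For the input `b"abcabcabcabc"`, the first three bytes are emitted as literals and the remaining nine bytes are covered by a single match token that points three bytes back.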
As one example, an LZ78-based compression technology compresses words in original document data by using a static word dictionary, in which commonly used words and phrases are registered in advance in a hierarchical structure (trie structure), and an auxiliary dictionary, in which character strings in the original document data that are not registered in the static word dictionary are registered (for example, see Japanese Laid-open Patent Publication No. 2000-269822).
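The combination of a static word dictionary held in a trie and an auxiliary dictionary can be sketched as follows. This is only an illustration under stated assumptions and does not reproduce the method of the cited publication: the class and function names, the whitespace-based word splitting, and the code format (`"S"` for a static dictionary code, `"A"` for an auxiliary dictionary code) are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.code = None     # set when a registered word ends at this node


class StaticDictionary:
    """Static word dictionary whose words are registered in advance in a trie."""

    def __init__(self, words):
        self.root = TrieNode()
        for code, word in enumerate(words):
            node = self.root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
            node.code = code

    def lookup(self, word):
        """Return the code of word, or None if it is not registered."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.code


def encode(text, static_dict):
    """Encode text word by word; unregistered words go to an auxiliary dictionary."""
    auxiliary = {}   # word -> code, built while encoding
    codes = []
    for word in text.split():   # assumption: words are separated by whitespace
        code = static_dict.lookup(word)
        if code is not None:
            codes.append(("S", code))
        else:
            if word not in auxiliary:
                auxiliary[word] = len(auxiliary)
            codes.append(("A", auxiliary[word]))
    return codes, auxiliary
```

In this sketch the auxiliary dictionary must accompany the encoded data so that the decoder can restore the words that the static dictionary does not contain.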
However, the conventional compression technologies have a problem in that the compression ratio does not improve in a case where a plurality of files exists in the document data to be compressed.
For example, the LZ77-based compression technology executes compression on the basis of repeated byte patterns by using a sliding window. In a case where a plurality of files exists in the document data to be compressed, this sliding window is not shared between the files, because a byte pattern that appears repeatedly in one file does not necessarily appear in another file. The LZ77-based compression technology therefore compresses each of the files individually, and the compression ratio does not improve.
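The cost of per-file compression can be observed with a small experiment. The sketch below uses Python's `zlib` module (DEFLATE, an LZ77-based format) as a stand-in for the compression technology discussed here. The sample files and helper names are hypothetical, and compressing the concatenated files is used only to approximate what a window shared between files would achieve.

```python
import zlib

# Hypothetical sample: three files that share one sentence and differ in a short note.
SHARED = (b"The quarterly report summarizes revenue, expenses, "
          b"and forecasts for each regional office. ")
FILES = [SHARED + f"File-specific note {i}.".encode() for i in range(3)]


def per_file_size(files):
    """Total compressed size when every file is compressed on its own."""
    return sum(len(zlib.compress(f)) for f in files)


def shared_window_size(files):
    """Compressed size when the files are compressed as one stream,
    so that matches can refer back into earlier files."""
    return len(zlib.compress(b"".join(files)))
```

With these sample files, `shared_window_size(FILES)` is smaller than `per_file_size(FILES)`, because the sentence common to all three files is encoded in full only once when the window spans the files.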
As another example, in a case where a plurality of files exists in the document data to be compressed, the LZ78-based compression technology registers, for each of the files, words that are not registered in the static word dictionary in an auxiliary dictionary, and executes encoding by using the registered auxiliary dictionary and the trie structure. As the number of auxiliary dictionaries increases, the total dictionary size also increases. In particular, when words that appear repeatedly across the files are registered in the auxiliary dictionary of each file, the same words are stored more than once and the dictionary size grows further. Therefore, even in the case of the LZ78-based compression, the compression ratio does not improve in a case where a plurality of files exists in the document data to be compressed.
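The growth in dictionary size can be made concrete by counting auxiliary dictionary entries. The following sketch is illustrative only: the helper names, the sample words, and the use of a plain set in place of the trie-structured static word dictionary are assumptions.

```python
def build_auxiliary(words, static_words):
    """Register every word that is missing from the static dictionary."""
    auxiliary = {}
    for word in words:
        if word not in static_words and word not in auxiliary:
            auxiliary[word] = len(auxiliary)
    return auxiliary


def count_entries(files, static_words):
    """Return (entries with one auxiliary dictionary per file,
    entries with a single auxiliary dictionary for all files)."""
    per_file = sum(len(build_auxiliary(f, static_words)) for f in files)
    all_words = [word for f in files for word in f]
    shared = len(build_auxiliary(all_words, static_words))
    return per_file, shared


# Hypothetical sample data.
STATIC_WORDS = {"the", "is", "of"}
FILES = [
    "the codec of the archive".split(),
    "the archive is the codec".split(),
    "the codec is lossless".split(),
]
```

Here `count_entries(FILES, STATIC_WORDS)` returns `(6, 3)`: the per-file auxiliary dictionaries hold six entries in total because "codec" and "archive" are registered repeatedly, whereas only three distinct words are missing from the static dictionary.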