Among compression techniques, LZ77-based schemes such as ZIP, which search for the longest matching character string by using a sliding window, are the mainstream. When an information processing apparatus compresses multiple files by using ZIP and combines the compressed files, each file is compressed individually by using parameters associated with that file. Consequently, when searching the file obtained by compressing and combining the multiple files for a character string, the information processing apparatus releases the combination, decompresses the individual files, and then checks the decompressed data against the search character string. An index that is used to speed up a character string search is created, in units of files, in a step separate from the compression step. As such an index, for example, a pointer-type inverted index is known, in which an address for each word included in text data is indexed for each file.
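As an illustration only, a pointer-type inverted index created in units of files may be sketched as follows. The function names `build_pointer_index` and `build_per_file_indexes`, the word-splitting rule, and the use of character offsets as addresses are assumptions made for this sketch and are not taken from any particular related-art implementation.

```python
import re
from collections import defaultdict


def build_pointer_index(text: str) -> dict[str, list[int]]:
    """Map each word to the character offsets (addresses) at which it starts.

    Assumption: a "word" is a run of word characters, compared case-insensitively.
    """
    index: dict[str, list[int]] = defaultdict(list)
    for match in re.finditer(r"\w+", text):
        index[match.group().lower()].append(match.start())
    return dict(index)


def build_per_file_indexes(files: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """Create one independent index per file; no index spans multiple files."""
    return {name: build_pointer_index(text) for name, text in files.items()}


# Hypothetical sample data, for illustration only.
files = {
    "a.txt": "the cat sat on the mat",
    "b.txt": "a cat and a dog",
}
indexes = build_per_file_indexes(files)
print(indexes["a.txt"]["the"])  # [0, 15]
print(indexes["b.txt"]["cat"])  # [2]
```

Because each index covers only one file and holds addresses into the uncompressed text, a search across the combined file still has to consult every per-file index and decompress the corresponding file before an address can be used.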
There is a known technology that divides a compression target file (data to be compressed) into multiple blocks and performs a compression process on each block to create compressed data. In this technology, the dictionary needed to encode the data in a block targeted for compression, from among the multiple divided blocks, is created based on the data to be compressed that is stored in that block (for example, see Japanese Laid-open Patent Publication No. 2011-114546).
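The general idea of block-wise compression may be sketched as follows. This is not the method of the cited publication; it is a minimal stand-in that uses Python's standard `zlib` module, in which each block is compressed as an independent stream, so that the match history used to encode a block is derived only from the data in that block. The function names and the block size of 4096 bytes are assumptions made for this sketch.

```python
import zlib


def compress_in_blocks(data: bytes, block_size: int = 4096) -> list[bytes]:
    """Divide the data into fixed-size blocks and compress each one independently.

    Each call to zlib.compress starts with an empty history, so a block is
    encoded using only matches found within that same block.
    """
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return [zlib.compress(block) for block in blocks]


def decompress_block(compressed_blocks: list[bytes], block_no: int) -> bytes:
    """Decompress a single block without touching the other blocks."""
    return zlib.decompress(compressed_blocks[block_no])


# Hypothetical sample data: 12,000 bytes, which yields three blocks.
data = b"abracadabra " * 1000
compressed = compress_in_blocks(data, block_size=4096)
restored = b"".join(decompress_block(compressed, i) for i in range(len(compressed)))
print(len(compressed), restored == data)
```

Compressing each block independently allows any one block to be decompressed on its own, at the cost of not reusing matches found in earlier blocks.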
However, there is a problem in that a high-speed search cannot be performed when searching the file obtained by compressing and combining the multiple files for a character string. Namely, with the related technology, when such a file is searched for a character string, the combination is released, all of the individual files are decompressed from the top, and the decompressed character strings are then checked against the search character string; therefore, a high-speed search cannot be performed. Even if the index is used, the index is created in units of individual files, so the decompressed character string is still checked against the search character string in units of files; therefore, a high-speed search cannot be performed.
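The search procedure described above may be sketched as follows, using Python's standard `zipfile` module. The function names `build_archive` and `search_archive` and the sample file contents are assumptions made for this sketch. The point to note is that every member of the archive is fully decompressed before any comparison with the search character string takes place.

```python
import io
import zipfile


def build_archive(files: dict[str, str]) -> bytes:
    """Compress each file individually and combine the results into one ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buffer.getvalue()


def search_archive(archive_bytes: bytes, needle: str) -> list[tuple[str, int]]:
    """Return (file name, offset) pairs for every occurrence of the search string.

    Every member is decompressed in full, from the top, before it is checked.
    """
    hits: list[tuple[str, int]] = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            text = archive.read(name).decode("utf-8")  # full decompression of this member
            pos = text.find(needle)
            while pos != -1:
                hits.append((name, pos))
                pos = text.find(needle, pos + 1)
    return hits


# Hypothetical sample data, for illustration only.
archive_bytes = build_archive({
    "a.txt": "the cat sat on the mat",
    "b.txt": "a cat and a dog",
})
hits = search_archive(archive_bytes, "cat")
print(hits)  # [('a.txt', 4), ('b.txt', 2)]
```

The cost of this procedure grows with the total uncompressed size of all members, regardless of how few of them contain the search character string.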