When data is transmitted over a network or stored in a storage device, the data may be compressed with a compression code to reduce its size. For example, the ZIP file format may use a Huffman code as a compression code. A Huffman code assigns a code with a small number of bits to a symbol in the original data (for example, a bit string representing one character or one byte) that appears frequently, and a code with a large number of bits to a symbol that appears less frequently. To decompress the compressed data, dictionary data may be prepared that maps each code back to its original symbol (decoded symbol).
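The relationship between appearance frequency and code length can be illustrated with a minimal sketch. It treats each byte as one symbol and builds a code table and a decoding dictionary; the function names `build_huffman_codes`, `encode`, and `decode` are hypothetical, and the bit strings are kept as text for readability rather than packed as a real file format would.

```python
import heapq
from collections import Counter


def build_huffman_codes(data: bytes) -> dict[int, str]:
    """Return a symbol -> bit-string table; frequent symbols get shorter codes."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, unique tie-breaker, partial code table).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(data: bytes, codes: dict[int, str]) -> str:
    return "".join(codes[b] for b in data)


def decode(bits: str, codes: dict[int, str]) -> bytes:
    # Dictionary data: code -> original (decoded) symbol.
    dictionary = {code: sym for sym, code in codes.items()}
    out = bytearray()
    current = ""
    for bit in bits:
        current += bit
        if current in dictionary:  # prefix-free, so the first match is the symbol
            out.append(dictionary[current])
            current = ""
    return bytes(out)
```

For the input `b"aaaaaaaabbbc"`, the symbol `a` appears most often and receives a one-bit code, while `b` and `c` receive two-bit codes.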
A document retrieving method is used to retrieve, from a document data group, text data that includes a specific character string (which may consist of a single character). For example, an electronic book file that includes a keyword specified by a user is retrieved from among a plurality of electronic book files. In this case, if the full text of the document data group is scanned only after the target character string is specified, the retrieval time may be long. Therefore, the full text of the document data group may be scanned in advance to generate an index that indicates whether each character string is present. When a character string is specified, the retrieval time may be shortened by identifying the desired document data with reference to the index.
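As a rough sketch of this idea, the example below scans every document once in advance and records which documents contain each character. The per-character granularity and the names `build_index`, `candidates`, and `search` are assumptions made for illustration; the text above does not fix a particular index layout.

```python
def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Scan all documents once and record which documents contain each character."""
    index: dict[str, set[str]] = {}
    for name, text in documents.items():
        for ch in set(text):
            index.setdefault(ch, set()).add(name)
    return index


def candidates(index: dict[str, set[str]], keyword: str) -> set[str]:
    """Documents containing every character of the keyword (order is not checked)."""
    sets = [index.get(ch, set()) for ch in keyword]
    if not sets:
        return set()
    return set.intersection(*sets)


def search(documents: dict[str, str], index: dict[str, set[str]], keyword: str) -> list[str]:
    """Full-text check only on the candidates the index could not rule out."""
    return sorted(n for n in candidates(index, keyword) if keyword in documents[n])
```

The index only rules documents out. A document that contains all characters of the keyword in a different order remains a candidate, so the sketch still confirms each candidate with a full-text check.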
There is a proposed document retrieving method that extracts, from each document, a one-character component indicating a character included in the document and an adjacent character component indicating a pair of adjacent characters, generates a one-character component chart and an adjacent character component chart, and retrieves a document by using the two generated charts. There is also a proposed compression and decompression algorithm (dictionary algorithm) that, when decompressing compressed data, sequentially restores a dictionary by using the character strings decoded so far and decodes each subsequent code based on the restored dictionary. In addition, there is a proposed data compressing method that, when compressing a file, also generates a map group indicating whether each character is included in the file, which enables keyword retrieval without decompressing the compressed file. If no index is attached to the data, an information processing device may, after obtaining the compressed data, generate an index so that document retrieval can be performed on that data. For example, after obtaining compressed electronic book files, an information terminal device may generate an index to retrieve a file that includes a keyword specified by a user from among the plurality of electronic book files.
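The two-chart approach can be sketched as follows. This is an illustrative reading only, not the proposed method itself: each chart is assumed to be a bitmap with one bit per file, and the names `build_charts` and `narrow` are hypothetical.

```python
def build_charts(files: list[str]) -> tuple[dict[str, int], dict[tuple[str, str], int]]:
    """Build a one-character chart and an adjacent-character chart.

    Each chart maps a key to an integer bitmap; bit i is set when file i
    contains that character (or that pair of adjacent characters).
    """
    single: dict[str, int] = {}
    adjacent: dict[tuple[str, str], int] = {}
    for file_no, text in enumerate(files):
        bit = 1 << file_no
        for ch in text:
            single[ch] = single.get(ch, 0) | bit
        for pair in zip(text, text[1:]):
            adjacent[pair] = adjacent.get(pair, 0) | bit
    return single, adjacent


def narrow(file_count: int, single: dict[str, int],
           adjacent: dict[tuple[str, str], int], keyword: str) -> list[int]:
    """Return the numbers of the files that neither chart rules out."""
    result = (1 << file_count) - 1  # start with every file as a candidate
    for ch in keyword:
        result &= single.get(ch, 0)
    for pair in zip(keyword, keyword[1:]):
        result &= adjacent.get(pair, 0)
    return [i for i in range(file_count) if (result >> i) & 1]
```

With the files `"abc"`, `"acb"`, and `"cab"`, every file contains both `a` and `b`, so the one-character chart alone cannot separate them for the keyword `ab`. The adjacent-character chart removes `"acb"`, where the two characters are not next to each other.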
For example, in one method for generating an index, after the decompression of the compressed data is completed, the flag corresponding to each symbol included in the decompressed data is looked up in the index and updated. However, this method increases the number of accesses to a storage device such as a random access memory (RAM) during index generation.
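A minimal sketch of this two-step approach is shown below. It assumes one-byte symbols and one flag per possible byte value, and it uses Python's standard `zlib` module as a stand-in for the compression format; the function name `index_after_decompression` and the update counter are hypothetical and serve only as a rough proxy for memory accesses, not as a measurement.

```python
import zlib


def index_after_decompression(compressed: bytes) -> tuple[bytearray, int]:
    """Decompress everything first, then make a second pass to set index flags."""
    # Step 1: decompression completes and the whole output is written to memory.
    data = zlib.decompress(compressed)

    # Step 2: every decompressed symbol is read back, and its flag is
    # looked up in the index and updated.
    flags = bytearray(256)  # one flag per possible one-byte symbol
    flag_updates = 0
    for symbol in data:
        flags[symbol] = 1
        flag_updates += 1
    return flags, flag_updates
```

In this sketch the number of flag updates equals the number of decompressed symbols, regardless of how many distinct symbols exist. For `b"abracadabra"`, 11 updates are made to set only 5 distinct flags, in addition to the writes and reads of the decompressed data itself.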