A known technology generates, when compressing a plurality of files, index information indicating which of the files includes predetermined character information (for example, see Patent Literature 1). The index information serves as an index indicating whether or not each of the plurality of files includes the character information to be retrieved. Here, character information means, for example, a character string formed by concatenating one-gram character codes.
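As an illustration only, an index of this kind can be sketched as a bitmap per character, where bit i records whether the i-th file includes that character. The function names `build_char_bitmap_index` and `candidate_files`, the use of single characters as the indexed unit, and the AND-based narrowing step are assumptions made for this sketch and are not taken from Patent Literature 1.

```python
from typing import Dict, List


def build_char_bitmap_index(files: List[str]) -> Dict[str, int]:
    """Map each character to an integer bitmap; bit i is set if files[i] contains it."""
    index: Dict[str, int] = {}
    for file_no, text in enumerate(files):
        for ch in set(text):
            index[ch] = index.get(ch, 0) | (1 << file_no)
    return index


def candidate_files(index: Dict[str, int], query: str, n_files: int) -> List[int]:
    """Return the numbers of files that contain every character of the query.

    This is only a filter (an assumption of this sketch): a candidate file
    contains all the characters, not necessarily the query string itself.
    """
    bits = (1 << n_files) - 1  # start with every file as a candidate
    for ch in query:
        bits &= index.get(ch, 0)  # a character absent from the index clears all bits
    return [i for i in range(n_files) if (bits >> i) & 1]


# Hypothetical sample data
files = ["index of words", "compressed file", "word index"]
char_index = build_char_bitmap_index(files)
hits = candidate_files(char_index, "word", len(files))
```

In this sketch the index answers only whether a file may contain the searched characters; locating a word inside a file would still require scanning the candidate files.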
On the other hand, a known technology generates pointer table-type index information associated with words (for example, see Non-Patent Literature 1). This technology will be explained with reference to FIG. 1. FIG. 1 is a diagram illustrating a reference example of a pointer table-type index generating process. As illustrated in FIG. 1, this technology extracts words from each document file, generates index information that associates the corresponding document ID with word IDs and the appearance positions of the words, collects the pieces of index information, and sorts the collected pieces on the basis of the word IDs. The index information is thereby converted into an inverted index (also called a transposition index) that associates, for each word ID, the document IDs and the appearance positions with each other.
Patent Literature 1: WO 2013/038527
Patent Literature 2: Japanese Laid-open Patent Publication No. 10-261969
Patent Literature 3: Japanese Laid-open Patent Publication No. 08-030633
Patent Literature 4: Japanese Laid-open Patent Publication No. 10-240754
Non-Patent Literature 1: NISHIDA KEISUKE: "Google wo Sasaeru Gijutsu", Apr. 25, 2008, KABUSHIKI KAISHA GIJUTSU HYOURONSHA
Non-Patent Literature 2: SEKIGUCHI KOJI: "Apache Lucene Nyumon", Jun. 25, 2006, KABUSHIKI KAISHA GIJUTSU HYOURONSHA
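The pointer table-type index generating process described with reference to FIG. 1 (extract, generate, collect, sort, transpose) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the process of Non-Patent Literature 1 itself: the name `build_inverted_index`, the regular-expression tokenizer, and the assignment of word IDs in order of first appearance are all hypothetical choices made for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[int, int, int]  # (word_id, doc_id, appearance position)


def build_inverted_index(
    docs: Dict[int, str],
) -> Tuple[Dict[str, int], Dict[int, List[Tuple[int, int]]]]:
    """Build a word-ID-keyed index of (doc_id, position) pairs."""
    word_ids: Dict[str, int] = {}
    records: List[Record] = []

    # Extract words from each document and generate per-document records.
    # Assumption: words are runs of word characters, and a word ID is
    # assigned the first time a word is seen.
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            word_id = word_ids.setdefault(word, len(word_ids))
            records.append((word_id, doc_id, pos))

    # Collect all records and sort them on the basis of the word IDs.
    records.sort()

    # Transpose: group document IDs and positions under each word ID.
    inverted: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for word_id, doc_id, pos in records:
        inverted[word_id].append((doc_id, pos))
    return word_ids, dict(inverted)


# Hypothetical sample documents
docs = {1: "the index of words", 2: "words in the index"}
word_ids, inverted = build_inverted_index(docs)
index_postings = inverted[word_ids["index"]]
```

Because word IDs in this sketch depend on which words appear in the documents, changing any document can change both the word-ID table and the sorted records.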
However, there is a problem in that index information indicating which of the plurality of files includes a predetermined word cannot be easily updated when any of the files is updated.
For example, the index information generated by the former technology indicates which of the plurality of files includes predetermined character information; it is index information about character information and is basically not index information about words. In addition, because the basic part of the index is compressed whereas the update part added when a file is updated is not compressed, the regions need to be maintained as the index grows in size. Therefore, this technology cannot easily update, when any file is updated, the index information indicating which of the plurality of files includes a predetermined word.
On the other hand, in the conventional technology that generates pointer table-type index information associated with words, the words included in one document file differ from the words included in another document file. As a result, when any document file is updated, the updated file may include a new word or an unknown word, in which case the generating process, the collection process, the sort process, and the transposition process of the index must be performed again. Therefore, this conventional technology cannot easily update the pointer table-type index based on the word IDs of the words included in the plurality of document files.