Bitmap-type full-text search techniques that rapidly generate a full-text search index, such as a character component table, are conventionally known (see, for example, Japanese Laid-Open Patent Publication Nos. H1-181329, H3-174652, and H5-174064). Because the conventional bitmap-type full-text search techniques do not perform morphological analysis, the index can be generated rapidly and the bitmaps can be compressed. A typical Japanese-language dictionary includes about 240,000 entries written with about 6,000 to 8,000 distinct characters and therefore has about 6,000 to 8,000 bitmaps for single characters.
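As a minimal sketch of a single-character bitmap index of the kind described above, the following Python code keeps one bitmap per character and narrows down records by an AND operation. The function names `build_char_bitmaps` and `candidates`, the sample records, and the use of Python integers as bitmaps are assumptions made for illustration only; they are not taken from the cited publications.

```python
from collections import defaultdict

def build_char_bitmaps(records):
    """Map each character to an integer bitmap; bit i is set if record i contains it."""
    bitmaps = defaultdict(int)
    for i, text in enumerate(records):
        for ch in set(text):
            bitmaps[ch] |= 1 << i
    return bitmaps

def candidates(bitmaps, query, n_records):
    """AND the bitmaps of all characters in the query to narrow down records."""
    result = (1 << n_records) - 1          # start with every record as a candidate
    for ch in set(query):
        result &= bitmaps.get(ch, 0)       # unknown character: no candidates remain
    return [i for i in range(n_records) if (result >> i) & 1]

# Hypothetical sample records
records = ["that is a pen", "this is an apple", "pineapple pie"]
bitmaps = build_char_bitmaps(records)
print(candidates(bitmaps, "that", len(records)))
```

The result is only a candidate list: a record that contains every character of the query somewhere, in any order, survives the AND operation and must still be checked against the actual text.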
However, the conventional techniques described above have a problem in that bitmaps of single characters lead to lower efficiency in the narrowing down of object items because hiragana, katakana, and alphabetic characters included in items (records) appear at higher frequencies.
If bitmaps of two-character strings are added, the volume of data increases and considerable memory is consumed. Reducing the volume by hashing generates search noise. As a result, narrowing down by means of the bitmaps becomes less efficient and search speed drops.
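The search noise caused by hashing can be illustrated with a small sketch in which two-character strings are folded into a fixed number of bitmap slots. The slot count `N_SLOTS`, the toy hash in `slot`, and the sample records are hypothetical and chosen small so that collisions are easy to observe; an actual system would use a different hash and far more slots.

```python
N_SLOTS = 16  # deliberately small, hypothetical slot count

def slot(bigram):
    """Toy deterministic hash of a two-character string into a slot number."""
    return (ord(bigram[0]) * 31 + ord(bigram[1])) % N_SLOTS

def bigrams(text):
    """Set of all two-character strings occurring in the text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}

def build_hashed_bitmaps(records):
    """One bitmap per slot; bit i is set if record i has a bigram hashing there."""
    bitmaps = [0] * N_SLOTS
    for i, text in enumerate(records):
        for bg in bigrams(text):
            bitmaps[slot(bg)] |= 1 << i
    return bitmaps

def candidates(bitmaps, query, n_records):
    """AND the slot bitmaps of every bigram in the query."""
    result = (1 << n_records) - 1
    for bg in bigrams(query):
        result &= bitmaps[slot(bg)]
    return [i for i in range(n_records) if (result >> i) & 1]

records = ["that is a pen", "this is an apple", "pineapple pie"]
bitmaps = build_hashed_bitmaps(records)
query = "pen"
cand = candidates(bitmaps, query, len(records))
true_hits = [i for i, t in enumerate(records) if query in t]
noise = [i for i in cand if i not in true_hits]   # candidates that are false hits
print(cand, true_hits, noise)
```

Hashing never loses a true hit, because every bigram of the query sets its slot in each record that contains it. Unrelated bigrams that share a slot, however, let records without the query pass the AND operation, and with these toy records one such record remains among the candidates.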
On the other hand, character strings that form words from alphabetic, hiragana, katakana, or similar characters generate search noise due to the connection of characters. For example, the sentence “that is a pen”, which includes the English substantive verb “is” and the indefinite article “a”, cannot be searched. In particular, since the alphabetic character “a” appears at a very high frequency, a search for the indefinite article “a” returns almost all of the example sentences and words as candidates.
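The effect of a high-frequency character can be sketched as follows. The sample records are hypothetical, and splitting on spaces is used here only as a stand-in for recognizing the word “a”; it is an assumption of this illustration and not part of the conventional techniques.

```python
# Hypothetical sample records
records = ["that is a pen", "this is an apple", "pineapple pie", "a dog barks"]

# Single-character bitmap for "a": bit i is set if record i contains the letter.
bitmap_a = 0
for i, text in enumerate(records):
    if "a" in text:
        bitmap_a |= 1 << i

char_candidates = [i for i in range(len(records)) if (bitmap_a >> i) & 1]

# Records that actually contain the word "a" (the indefinite article).
word_hits = [i for i, text in enumerate(records) if "a" in text.split()]

print(len(char_candidates), "candidates,", len(word_hits), "true hits")
```

In this sketch every record is a candidate for the character “a”, while only some of them contain the indefinite article, so the bitmap does almost nothing to narrow down the search.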
In this regard, it is conceivable that a bitmap is generated for each basic word used as an entry word of a dictionary such as  and . However, this causes a problem in that no hit is retrieved by searching for character strings other than basic words. For example, when a search is performed for a character string  even if files including character strings , , and  are present, the corresponding files cannot be identified since the search is not performed for  or . As described above, so-called parting occurs between the end character  of the basic word  and the starting character  of the basic word .
Although candidate files can be identified by searching for basic words ending with  and basic words starting with  to cover all the mutual combinations of bitmaps, and the bitmaps can be read to perform an AND operation, not much can be expected in terms of the narrowing down of candidate files because of search noise generated by the hashing of the bitmaps and the time required for a series of processes such as searching among keywords.