Conventionally, in dictionary content such as general dictionaries and terminology dictionaries, many entries consist of a phrase made up of multiple segments, such as “International Conference on Environment” (segments: “international” / “environment” / “conference”). Moreover, in search processing, candidates are narrowed down by inputting multiple (for example, two or three) keyword character strings, which may be written in hiragana or katakana, i.e., the Japanese syllabic “kana” scripts, in addition to kanji (see, for example, Japanese Patent Laid-Open Publication No. 2005-158044).
The technique disclosed in the patent document above enables searches for a multi-segment entry, searches using a character string that includes kanji and kana, searches using variant characters, and searches for a loanword to be performed. Specifically, in addition to kanji, keywords in katakana/kana are recorded, and the keywords in kanji, kana, and the like are sorted according to character code. Furthermore, to speed up the search, a superiority index based on a head character is created to form a hierarchical index.
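The sorted keyword list and head-character index described above can be illustrated with a minimal sketch. It is not the implementation of the cited publication: the function names `build_index` and `lookup`, the sample entries, and the two-level layout (a flat sorted list plus a map from head character to first position) are all assumptions made for illustration.

```python
from bisect import bisect_left

# Hypothetical sample data: entry -> keywords (kanji segments and kana readings).
ENTRIES = {
    "東京大学": ["東京", "大学", "とうきょう", "だいがく"],
    "大学院": ["大学院", "だいがくいん"],
    "東京駅": ["東京", "駅", "とうきょう", "えき"],
}

def build_index(entries):
    """Return (keys, owners, head_index) for a keyword -> entry lookup."""
    # Python compares strings by code point, i.e. in character-code order.
    records = sorted((kw, entry) for entry, kws in entries.items() for kw in kws)
    keys = [kw for kw, _ in records]
    owners = [entry for _, entry in records]
    # Upper-level index: first position of each head character in the sorted list.
    head_index = {}
    for pos, kw in enumerate(keys):
        head_index.setdefault(kw[0], pos)
    return keys, owners, head_index

def lookup(index, prefix, limit=10):
    """Return up to `limit` entries having a keyword that starts with `prefix`."""
    keys, owners, head_index = index
    start = head_index.get(prefix[0])
    if start is None:
        return []
    # Jump to the head-character block, then binary-search within the sorted keys.
    pos = bisect_left(keys, prefix, lo=start)
    hits = []
    while pos < len(keys) and keys[pos].startswith(prefix) and len(hits) < limit:
        hits.append(owners[pos])
        pos += 1
    return hits

INDEX = build_index(ENTRIES)
```

Calling `lookup(INDEX, "東京")` returns the entries that have a keyword beginning with that string, capped at `limit` candidates.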
Moreover, with respect to katakana/kana keyword searches, processing is performed to handle voiced and semi-voiced consonants, contracted sounds and double consonants, and long vowels. For character string searches that include both kana and kanji, processing to delete the kana characters is performed, and for variant character searches, orthographic conversion processing is performed. Further, for loanword searches, conversion processing is performed so that a word with variant kana spellings, such as “violin”, is searched under a single kana form.
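One way to picture the kana handling is as a folding step applied to both the indexed keywords and the query. The sketch below is only an assumption about what such folding might look like; the cited publication does not specify its rules here, and the name `fold_kana` and the exact set of conversions are hypothetical. It does not cover the loanword conversion or the deletion of kana from mixed strings.

```python
import unicodedata

# Small (contracted-sound / double-consonant) kana mapped to full-size kana.
SMALL_TO_FULL = str.maketrans("ぁぃぅぇぉゃゅょっゎ", "あいうえおやゆよつわ")
# Katakana U+30A1..U+30F6 sit exactly 0x60 above their hiragana counterparts.
KATA_TO_HIRA = {cp: cp - 0x60 for cp in range(0x30A1, 0x30F7)}
# Combining voiced and semi-voiced sound marks produced by NFD decomposition.
SOUND_MARKS = "\u3099\u309A"

def fold_kana(text):
    """Fold kana so that spelling variants compare equal (illustrative only)."""
    # NFD splits e.g. が into か plus a combining voiced mark.
    decomposed = unicodedata.normalize("NFD", text)
    unvoiced = "".join(ch for ch in decomposed if ch not in SOUND_MARKS)
    hiragana = unvoiced.translate(KATA_TO_HIRA)
    # Enlarge small kana, then drop the long-vowel mark.
    return hiragana.translate(SMALL_TO_FULL).replace("ー", "")
```

With this folding, a katakana keyword and its hiragana spelling without voicing marks or small kana map to the same string, so a single sorted index can serve both.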
However, the technique disclosed in the above patent document (a sorted index for multi-segment entry searches) has a problem: when an entry constituted by multiple segments is searched for, the number of hits can be reported incorrectly because the list of candidates held for each keyword is limited in length.
For example, suppose that a multi-segment entry is searched for by inputting the search keywords for “horse”, “ear”, and “prayer”, and that the buffer for each search keyword holds a maximum of 10 retrieval candidates. A search performed in character-code order then retrieves 10 items for each of the three keywords.
However, if the target entry is the eleventh or a subsequent candidate for each of the respective keywords, it will not be included among the 30 items retrieved. Consequently, the search result is indicated as 0 hits, i.e., the number of hits that is displayed is incorrect.
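The 0-hit outcome can be reproduced with a small simulation. This is a sketch under stated assumptions: the per-keyword candidate lists are combined by intersection, the keyword names and filler entries are synthetic, and `count_hits` is a hypothetical helper rather than anything taken from the cited publication.

```python
def count_hits(postings, keywords, cap=None):
    """Count entries present in every keyword's candidate list.

    postings maps keyword -> entries in character-code order;
    cap is the per-keyword buffer size (None means unlimited).
    """
    buffers = [
        set(postings[kw] if cap is None else postings[kw][:cap])
        for kw in keywords
    ]
    return len(set.intersection(*buffers))

KEYWORDS = ["horse", "ear", "prayer"]
TARGET = "zz-target"  # synthetic entry that sorts after all the fillers

# Each keyword matches 12 unrelated filler entries plus the shared target,
# so the target is the 13th candidate in every list.
POSTINGS = {
    kw: sorted([f"{kw}-{i:02d}" for i in range(12)] + [TARGET])
    for kw in KEYWORDS
}

capped_hits = count_hits(POSTINGS, KEYWORDS, cap=10)  # buffers of 10 items each
true_hits = count_hits(POSTINGS, KEYWORDS)            # no buffer limit
```

Here `capped_hits` is 0 although `true_hits` is 1: the shared entry exists, but it never reaches any of the three 10-item buffers.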
In addition, as the number of entries of a dictionary increases (for example, Kojien, Fifth Edition, has 230,000 entries), the number of items that meet respective search keywords increases, and consequently, the problem of the number of hits being incorrectly indicated as 0 occurs often.
Moreover, the number of scans of the superiority index file, which is based on the head character of a multi-segment entry, increases with the number of search keywords. Therefore, the search time increases as the number of search keywords increases.
If the number of search keywords is decreased, the candidates are not sufficiently narrowed down, and the number of retrieval candidates increases. As a result, there is a problem in that a large memory capacity is required to store a longer list of retrieval candidates. On the other hand, if many search keywords are used, a large memory capacity to store the search keywords is required since a buffer is prepared for each of the search keywords.