When carrying out a full text search, a conventional search device first generates, from a large amount of document data, a large volume of indexes. Through these indexes, each name that is a search target can be looked up from the partial character strings that make up the name, so that a partial match search over the indexes can be carried out at high speed. A word or a character N-gram is used as the smallest search unit. When words are used as search units, linguistically appropriate search results can be expected, but a search omission occurs whenever the prior language analysis makes an error. Further, such a conventional search device cannot search in units shorter than a word. On the other hand, when character N-grams are used as search units, no search omission occurs, but candidate names whose delimiters are linguistically inappropriate may appear unexpectedly. For example, the word-delimited string "(toukyou)/(to)" (the slash denotes a delimiter) cannot be brought into correspondence with the word "(kyouto)". In contrast, the characters "(kyou)/(to)" can be brought into correspondence with the character-delimited string "(tou)/(kyou)/(to)" through a partial match search. Therefore, although "(kyouto)" is not included in the results of a word-by-word search for "(toukyouto)", it is included in the results of a character-by-character search for "(toukyouto)", and this result has no linguistic validity.
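The contrast between word units and character units in the example above can be reproduced with a minimal sketch. It makes several assumptions that do not come from the source. It uses a toy index of two hypothetical names. Each Japanese character is represented by its romanized reading, as in the text. A name counts as a partial match when its unit sequence occurs contiguously inside the search term's unit sequence. The helpers `contains_run` and `search` are illustrative only.

```python
from typing import Sequence


def contains_run(haystack: Sequence[str], needle: Sequence[str]) -> bool:
    """True if `needle` occurs as a contiguous run of units inside `haystack`."""
    n = len(needle)
    return n > 0 and any(
        list(haystack[i:i + n]) == list(needle) for i in range(len(haystack) - n + 1)
    )


# Hypothetical indexed names, stored both as word units and as character units.
# Romanized readings stand in for the Japanese characters (an assumption).
NAMES = {
    "kyouto": {"words": ["kyouto"], "chars": ["kyou", "to"]},
    "toukyouto": {"words": ["toukyou", "to"], "chars": ["tou", "kyou", "to"]},
}

# The search term "toukyouto", segmented the same two ways.
QUERY = {"words": ["toukyou", "to"], "chars": ["tou", "kyou", "to"]}


def search(unit: str) -> list[str]:
    """Assumed partial match rule: a name matches if its units appear contiguously in the query."""
    return sorted(
        name for name, units in NAMES.items() if contains_run(QUERY[unit], units[unit])
    )


print(search("words"))  # ['toukyouto']
print(search("chars"))  # ['kyouto', 'toukyouto']
```

The word-unit search omits "kyouto". The character-unit search returns it, which is the linguistically invalid hit described above.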
To solve this problem, such a conventional search device is constructed so that it can search both word by word and character by character. The problem, however, is that this raises both the index generation time and the search time. The index generation time becomes the sum of the times needed to generate the indexes for word-by-word and for character-by-character searches. Likewise, the search time becomes the sum of the word-by-word and the character-by-character search times. In contrast, the information search device disclosed in patent reference 1 adds word information to each character-unit index and combines word-by-word and character-by-character searching, thereby suppressing the increase in processing time.
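Patent reference 1 is described here only at a high level, so the following is a hypothetical sketch of the general idea, not its actual method. It assumes the "word information" is a pair of word-boundary flags stored with each character posting. A single character-unit index then answers both kinds of query: a word-unit search additionally requires the matched run to begin at a word start and end at a word end. The names `Posting`, `build_index`, and `search_with_word_info` are illustrative.

```python
from collections import defaultdict
from typing import NamedTuple


class Posting(NamedTuple):
    name: str         # indexed name containing this character
    pos: int          # character offset within the name
    word_start: bool  # assumed word information: a word begins at this character
    word_end: bool    # assumed word information: a word ends at this character


def build_index(names: dict[str, list[list[str]]]) -> dict[str, list[Posting]]:
    """Build one character-unit index. Each name is given as a list of words,
    each word as a list of characters (romanized readings here)."""
    index: dict[str, list[Posting]] = defaultdict(list)
    for name, words in names.items():
        pos = 0
        for word in words:
            for i, ch in enumerate(word):
                index[ch].append(Posting(name, pos, i == 0, i == len(word) - 1))
                pos += 1
    return index


def search_with_word_info(
    index: dict[str, list[Posting]], query: list[str], word_mode: bool
) -> list[str]:
    """Return names containing `query` as a contiguous character run.
    In word mode the run must also start and end on word boundaries."""
    if not query:
        return []
    hits = set()
    for first in index.get(query[0], []):
        run = [first]
        for k, ch in enumerate(query[1:], start=1):
            # Look for the k-th query character immediately after the previous one.
            nxt = next(
                (p for p in index.get(ch, []) if p.name == first.name and p.pos == first.pos + k),
                None,
            )
            if nxt is None:
                break
            run.append(nxt)
        else:  # every query character was found in sequence
            if not word_mode or (run[0].word_start and run[-1].word_end):
                hits.add(first.name)
    return sorted(hits)


# Hypothetical data: "kyouto" is one word; "toukyouto" is the words "toukyou" + "to".
NAMES = {
    "kyouto": [["kyou", "to"]],
    "toukyouto": [["tou", "kyou"], ["to"]],
}
idx = build_index(NAMES)
print(search_with_word_info(idx, ["kyou", "to"], word_mode=False))  # ['kyouto', 'toukyouto']
print(search_with_word_info(idx, ["kyou", "to"], word_mode=True))   # ['kyouto']
```

In this sketch, only one index is built and each search scans it once. Enabling the word-boundary check removes the "(kyouto)"-inside-"(toukyouto)" hit without a second, word-unit index.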
Further, consider a search in which, for example, a name uttered by a user is used as the search term and the device looks for names that partially match an index. A fuzzy match search technique is useful here, because the uttered name does not always partially match an index. Patent references 2 and 3 propose fuzzy match search techniques that use indexes for full text search. The character string search device disclosed in patent reference 2 divides the search term into character N-grams and searches the indexes for names that partially match each of those N-grams. It then outputs, as the search result, the name containing the largest number of the search term's N-grams. The text search device disclosed in patent reference 3 calculates a degree of similarity for each index. It does so by counting the characters in the index that appear either at the same position as the same character in the search term or within a predetermined range of that position. It then outputs the name having the highest degree of similarity as the search result.
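The two fuzzy-match scoring schemes attributed to patent references 2 and 3 can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementations. It uses character bigrams (N = 2) and counts each distinct query bigram once. The "predetermined range" is taken to be a window of ±1 position, and ties are broken arbitrarily. The sample names and the misheard query "tokio" are hypothetical, as are all function names.

```python
from collections import defaultdict
from typing import Optional


def ngrams(text: str, n: int = 2) -> set[str]:
    """Distinct character N-grams of `text`."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


# --- Patent reference 2 style (sketch): count matching N-grams per name ---
def build_ngram_index(names: list[str], n: int = 2) -> dict[str, set[str]]:
    index: dict[str, set[str]] = defaultdict(set)
    for name in names:
        for gram in ngrams(name, n):
            index[gram].add(name)
    return index


def ngram_count_search(index: dict[str, set[str]], query: str, n: int = 2) -> Optional[str]:
    counts: dict[str, int] = defaultdict(int)
    for gram in ngrams(query, n):         # divide the search term into N-grams
        for name in index.get(gram, ()):  # names partially matching this N-gram
            counts[name] += 1
    # The name containing the most query N-grams wins (ties broken arbitrarily).
    return max(counts, key=counts.get) if counts else None


# --- Patent reference 3 style (sketch): position-tolerant character overlap ---
def positional_similarity(name: str, query: str, window: int = 1) -> int:
    """Count characters of `name` that occur in `query` at the same position
    or within `window` positions of it (the assumed predetermined range)."""
    return sum(
        1
        for i, ch in enumerate(name)
        if ch in query[max(0, i - window): i + window + 1]
    )


def positional_search(names: list[str], query: str, window: int = 1) -> str:
    return max(names, key=lambda name: positional_similarity(name, query, window))


NAMES = ["tokyo", "kyoto", "osaka"]
print(ngram_count_search(build_ngram_index(NAMES), "tokio"))  # tokyo
print(positional_similarity("tokyo", "tokio"))                # 4
print(positional_search(NAMES, "tokio"))                      # tokyo
```

Both sketches tolerate the mismatched character in "tokio". The N-gram count rewards shared substrings regardless of where they occur. The positional score instead rewards characters that stay near their original positions.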