A search apparatus is known that searches for a target document from among a plurality of documents stored on the Web or in a server using an N-gram index database (for example, refer to Yasushi Ogawa, “Pseudo-frequency method: a high-speed ranking search method for Japanese documents for n-gram indexing”, Institute of Electronics, Information and Communication Engineers Paper Magazine, Institute of Electronics, Information and Communication Engineers, October, 2000, Vol. J83-D-I, No. 10, pp. 1043-1054). The N-gram index database has, as indexing terms, a plurality of character strings each being a combination of N characters (N being an integer equal to or greater than two), and stores, for each of the indexing terms, the documents that include that indexing term. The search apparatus divides a search term into search strings each having the same number of characters as the indexing term, and, for each of the search strings, searches for a document including that search string using the N-gram index database. With a search apparatus using such an N-gram index database, a similar word that is not an exact match with the search term (for example, a character string differing from the search term by a single character) can also be retrieved, and therefore omissions in the search can be reduced.
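The indexing and lookup described above can be illustrated with a minimal Python sketch. It is not taken from the cited paper: the function names (`ngrams`, `build_index`, `search`, `candidates`), the use of overlapping N-grams, the in-memory dictionary, and the `min_hits` threshold are all assumptions made for illustration only.

```python
from collections import defaultdict

def ngrams(text, n=2):
    """Return the overlapping n-character substrings of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def build_index(docs, n=2):
    """Map each indexing term (n-gram) to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in ngrams(text, n):
            index[gram].add(doc_id)
    return index

def search(index, term, n=2):
    """Split the search term into search strings and look each one up."""
    return {gram: set(index.get(gram, set())) for gram in ngrams(term, n)}

def candidates(index, term, n=2, min_hits=1):
    """Return ids of documents matching at least min_hits distinct search strings.

    min_hits is a hypothetical threshold: a value below the number of search
    strings lets near-matches (e.g. one differing character) be retrieved.
    """
    counts = defaultdict(int)
    for doc_ids in search(index, term, n).values():
        for doc_id in doc_ids:
            counts[doc_id] += 1
    return {doc_id for doc_id, c in counts.items() if c >= min_hits}

# Hypothetical documents; document 2 contains a one-character misspelling.
docs = {1: "text search", 2: "text seerch", 3: "cooking"}
index = build_index(docs, n=2)

exact = candidates(index, "search", n=2, min_hits=5)    # all five bigrams required
similar = candidates(index, "search", n=2, min_hits=3)  # partial match allowed
```

Under these assumptions, requiring all five bigrams of "search" retrieves only document 1, while lowering the threshold to three also retrieves document 2, whose text differs from the search term by a single character.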
However, when a search term includes a character string that constitutes another word, the search apparatus using the N-gram index outputs a search result containing much noise. In addition, if the number of characters N of the indexing term is increased, the search apparatus using the N-gram index can improve the relevance factor of the search result, but the number of possible indexing terms is the N-th power of the number of available character types and therefore increases explosively, resulting in a decrease in efficiency.
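Both drawbacks can be made concrete with a short Python sketch. The helper names (`index_term_patterns`, `contains_all_ngrams`), the 26-letter alphabet, and the English example words are illustrative assumptions and do not appear in the cited reference.

```python
def index_term_patterns(num_char_types, n):
    """Number of possible indexing terms: character types to the N-th power."""
    return num_char_types ** n

def contains_all_ngrams(text, term, n=2):
    """True if every n-gram of term occurs somewhere in text."""
    grams = [term[i:i + n] for i in range(len(term) - n + 1)]
    return all(g in text for g in grams)

# Growth of the indexing-term space for a hypothetical 26-letter alphabet.
growth = {n: index_term_patterns(26, n) for n in (2, 3, 4)}
# growth == {2: 676, 3: 17576, 4: 456976}

# Noise: every bigram of "cat" ("ca", "at") occurs inside the unrelated
# word "concatenate", so a bigram lookup reports a match.
noisy_hit = contains_all_ngrams("concatenate", "cat", n=2)
```

For a character set with thousands of character types, as in Japanese text, the same power law makes the indexing-term space far larger than in this 26-letter example even at small N.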