Technology for searching for a desired document among a large volume of computerized documents is becoming increasingly important.
For example, when searching documents written in Japanese, in which words are not separated by spaces, a technique that uses n-grams as the search unit is applied.
An n-gram is a string of n successive characters. In n-gram-based full-text searching, the files to be searched are accessed once for each n-gram extracted from the search character string, so the number of search operations grows with the number of extracted n-grams. Hence, when a relatively long search character string is used, searching takes time.
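The relationship between query length and search cost can be illustrated with a minimal sketch. It assumes a positional inverted index kept in memory; the names `ngrams`, `build_index`, and `search` are hypothetical and not taken from any specific system. A query of length m yields m - n + 1 overlapping n-grams, and each one costs an index lookup.

```python
from collections import defaultdict


def ngrams(text: str, n: int) -> list[str]:
    """Return the overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_index(docs: dict[str, str], n: int) -> dict[str, dict[str, list[int]]]:
    """Hypothetical positional inverted index: n-gram -> {doc_id -> [start offsets]}."""
    index: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, gram in enumerate(ngrams(text, n)):
            index[gram][doc_id].append(pos)
    return index


def search(index: dict[str, dict[str, list[int]]], query: str, n: int) -> tuple[set[str], int]:
    """Return (ids of documents containing query, number of index lookups made)."""
    if len(query) < n:
        raise ValueError("query must be at least n characters long")
    lookups = 0
    candidates: set[tuple[str, int]] | None = None
    for offset, gram in enumerate(ngrams(query, n)):
        lookups += 1  # one index access per extracted n-gram
        postings = index.get(gram, {})
        # Shift each hit back by the n-gram's offset in the query so that all
        # n-grams must agree on the same start position in the same document.
        starts = {(doc, p - offset) for doc, plist in postings.items() for p in plist}
        candidates = starts if candidates is None else candidates & starts
    return {doc for doc, _ in candidates}, lookups


idx = build_index({"d1": "全文検索の高速化", "d2": "検索文字列"}, 2)
print(search(idx, "全文検索", 2))          # 3 bigrams -> 3 lookups
print(search(idx, "全文検索の高速化", 2))  # 7 bigrams -> 7 lookups
```

With bigrams (n = 2), the four-character query needs three lookups while the eight-character query needs seven, even though both match the same document.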
Non-patent Literature (Yasushi OGAWA, Toni MATSUDA, “An Efficient Document Retrieval Method Using n-gram Indexing”, IEICE (The Institute of Electronics, Information and Communication Engineers) Journal (D-I), Vol. J82-D-I, No. 1, pp. 121-129, January 1999) discloses a document searching technique that uses the sum of the document frequencies of n-grams as an estimate of the processing time and selects, based on this estimate, the n-grams used in the actual searching process, thereby speeding up the search.
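The cost-estimation idea can be sketched as follows. This is an illustrative assumption, not the cited paper's exact procedure: the estimated cost of a set of n-grams is the sum of their document frequencies (`df`, a hypothetical lookup table), and a small dynamic program picks n-gram offsets that still cover every character of the query while minimizing that sum. Because every character remains covered and each n-gram is checked at its offset, a positional index can still confirm an exact match. The function names `estimate_cost` and `select_ngrams` are hypothetical.

```python
def estimate_cost(query: str, n: int, df: dict[str, int]) -> int:
    """Estimated processing cost of using every n-gram: sum of their document frequencies."""
    return sum(df.get(query[i:i + n], 0) for i in range(len(query) - n + 1))


def select_ngrams(query: str, n: int, df: dict[str, int]) -> tuple[list[int], int]:
    """Pick n-gram start offsets covering all of query with minimal summed df.

    Assumed strategy (illustrative only): offsets are chosen in increasing order,
    the first must be 0, the last must be len(query) - n, and consecutive offsets
    may be at most n apart so that no character is left uncovered.
    Returns (chosen offsets, estimated cost).
    """
    m = len(query)
    if m < n:
        raise ValueError("query must be at least n characters long")
    cost = [df.get(query[i:i + n], 0) for i in range(m - n + 1)]
    # best[t] = (minimal cost, offsets) for a chain of n-grams ending at offset t
    best: list[tuple[int, list[int]]] = [(cost[0], [0])]
    for t in range(1, len(cost)):
        prev_cost, prev_offsets = min(
            (best[s] for s in range(max(0, t - n), t)), key=lambda b: b[0]
        )
        best.append((prev_cost + cost[t], prev_offsets + [t]))
    total, offsets = best[-1]
    return offsets, total


df = {"ab": 5, "bc": 100, "cd": 3, "de": 4}  # made-up document frequencies
print(estimate_cost("abcde", 2, df))  # all four bigrams
print(select_ngrams("abcde", 2, df))  # skips the frequent bigram "bc"
```

Here using all four bigrams has an estimated cost of 112, whereas the selected offsets [0, 2, 3] ("ab", "cd", "de") still cover "abcde" at an estimated cost of 12.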
There is a desire to further speed up n-gram-based searching. There is also a demand for efficient document searching on devices with limited processing speed and storage capacity, such as a compact electronic dictionary installed in a cellular phone or other portable electronic apparatus.
The present invention has been made in view of the above-described circumstances, and an object of the present invention is to provide a searching apparatus and a searching method capable of searching more efficiently for a document that includes a specified character string.