The present invention relates to a text retrieval method and a text retrieval apparatus for extracting, from a text database, texts including a keyword string designated by an end user, and more particularly to a text retrieval method and a text retrieval apparatus for the case where the keyword string is input using an optical character reader.
With improvements in the processing speed of computers, it has become possible to perform full text retrieval or keyword matching on voluminous texts and to extract, at high speed, a text including a keyword string designated by an end user. Typical systems include the full text search system described in the Transactions of the 45th National Convention of the Information Processing Society of Japan (3)3-239 to 244 and the full text database system described in the Technical Research Report of the Institute of Electronics, Information and Communication Engineers DE90-34. Modes for instructing text retrieval include a command mode in which keyword strings are enumerated as arguments and a mode in which the retrieval instruction is described in a natural language statement; in either case, retrieval is ultimately performed using a keyword string as the clue.
Meanwhile, owing to advances in character recognition techniques, input methods that enter a character string composed of printed or handwritten characters into a computer by pattern recognition, in place of keyboard input, have been put to practical use. However, the character recognition rate is generally not 100%. Recognition performance deteriorates in particular when the shapes of characters closely resemble each other (for example, "" in a Chinese character and "" in a katakana character) and when one character is recognized as two characters (for example, "" in a Chinese character is recognized as "" and ""). This problem naturally applies to character recognition of a keyword string as well, and it also applies when the keyword string is composed of alphabetic characters. For example, "IDOL" is sometimes recognized as "IOOL" or "JDOL", and "WIDE" as "VVIDE".
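The alphabetic examples above can be modelled, purely for illustration, as a small confusion table. The following is a minimal sketch: the table holds only the three confusions named in the text, and the names `CONFUSIONS` and `single_error_readings` are hypothetical and do not come from any cited system.

```python
# Hypothetical confusion table built only from the examples in the text:
# each correct character maps to strings a recognizer might output instead.
CONFUSIONS = {
    "D": ["O"],   # "IDOL" -> "IOOL"
    "I": ["J"],   # "IDOL" -> "JDOL"
    "W": ["VV"],  # "WIDE" -> "VVIDE" (one character read as two)
}


def single_error_readings(keyword):
    """Return every string obtained by misreading exactly one character."""
    readings = set()
    for i, ch in enumerate(keyword):
        for wrong in CONFUSIONS.get(ch, []):
            readings.add(keyword[:i] + wrong + keyword[i + 1:])
    return readings


if __name__ == "__main__":
    print(sorted(single_error_readings("IDOL")))
    print(sorted(single_error_readings("WIDE")))
```

Even with only three table entries, a four-character keyword already has several plausible misreadings, which suggests why checking each recognized character by eye becomes laborious.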
In conventional character recognition techniques, functions for presenting the recognized result to the end user for confirmation and, as occasion demands, for correcting the result to another candidate character are indispensable for amending recognition errors such as those described above. However, the work of distinguishing between, for example, "" in a Chinese character, "" in a katakana character and a symbol "", "" in a hiragana character and "" in a katakana character, and "" in a hiragana character and "" in a katakana character is a heavy burden on the end user.
On the other hand, when text retrieval is performed with "" including a Chinese character misrecognized as "" including a katakana character, none of the desired texts is included in the retrieved result. Therefore, the more frequently character recognition errors occur, the more frequently oversights in retrieval occur. In a text retrieval system, noise in the retrieved result (surplus texts included in the extracted result) can be reduced to an appropriate quantity by narrowing-down retrieval or the like. Conversely, in the case of an oversight in retrieval (a state in which a text that should be extracted is not extracted), re-extracting the missed text is a heavy burden on the end user. Accordingly, it is important to adopt a processing system that reduces oversights as far as possible, even if noise increases to some extent.
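The relationship between a misrecognized keyword, oversight and noise can be illustrated with a minimal sketch of exact-match retrieval. The three sample texts, the set of relevant texts, and the function names `retrieve` and `oversight_and_noise` are assumptions made for illustration only; the keyword "IDOL" and its misreading "IOOL" are taken from the alphabetic example given earlier.

```python
def retrieve(texts, keyword):
    """Return the ids of texts that contain the keyword verbatim."""
    return {text_id for text_id, body in texts.items() if keyword in body}


def oversight_and_noise(retrieved, relevant):
    """Oversight: relevant texts that were not retrieved.
    Noise: retrieved texts that are not relevant."""
    return relevant - retrieved, retrieved - relevant


# Hypothetical three-text database and relevance judgment.
TEXTS = {
    1: "THE IDOL RELEASED A NEW SINGLE",
    2: "AN IDOL GROUP TOURED JAPAN",
    3: "A WIDE ANGLE LENS REVIEW",
}
RELEVANT = {1, 2}  # texts the end user wants for the keyword "IDOL"

if __name__ == "__main__":
    for keyword in ("IDOL", "IOOL"):  # correct reading, then a misreading
        hits = retrieve(TEXTS, keyword)
        missed, surplus = oversight_and_noise(hits, RELEVANT)
        print(keyword, "hits:", sorted(hits),
              "oversight:", sorted(missed), "noise:", sorted(surplus))
```

In this sketch the misread keyword retrieves nothing, so every relevant text becomes an oversight, and the end user receives no indication of which texts were missed.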
Further, since conventional text retrieval techniques extract texts that may meet the end user's requirement in no particular order, no processing that further applies sequencing to the retrieved texts is performed.