The present invention relates to a word recognition apparatus which recognizes an input word by comparing it with dictionary words prestored in a word dictionary memory. The input word may be extracted from a group of words printed on a document, the words being extracted by means of scanning or the like.
In this kind of word recognition apparatus, the input word, which is composed of a letter string, is compared with a great number of dictionary words in a dictionary memory. The dictionary word with the highest degree of resemblance to the input word is considered as a "recognition result."
In this case, it is not realistic to compare the input word with all of the dictionary words in the dictionary memory, because doing so requires too much recognition time. Hence, the dictionary words to be compared must be restricted in number. In one prior art method, the recognition time has been limited by reducing the number of dictionary words to be compared with the input word in accordance with the number of letters included in the input word (word length). For example, for an input word of length n, the comparison has been carried out only with dictionary words having word lengths of (n-2), (n-1), n, (n+1) and (n+2). Another prior art method has limited the recognition time by first selecting the dictionary words that have the same one head letter or the same two head letters as the input word.
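The word-length restriction described above can be illustrated with a minimal sketch. The function names, the `window` parameter, and the sample city names are assumptions chosen for illustration; they do not appear in the prior art itself.

```python
from collections import defaultdict


def group_by_length(dictionary_words):
    """Group dictionary words by their word length (number of letters)."""
    groups = defaultdict(list)
    for word in dictionary_words:
        groups[len(word)].append(word)
    return groups


def candidates_by_length(groups, input_word, window=2):
    """Collect dictionary words whose length is within n-window .. n+window.

    With window=2 this selects the (n-2), (n-1), n, (n+1) and (n+2) groups.
    """
    n = len(input_word)
    candidates = []
    for length in range(n - window, n + window + 1):
        candidates.extend(groups.get(length, []))
    return candidates


# Hypothetical sample dictionary; "LONDQN" stands for a misread "LONDON".
CITY_NAMES = ["ROME", "PARIS", "TOKYO", "LONDON", "CHICAGO", "SAN FRANCISCO"]
length_groups = group_by_length(CITY_NAMES)
length_candidates = candidates_by_length(length_groups, "LONDQN")
```

Here the six-letter input selects the groups of four to eight letters, so only the thirteen-letter entry is excluded from comparison.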
The prior art, however, has the following disadvantages:
1. In the case of the prior art using the word length as a key:
If it is assumed that input words have word lengths of two to twenty-five letters, the dictionary words can be grouped into twenty-four groups. In this case, however, a serious problem arises from the distribution of word lengths of the dictionary words. For example, in a dictionary of city names, the word lengths of the dictionary words follow a normal distribution with a peak at eight to ten letters. As a result, the number of dictionary words varies greatly from one word-length group to another. It may occur that more than one-half of the dictionary words belong to only five word-length groups centered on the group at the peak. In such a case, if a DP matching method (as described in U.S. Pat. No. 4,418,423, for example) is used for the word comparison, a great deal of time is still necessary to recognize an input word. Accordingly, the recognition time cannot be sufficiently shortened.
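To indicate why the number of candidates matters, the following sketch uses the standard Levenshtein edit distance as a stand-in for a DP matching score. It is not the method of U.S. Pat. No. 4,418,423, and the function names are assumptions for illustration. Each comparison fills a table of roughly (input length) × (dictionary word length) cells, so the total work grows in proportion to the number of dictionary words left in the selected groups.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b by dynamic programming."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (ch_a != ch_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(deletion, insertion, substitution))
        previous = current
    return previous[-1]


def recognize(input_word, candidates):
    """Return the candidate with the smallest distance to the input word."""
    return min(candidates, key=lambda word: edit_distance(input_word, word))
```

If more than one-half of the dictionary falls in the five groups around the peak, `recognize` still has to evaluate more than one-half of the dictionary for any input word of typical length.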
2. In the case of the prior art using the identification of the one or two head letters:
When the dictionary words are grouped on the basis of one head letter of the alphabet, at most twenty-six groups are formed. In comparison with the recognition technique using the word length, the number of dictionary words in each group is dispersed according to letter usage and does not follow a normal distribution. However, there is a serious problem: if the head letter of the input word is not read, a correct recognition cannot be accomplished from the groups, and it is then necessary to perform a comparison with all of the dictionary words. When the dictionary words are grouped on the basis of the two head letters, at most 676 (26 × 26) groups are formed, so the dictionary words are much more dispersed. In this case, when the first head letter is not read, it is still possible to access certain dictionary words in the dictionary memory by using the second letter as a key. However, if the second head letter is also not read, it is necessary to make a comparison with all of the dictionary words. Thus, the comparison-processing time depends upon the read-out result of the two head letters of the input word. Moreover, if the character reader makes an error, the input word cannot be compared with the correct group of dictionary words.
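The dependence on the two head letters can be sketched as follows. The `UNREAD` marker, the function name, and the sample words are assumptions for illustration, and every word is assumed to have at least two letters. A simple filter is used here in place of an indexed dictionary memory.

```python
UNREAD = "?"  # hypothetical marker for a letter the character reader rejected


def candidates_by_head_letters(dictionary_words, input_word):
    """Select dictionary words that agree with the readable head letters.

    A head letter marked UNREAD places no restriction on that position,
    so two unread head letters select the whole dictionary.
    """
    first, second = input_word[0], input_word[1]

    def matches(word):
        first_ok = first == UNREAD or word[0] == first
        second_ok = second == UNREAD or word[1] == second
        return first_ok and second_ok

    return [word for word in dictionary_words if matches(word)]


WORDS = ["LONDON", "LISBON", "LOS ANGELES", "MADRID", "MOSCOW"]
both_read = candidates_by_head_letters(WORDS, "LONDQN")
first_unread = candidates_by_head_letters(WORDS, "?ONDON")
both_unread = candidates_by_head_letters(WORDS, "??NDON")
misread = candidates_by_head_letters(WORDS, "MONDON")
```

In this sketch `both_unread` contains every dictionary word, while `misread` (an "L" read as "M") contains only "MOSCOW", so the correct word "LONDON" is never compared.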