The present invention generally relates to a word recognition apparatus which recognizes a train of character outputs derived from an optical character reader (OCR) by comparing the train of character outputs with a plurality of pre-stored dictionary words. More particularly, the present invention is concerned with apparatus for determining the degree of disparity between a train of OCR character as common nouns.
Unlike numerical characters representing a number, alphabetic characters representing a word are strongly dependent on one another within the word and often carry considerable redundancy. Therefore, by recognizing a series of alphabetic characters on a word-by-word basis, the dependency and redundancy of the characters can be effectively utilized to correct misread characters and to read otherwise unidentifiable characters. This yields a marked increase in the character recognition rate. Such word-by-word recognition will hereinafter be referred to as "word recognition" for convenience.
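By way of illustration only, word recognition of this general kind can be sketched as follows. The sketch assumes a simple position-wise disparity measure, a hypothetical reject symbol `?` for an unidentifiable character, a half-mismatch cost for each reject, and a comparison restricted to dictionary words of the same length as the OCR character train. None of these choices is taken from the apparatus described herein, and the names `disparity`, `recognize_word` and `DICTIONARY` are illustrative.

```python
REJECT = "?"  # hypothetical marker for a character the OCR could not identify


def disparity(ocr_train: str, word: str) -> float:
    """Position-wise disparity between an OCR character train and a dictionary word.

    Assumed costs: 1.0 for a mismatched character, 0.5 for a rejected character.
    Words of a different length are treated as incomparable (infinite disparity).
    """
    if len(ocr_train) != len(word):
        return float("inf")
    cost = 0.0
    for observed, expected in zip(ocr_train, word):
        if observed == REJECT:
            cost += 0.5
        elif observed != expected:
            cost += 1.0
    return cost


def recognize_word(ocr_train, dictionary):
    """Return the dictionary word with the least disparity, or None if none is comparable."""
    best = min(dictionary, key=lambda w: disparity(ocr_train, w), default=None)
    if best is None or disparity(ocr_train, best) == float("inf"):
        return None
    return best


# Illustrative dictionary; a real apparatus would hold many pre-stored words.
DICTIONARY = ["TOKYO", "KYOTO", "OSAKA"]

if __name__ == "__main__":
    # "0" is a misread "O" and "?" is an unidentifiable character.
    print(recognize_word("T0KY?", DICTIONARY))
```

In this sketch the misread character and the rejected character are both resolved by the remaining characters of the word, which is the dependency and redundancy referred to above.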
When an OCR is employed for recognition of alphabetic characters including upper case characters (i.e., capital letters) and lower case characters (i.e., small letters) inscribed on a piece of mail or a document, the OCR conventionally produces two kinds of outputs, one regarding the characters on the mail or the document as upper case alphabetic characters and the other regarding them as lower case alphabetic characters, in order to achieve higher recognition accuracy. These two trains of character outputs from the OCR may be processed, by way of example, by the system disclosed in U.S. Pat. No. 4,003,025. This prior art system discriminates whether one alphabetic character field (e.g., a word) is an upper case character field or a lower case character field and then performs error correction by means of a word recognition apparatus. However, it is sometimes difficult to discriminate between an upper case character and a lower case character when a character field contains both upper case and lower case characters or when the quality of the characters on a document scanned by the OCR is poor. In such cases, the disparity determined between an OCR character output train and a dictionary word becomes inaccurate, which degrades the eventual accuracy of word recognition.