1. Field of the Invention
The present invention relates to an optical character reader (hereinafter referred to as OCR) apparatus adapted to yield a picture image signal by illuminating characters written on a document card with light from a light source and subjecting the light reflected therefrom to photoelectric conversion to provide an electrical signal, to extract the original characters from the picture image signal one at a time, to extract the features of each extracted character, and to compare those features with a character pattern previously stored in a dictionary memory. More particularly, it relates to a word comparison system of an OCR adapted to extract a character group as a word and to compare the extracted word with a word previously stored in the memory, thereby recognizing characters on a document card.
2. Description of the Prior Art
A prior OCR is adapted, as disclosed in Japanese Patent Application No. 59-125033, to illuminate alphabetic characters written on a document card with a lamp and to focus the reflected light through a lens onto a photoelectric converter sensor, which converts the picture image on the document card into an electrical signal delivered as the output of the sensor.
Next, a pre-processing circuit extracts a fractional picture image corresponding to one character from the resulting line image and transmits it to a feature extracting circuit. The feature extracting circuit executes a so-called recognition algorithm adapted to extract the features of a character line or a background in conformity with a predetermined procedure. A character judgement circuit then compares the resulting features with those of characters previously stored in a dictionary memory and delivers coincident character codes to a post-processing circuit.
Three cases are considered as the output from the character judgement circuit: a plurality of character codes is yielded; only one character code is yielded; or no character code is yielded. The post-processing circuit operates in these situations as follows. When only one character code has been yielded, it delivers that character code, interpreting the character as having been satisfactorily recognized. When no character code has been yielded, it delivers a non-recognizable code from its output terminal as an indication that no corresponding character has been found in the dictionary. When a plurality of character codes has been yielded, a situation which may frequently occur when the character pattern resembles other character patterns, the character pattern cannot be said to correspond to one character code but rather appears to correspond to a plurality of character candidates. For this case, there is a method of selecting only one character code by eliminating unnecessary candidates from the plurality of candidates, making use of previously known information indicating, on the basis of context, that a certain character is not written on the document card adjoining another certain character.
For example, when "U" and "V" are yielded as character candidates in an English sentence and the character "Z" is located immediately in front of them, the "V" is regarded as an improper character because it would be located just behind the "Z"; it is therefore eliminated, the "U" is selected, and the "U" code is delivered from the output terminal. Information concerning which characters do not adjoin the character codes before and behind them can be stored in a table and employed as needed.
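The three output cases and the elimination of candidates by an adjacency table can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the circuit of the cited application: the names `FORBIDDEN_PAIRS`, `NON_RECOGNIZABLE` and `post_process` are hypothetical, the table holds only the single "Z"/"V" pair from the example above, and returning the non-recognizable code when several candidates still remain is an assumption, since the text does not say what happens in that case.

```python
# Hypothetical table of (previous, next) character pairs assumed never to adjoin.
# Only the "Z" followed by "V" pair from the example is included.
FORBIDDEN_PAIRS = {("Z", "V")}

# Hypothetical stand-in for the non-recognizable code.
NON_RECOGNIZABLE = "?"


def post_process(candidates, previous=None):
    """Reduce the candidate codes for one character to a single output code."""
    if not candidates:
        # No character code yielded: deliver the non-recognizable code.
        return NON_RECOGNIZABLE
    if len(candidates) == 1:
        # Exactly one character code yielded: deliver it as recognized.
        return candidates[0]
    # Several candidates: drop those that may not follow the previous character.
    remaining = [c for c in candidates if (previous, c) not in FORBIDDEN_PAIRS]
    if len(remaining) == 1:
        return remaining[0]
    # Assumption: if the table cannot settle the choice, report failure.
    return NON_RECOGNIZABLE


print(post_process(["U", "V"], previous="Z"))  # prints U
```

Only the preceding character is consulted here; a table keyed on the following character could be applied in the same way.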
In addition, another known method judges whether or not a candidate character is proper by noting two or three adjacent characters, the frequencies of occurrence of combinations of those characters being provided in advance (this known method is called the 2-gram, 3-gram, or generally n-gram method).
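A 2-gram judgement of this kind might be sketched as below. The frequency values are made-up placeholders chosen only for illustration, not measured data, and the names `BIGRAM_FREQ` and `pick_by_bigram` are hypothetical; the sketch also assumes that the candidate with the highest frequency after the preceding character is simply taken as the proper one.

```python
# Hypothetical 2-gram table: (previous, next) -> frequency of occurrence.
# The numbers are illustrative placeholders, not measured statistics.
BIGRAM_FREQ = {
    ("Q", "U"): 100,
    ("Q", "V"): 0,
    ("Q", "O"): 1,
}


def pick_by_bigram(previous, candidates, freq=BIGRAM_FREQ):
    """Return the candidate that most frequently follows `previous`.

    Pairs missing from the table count as frequency 0. On a tie, the
    candidate listed first is returned.
    """
    return max(candidates, key=lambda c: freq.get((previous, c), 0))


print(pick_by_bigram("Q", ["V", "U"]))  # prints U
```

A 3-gram variant would key the table on three adjacent characters instead of two.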
In still another known method, a word is extracted from the character group read by the character judgement circuit and compared with previously stored words for judgement at the word level. Namely, the character codes of the characters constituting the read word are compared in succession with those of each stored word, and the stored word having the largest number of coincident character codes is judged to be the read word.
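The word-level judgement described above can be illustrated with a short sketch. The function names and the stored word list are hypothetical; the sketch assumes characters are compared position by position, that a comparison stops at the end of the shorter word, and that the first stored word wins a tie, none of which is specified in the text.

```python
def count_coincident(read_word, stored_word):
    """Count positions at which the two words hold the same character code."""
    return sum(1 for a, b in zip(read_word, stored_word) if a == b)


def best_word(read_word, stored_words):
    """Return the stored word with the largest number of coincident codes."""
    return max(stored_words, key=lambda w: count_coincident(read_word, w))


# Hypothetical stored words; "PATFNT" stands for a read word with one
# misrecognized character.
stored = ["PARENT", "PATENT", "LATENT"]
print(best_word("PATFNT", stored))  # prints PATENT
```

Here "PATENT" coincides with the read word in five positions, while "PARENT" and "LATENT" each coincide in four.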