The present invention relates to a character recognition system, and in particular to a character recognition system which can read many kinds of characters with a simple structure.
Two kinds of character recognition systems are known: one recognizes a character by observing the stroke lines of the character, and the other recognizes the same by observing the background (white background) behind the character. The stroke analysis method is the typical embodiment of the former recognition system, but it has the disadvantages that it takes a long time to recognize a character among many other characters, and that recognition does not always succeed when a stroke of a character has some width, because of the noise generated in the sharpening step of the stroke. The latter system (background system) has the advantage that character recognition is not affected by noise and/or deformation of the character, but it has the disadvantages that the recognition ability of each feature is rather low and that, when the apparatus is simple, very few classes can be recognized.
Another prior character recognition system was developed by Glucksman (see Classification of Mixed-Font Alphabetics by Characteristic Loci, IEEE Computer Conf. 1967); Glucksman's method is an example of said background system. According to Glucksman's method, four scanning lines are extended from each point in the white background in four directions (right, up, left, and down), and a feature code of each point in the white background is defined according to whether said four scanning lines cross a stroke line of the character. Each of said feature codes has four bits of information. Several feature codes are prepared for all the white cells in the character area, and the recognition of the character is accomplished by counting the occurrences of each feature code and comparing the count of each feature code with the corresponding threshold value. In this method, each element of the feature code is expressed by a ternary code, that is, the number of crossing points of the scanning line with the character stroke (0, 1, or 2 or more) is coded in a ternary code. However, said ternary code system has the disadvantage that the distribution of the feature codes depends too much upon the deformation of a character. On the other hand, if the feature code is expressed with a binary code, the recognition ability of each feature code is not sufficient. Therefore, neither ternary coding nor binary coding is practical for recognizing many classes of characters with a small percentage of error.
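The background feature coding described above can be illustrated with a minimal sketch. It is not taken from Glucksman's paper or from the present specification. It assumes a character image stored as a list of rows in which 1 marks a stroke cell and 0 marks a white cell, it assumes that "up" means toward row 0, and it counts a crossing each time a scanning line passes from a white cell into a stroke cell. The names crossings, feature_code, loci_histogram, and IMG are hypothetical, and the comparison of counts with threshold values is not modeled.

```python
from collections import Counter

def crossings(cells):
    """Count the stroke runs met along one scanning line (white-to-stroke transitions)."""
    count = 0
    prev = 0  # the scan starts from a white cell
    for cell in cells:
        if cell == 1 and prev == 0:
            count += 1
        prev = cell
    return count

def feature_code(img, r, c, cap=2):
    """Feature code of white cell (r, c), in the order right, up, left, down.

    cap=2 gives the ternary coding (0, 1, 2 or more);
    cap=1 gives the binary coding (no crossing / at least one crossing).
    """
    row = img[r]
    col = [line[c] for line in img]
    scans = (
        row[c + 1:],      # right
        col[:r][::-1],    # up (assumed: toward row 0)
        row[:c][::-1],    # left
        col[r + 1:],      # down
    )
    return tuple(min(crossings(s), cap) for s in scans)

def loci_histogram(img, cap=2):
    """Count how often each feature code occurs over all white cells."""
    hist = Counter()
    for r, line in enumerate(img):
        for c, value in enumerate(line):
            if value == 0:
                hist[feature_code(img, r, c, cap)] += 1
    return hist

# Hypothetical 5x5 test pattern: a closed ring, roughly the letter "O".
IMG = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
```

In this pattern the enclosed white cell at the center receives the code (1, 1, 1, 1), because each of its four scanning lines meets the ring once, while a white cell to the left of the ring such as (2, 0) receives (2, 0, 0, 0) under ternary coding and (1, 0, 0, 0) under binary coding. The difference between the two codings is what the passage above refers to when it contrasts their sensitivity to deformation with their recognition ability.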
Although Glucksman's method has the advantage that even a deformed character can be recognized, said method has the disadvantage that recognition is very difficult when there are many classes of characters to be recognized. Thus, Glucksman's method, which recognizes a character by analyzing only the features of the white background, is not suitable for reading many kinds of characters.
One proposal for improving Glucksman's method is Japanese laid-open publication 46,029/76, in which characters are roughly classified by background analysis, then the stroke method is applied, and the relationship of the coordinates among a plurality of typical points is analyzed. However, this proposal does not completely overcome the disadvantage of Glucksman's method.