A. Field of the Invention
This invention relates to a character recognition apparatus, and more particularly to one arranged to recognize characters, especially printed alphanumeric characters, even when implemented on a general-purpose personal computer without special hardware.
B. Prior Art
Recognition of alphanumeric characters has reached the stage of practical use, and several products are available on the market. However, most of them are dedicated machines that require special hardware and are expensive. These systems are mainly used for special applications in which large numbers of documents are read continuously.
Office automation equipment incorporating an OCR device, which can easily read and edit typed or printed alphanumeric documents and then store them on a disk or the like, is expected to come into wide use. For such applications, a compact, inexpensive OCR device for printed alphanumeric characters is essential.
Typical conventional techniques for printed alphanumeric characters are a pattern matching technique, in which a feature value is extracted from a binary-coded character pattern and the first candidate selected is the character whose reference pattern in the recognition dictionary has the feature vector nearest to that of the input character; and a technique for determining a candidate character by using a so-called binary tree dictionary with point sampling. The former makes creation and adaptation of the recognition dictionary easy, but has the disadvantage of slower recognition because it generally uses complicated feature values and processing procedures to improve the recognition rate. The latter requires a vast amount of statistics and time to create the dictionary, and additions and modifications to the dictionary are difficult. Therefore, with the conventional approaches, it is difficult to attain sufficient processing speed, easy dictionary creation, and easy addition and modification on a general-purpose personal computer. For details of the above-mentioned technique using a binary tree dictionary with point sampling, refer to, for example, "A Processor-based OCR System" by R. G. Casey and C. R. Jih, IBM Journal of Research and Development, Vol. 27, No. 4, July 1983, pp. 386-399, or "Decision Tree Design Using a Probabilistic Model" by R. G. Casey and G. Nagy, IEEE Transactions on Information Theory, Vol. IT-30, No. 1, January 1984, pp. 93-99.
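The two conventional techniques contrasted above can be illustrated with a minimal sketch. It is not taken from the cited references: the 3x3 glyphs, the row and column pixel-count feature, the squared Euclidean distance, and the hand-built tree are all assumptions chosen only to keep the example small.

```python
from typing import Dict, List, Tuple, Union

Bitmap = List[str]  # binary-coded pattern: rows of '0' (white) / '1' (black)

# Hypothetical 3x3 reference glyphs, for illustration only.
GLYPHS: Dict[str, Bitmap] = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
    "T": ["111", "010", "010"],
}

def features(bitmap: Bitmap) -> List[int]:
    """Assumed feature vector: black-pixel count per row, then per column."""
    rows = [row.count("1") for row in bitmap]
    cols = [sum(row[c] == "1" for row in bitmap) for c in range(len(bitmap[0]))]
    return rows + cols

def nearest_reference(bitmap: Bitmap, dictionary: Dict[str, List[int]]) -> str:
    """Pattern matching: pick the label whose reference vector is nearest."""
    v = features(bitmap)
    def dist(ref: List[int]) -> int:
        return sum((a - b) ** 2 for a, b in zip(v, ref))
    return min(dictionary, key=lambda label: dist(dictionary[label]))

# Binary tree with point sampling: each internal node tests one pixel.
# A node is either a leaf label or (row, col, child_if_white, child_if_black).
Node = Union[str, Tuple[int, int, "Node", "Node"]]
TREE: Node = (0, 0, "I", (0, 1, "L", "T"))  # hand-built for the glyphs above

def classify_by_tree(bitmap: Bitmap, node: Node) -> str:
    """Follow one branch per sampled pixel until a leaf is reached."""
    while not isinstance(node, str):
        r, c, white, black = node
        node = black if bitmap[r][c] == "1" else white
    return node

DICTIONARY = {label: features(glyph) for label, glyph in GLYPHS.items()}
NOISY_I: Bitmap = ["010", "010", "011"]  # an "I" with one extra black pixel
```

In this sketch the dictionary for pattern matching is rebuilt by recomputing one feature vector per glyph, whereas the tree has to be redesigned whenever a glyph is added, which mirrors the trade-off described above. The tree reads at most two pixels per character, while the matcher computes a full feature vector and compares it against every reference.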
Prior art relevant to the invention includes Japanese Published Unexamined Patent Applications (JPUPA) Nos. 60-61876, 61-75486, and 61-173387. In these specifications, a decision tree for recognizing a character or pattern is formed by sequentially performing classification based on multi-valued features, and the candidates are narrowed down by using this tree. However, none of these specifications describes the use of a reject node.
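Narrowing candidates by sequential classification on multi-valued features might look like the following sketch. The feature table (number of holes, number of stroke endpoints) and its values are hypothetical and are not drawn from the cited applications; filtering a candidate set level by level is equivalent to descending a multi-way decision tree whose levels test the features in a fixed order.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical multi-valued features per character: (holes, stroke endpoints).
FEATURE_TABLE: Dict[str, Tuple[int, int]] = {
    "A": (1, 2),
    "B": (2, 0),
    "C": (0, 2),
    "D": (1, 0),
    "E": (0, 3),
}

def narrow(candidates: Iterable[str],
           observed: Tuple[int, ...],
           table: Dict[str, Tuple[int, ...]]) -> Set[str]:
    """Keep only the candidates that agree with each observed feature in turn."""
    remaining = set(candidates)
    for depth, value in enumerate(observed):
        remaining = {c for c in remaining if table[c][depth] == value}
        if len(remaining) <= 1:  # a single candidate (or none) is left
            break
    return remaining
```

An empty result means that no candidate matched the observed features; how such a case is handled is left open in this sketch.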
Other related prior art is as follows.
JPUPA Nos. 57-212589 and 56-35276 disclose registering rejected character patterns to improve subsequent recognition. JPUPA No. 58-201184 discloses classifying character patterns by using the distance from a predetermined position on the character frame to a position in the character pattern or on the frame in a predetermined direction.
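A distance feature of the kind described for JPUPA No. 58-201184 can be sketched as follows. This is an illustrative assumption, not the method of that application: the bitmap encoding, the function name `frame_distance`, and the choice to count pixels are all hypothetical.

```python
from typing import List, Tuple

Bitmap = List[str]  # rows of '0' (white) / '1' (black) inside the character frame

def frame_distance(bitmap: Bitmap,
                   start: Tuple[int, int],
                   step: Tuple[int, int]) -> int:
    """Count white pixels from a frame position along a fixed direction.

    Scanning stops at the first black pixel (a position in the character
    pattern) or when the scan leaves the bitmap (the opposite frame edge).
    """
    r, c = start
    dr, dc = step
    distance = 0
    while 0 <= r < len(bitmap) and 0 <= c < len(bitmap[0]):
        if bitmap[r][c] == "1":
            return distance
        r, c = r + dr, c + dc
        distance += 1
    return distance

GLYPH_L: Bitmap = ["100", "100", "111"]  # hypothetical 3x3 glyph
```

For example, scanning leftward from the right edge of the top row of `GLYPH_L` passes two white pixels before reaching the stroke, while the same scan on the bottom row hits black immediately. A handful of such distances taken from fixed frame positions could then serve as features for classification.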