1. Field of the Invention
The present invention relates to a document recognition apparatus, and more particularly to a character recognition apparatus for recognizing characters in a document by sequentially detecting one-character portions of images in a document image and by recognizing the character images of the one-character portions detected.
2. Description of the Related Art
Heretofore, document recognition apparatuses to which character recognition devices are applied have been developed as a means of inputting a large volume of documents, ledgers, and the like which are written in sentences in which kanji and kana are mixed. As shown in FIG. 9, such a document recognition apparatus comprises a character detecting section 91 for detecting the character image of one character from a document image, a character recognizing section 92 for recognizing the detected character image of one character, and a control section 93 for controlling the character detecting section 91 and the character recognizing section 92. The document recognizing operation in the document recognition apparatus begins when the control section 93 starts the character detecting section 91 and the character recognizing section 92. When the character detecting section 91 detects the character image of one character and supplies it to the character recognizing section 92, the character recognizing section 92 effects feature extraction processing and recognition processing with respect to that character image, and outputs a character code as a result of recognition. This operation of recognizing one character is conducted repeatedly until each character in the document image has been recognized, thereby effecting document recognition.
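The one-character-at-a-time flow described above can be sketched as follows. This is a minimal illustration only: the names `recognize_document`, `detect_characters`, and `recognize_character` are hypothetical stand-ins for the control section 93, the character detecting section 91, and the character recognizing section 92, and are not taken from the original.

```python
def recognize_document(document_image, detect_characters, recognize_character):
    """Drive detection and recognition one character at a time.

    detect_characters: callable yielding one-character images (hypothetical
        stand-in for the character detecting section 91).
    recognize_character: callable mapping one character image to a character
        code (hypothetical stand-in for the character recognizing section 92).
    """
    codes = []
    for char_image in detect_characters(document_image):
        # Each detected one-character image is recognized before the next
        # one is handled, so the total time grows with the per-character cost.
        codes.append(recognize_character(char_image))
    return codes


# Toy usage: the "document" is already a list of one-character images,
# and recognition is a simple table lookup.
toy_table = {"img_a": "A", "img_b": "B"}
toy_codes = recognize_document(["img_a", "img_b"], iter, toy_table.get)
```

Because the loop calls the recognizer once per character, any increase in per-character processing time is multiplied by the number of characters in the document.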
FIG. 10 is a block diagram illustrating an example of the configuration of the character recognizing section. To recognize the character images with high accuracy, it is necessary to clearly express the differences between respective characters, and to extract from the image to be recognized those features that are unlikely to be affected by deformations, noise, and the like. For this reason, various studies have hitherto been made. For instance, the document "Hagita et al., Handprinted Chinese Characters Recognition by Peripheral Direction Contributivity Feature, Transactions of the Institute of Electronics and Communication Engineers Oct. '83, Vol. J66-D, No. 10, pp. 1185-1192" reports a character recognizing system capable of discriminating handwritten kanji with high accuracy by means of two-stage discrimination processing. In this system, characters are first classified on the basis of rough-classification features, and discrimination processing of detailed portions is then conducted only with respect to the narrowed-down candidate characters.
In such a two-stage discrimination processing system, as shown in the block diagram of FIG. 10, the character recognizing section for effecting character recognition processing comprises an image feature normalizing section 101, a rough-classification feature extracting section 102, a rough-classification feature comparing section 103, a rough-classification standard feature storing section 104, a rough-classification sorting section 105, a precise-classification feature extracting section 106, a precise-classification feature comparing section 107, a precise-classification standard feature storing section 108, and a precise-classification sorting section 109, so as to effect character discrimination with high accuracy. That is, first, after the image feature normalizing section 101 effects the normalization processing of the image to be recognized, the rough-classification feature extracting section 102 extracts a feature for rough classification. Then, by using the extracted feature for rough classification, the rough-classification feature comparing section 103 effects a comparison with a standard feature stored in the rough-classification standard feature storing section 104. On the basis of the result of this rough-classification comparison, the rough-classification sorting section 105 effects a rough classification, whereby the candidates for recognition are narrowed down. Next, the precise-classification feature extracting section 106 extracts a feature for precise classification, and the precise-classification feature comparing section 107 compares it with a standard feature stored in the precise-classification standard feature storing section 108. On the basis of the result of this comparison with the precise-classification feature, the precise-classification sorting section 109 performs sorting processing.
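The two-stage discrimination described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the cited paper: features are plain lists of numbers, comparison uses Euclidean distance, and the names `two_stage_classify`, `rough_std`, `precise_std`, and `n_candidates` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def two_stage_classify(rough_feat, precise_feat, rough_std, precise_std,
                       n_candidates=2):
    """Rough classification narrows the candidates; precise classification
    then decides among the remaining candidates only.

    rough_std / precise_std: dicts mapping a character code to its stored
    standard feature (assumed data layout).
    """
    # Stage 1: compare against every rough standard feature and sort.
    ranked = sorted(rough_std,
                    key=lambda c: euclidean(rough_feat, rough_std[c]))
    candidates = ranked[:n_candidates]
    # Stage 2: compare the precise feature with the candidates only.
    return min(candidates,
               key=lambda c: euclidean(precise_feat, precise_std[c]))


# Toy data: "A" and "B" look alike at the rough level, "C" does not.
rough_std = {"A": [0.0, 0.0], "B": [0.1, 0.0], "C": [5.0, 5.0]}
precise_std = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0],
               "C": [0.0, 0.0, 1.0]}
result = two_stage_classify([0.0, 0.0], [0.1, 0.9, 0.0],
                            rough_std, precise_std)
```

In this toy run the rough stage discards "C", so the more expensive precise comparison is made against two classes rather than three.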
In addition, in general pattern recognition such as character recognition, another effective method is to statistically analyze the variations of the objects of the respective features in a feature space and to reflect those variations in the definition of the distance or the similarity. One example of a pattern recognition apparatus of this type is disclosed in Published Examined Japanese Patent Application No. 19656/1981. In this pattern recognition apparatus, the pattern recognition is effected on the basis of the similarity between a standard pattern and an input pattern. In the pattern recognition, M kinds of standard patterns and N kinds of standard patterns orthogonal to them are prepared in advance as patterns corresponding to standard patterns belonging to a specific class. Then, with respect to an arbitrarily given input pattern, the difference is determined between, on the one hand, the sum of squares of the M similarities obtained between this input pattern and the M kinds of standard patterns, and, on the other hand, the sum of squares of the N similarities obtained with respect to the N kinds of standard patterns. Whether or not the input pattern belongs to the relevant class is then determined by whether or not that difference exceeds a predetermined threshold value.
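The decision rule described above can be sketched as follows. This is a minimal illustration only: the source does not define how the similarity is computed, so the normalized inner product used here is an assumption, and the names `similarity` and `belongs_to_class` are hypothetical.

```python
import math


def similarity(f, g):
    """Normalized inner product of two patterns (an assumed definition;
    the text above does not state how similarity is computed)."""
    dot = sum(x * y for x, y in zip(f, g))
    norm = math.sqrt(sum(x * x for x in f)) * math.sqrt(sum(y * y for y in g))
    return dot / norm if norm else 0.0


def belongs_to_class(x, class_patterns, orthogonal_patterns, threshold):
    """Compare the difference of the two sums of squared similarities
    with a predetermined threshold.

    class_patterns: the M kinds of standard patterns for the class.
    orthogonal_patterns: the N kinds of standard patterns orthogonal to them.
    """
    s_m = sum(similarity(x, p) ** 2 for p in class_patterns)
    s_n = sum(similarity(x, q) ** 2 for q in orthogonal_patterns)
    return (s_m - s_n) > threshold


# Toy usage with M = 1 and N = 1.
accepted = belongs_to_class([1.0, 0.0, 0.0], [[1.0, 0.0, 0.0]],
                            [[0.0, 1.0, 0.0]], threshold=0.5)
rejected = belongs_to_class([0.0, 1.0, 0.0], [[1.0, 0.0, 0.0]],
                            [[0.0, 1.0, 0.0]], threshold=0.5)
```

Subtracting the second sum penalizes input patterns that resemble the orthogonal patterns, so an input lying mostly along those patterns is rejected even if it has some similarity to the class patterns.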
If an attempt is made to discriminate, with high accuracy, sentences in which kanji and kana are mixed or general graphic patterns, very complicated recognition processing is required. To improve the document recognition apparatus with respect to general printed documents, a multiplicity of features effective in absorbing the differences between fonts are extracted, and discrimination is effected by using all of these features. As a result, although much time is required for the recognition processing of one character, the character recognition can be effected with high reliability.
However, if complicated recognition processing is used to improve the character recognition accuracy, the recognition speed declines, and if, conversely, the processing is simplified to improve the recognition speed, the recognition accuracy deteriorates. Thus, there is the problem that the recognition speed and the recognition accuracy are difficult to improve in a compatible manner.