1. Field of the Invention
The present invention relates to an apparatus and method for recognizing a pattern, and enables characters, graphics, and symbols to be recognized correctly depending on various states of input images when used with a printed character recognizing apparatus and a graphics recognizing apparatus as well as a handwritten character recognizing apparatus.
2. Description of the Related Art
Conventional handwritten character recognizing apparatuses such as an optical character reader (OCR) are designed for automatically reading characters written on an accounting list, etc. and automatically inputting the characters to eliminate the necessity of manually finding characters written on the accounting list, etc. and inputting the characters through a keyboard.
FIG. 1 is a block diagram showing the configuration of the conventional handwritten character recognizing apparatus.
In FIG. 1, a form/document 311 is read using a scanner to obtain a multiple-value image of the form/document 311.
A preprocessing unit 312 binarizes the multiple-value image, removes noise, and corrects the position of the form/document 311.
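The binarization step can be illustrated with a minimal sketch. The source does not state how the preprocessing unit 312 binarizes the image, so the fixed global threshold used here, the function name `binarize`, and the default value 128 are all assumptions made for illustration only.

```python
def binarize(image, threshold=128):
    """Convert a multiple-value (grayscale) image into a binary image.

    `image` is a list of rows of pixel intensities (0 = black, 255 = white).
    Pixels darker than `threshold` are treated as ink (1), others as
    background (0). The fixed threshold is an assumption, not the
    method of the apparatus described in the text.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


# Hypothetical 2x2 grayscale patch.
patch = [[0, 200],
         [130, 90]]
binary_patch = binarize(patch)
```

A fixed threshold is the simplest choice; an adaptive threshold could be substituted without changing the rest of the pipeline.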
Then, a character detecting unit 313 detects each character according to information about preliminarily defined ruled lines and positional information about a character.
A character recognizing unit 314 recognizes each character and outputs a character code. The character is recognized by collating each feature of an unknown character pattern detected by the character detecting unit 313 with the feature of each character category preliminarily entered in a recognizing dictionary 315.
For example, a 2-dimensional character pattern is converted into a feature vector in a feature space representing the features of the character, and the distance between feature vectors in that space is computed as a measure of similarity between the unknown character pattern and each character category preliminarily entered in the recognizing dictionary 315. The character category whose feature vector lies at the shortest distance from the feature vector of the unknown character pattern is recognized as corresponding to the unknown character pattern.
A threshold is set for the distance between two feature vectors to avoid mistakenly recognizing a non-character such as a deletion line, noise, or a symbol as a character and outputting a character code for the non-character. If the distance between the two feature vectors is larger than the threshold, a reject code is output on the determination that the unknown character pattern has no corresponding character category preliminarily entered in the recognizing dictionary 315, or that the unknown character pattern is a non-character.
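The shortest-distance matching with a reject threshold described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not specify the distance measure or the feature extraction, so the Euclidean distance, the three-element feature vectors, the dictionary contents, the threshold value, and the names `recognize` and `REJECT` are all hypothetical.

```python
import math

REJECT = "REJECT"  # hypothetical reject code


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(feature, dictionary, reject_threshold):
    """Return (category, distance) for the nearest dictionary entry.

    If the shortest distance exceeds `reject_threshold`, the reject code
    is returned instead of a character category.
    """
    best_category, best_distance = None, math.inf
    for category, reference in dictionary.items():
        d = euclidean(feature, reference)
        if d < best_distance:
            best_category, best_distance = category, d
    if best_distance > reject_threshold:
        return REJECT, best_distance
    return best_category, best_distance


# Hypothetical dictionary: one reference feature vector per character category.
dictionary = {
    "0": [0.0, 1.0, 0.0],
    "1": [1.0, 0.0, 0.0],
    "8": [0.5, 0.5, 1.0],
}

close_match = recognize([0.9, 0.1, 0.0], dictionary, reject_threshold=0.5)
far_pattern = recognize([5.0, 5.0, 5.0], dictionary, reject_threshold=0.5)
```

In this sketch the first pattern lies near the reference vector for `1` and is accepted, while the second is far from every entry and is rejected.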
The recognizing dictionary 315 also contains the features of the character categories of high-quality characters, obscure characters, and deformed characters. A high-quality-character recognizing dictionary 315 is referred to for high-quality characters. An obscure-character recognizing dictionary 315 is referred to for obscure characters. A deformed-character recognizing dictionary 315 is referred to for deformed characters. Thus, differences in the quality of the characters in the form/document 311 can be handled accordingly.
FIG. 2 shows the configuration of the character recognizing apparatus for recognizing a character with a deletion line.
The character recognizing apparatus shown in FIG. 2 comprises an image input unit 491 for inputting an original image containing a character and detecting or preprocessing a character from the input image, and an identifying unit 492 for identifying a character by extracting the feature of the character and comparing the extracted feature with the feature of the standard pattern stored in the recognizing dictionary.
When a character mistakenly entered in a form is removed with a deletion line, for example, six or more horizontal lines are entered on the character. It is determined that the character provided with six or more horizontal lines cannot be identified, and the character is rejected by the identifying unit 492 because it does not match any standard pattern stored in the recognizing dictionary.
However, the handwritten character recognizing apparatus shown in FIG. 1 processes every detected character in the same way, whether it is an obscure character, a deformed character, or a high-quality character, using the same recognizing dictionary 315.
Accordingly, there has been a problem that information about obscure characters entered in the recognizing dictionary 315 adversely affects the high-quality character recognizing process, and the obscure characters entered in the recognizing dictionary 315 prevent high-quality characters from being successfully read.
In addition to obscure and deformed states, there are various environments for characters. For example, a character may touch its character box. When a single recognizing dictionary 315 is referred to in these various environments, the entries for the different environments affect each other, causing a problem that the recognizing process cannot be performed with enhanced precision.
When the character recognizing apparatus shown in FIG. 2 recognizes a character, six or more horizontal lines must be drawn over an entered character to delete it with a deletion line. This is a heavy burden on the user and therefore is not always observed. As a result, a character with an apparent deletion line lies at a small distance from a standard pattern stored in the recognizing dictionary and cannot be clearly distinguished from a character without a deletion line. Thus, the character to be deleted is not rejected and is mistakenly read.
For example, as indicated by (A) shown in FIG. 3, the `0` to be deleted is not rejected but recognized as `8`. As indicated by (B) shown in FIG. 3, the `1` to be deleted is not rejected but recognized as `8`. As indicated by (C) shown in FIG. 3, the `7` to be deleted is not rejected but recognized as `4`. As indicated by (D) shown in FIG. 3, the `6` to be deleted is not rejected but recognized as `6`.