1. Field of the Invention
The present invention relates to an apparatus and method for recognizing a pattern. When applied to a handwritten character recognizing apparatus, a printed character recognizing apparatus, or a graphics recognizing apparatus, it allows characters, graphics, and symbols to be recognized correctly according to the various states of input images.
2. Description of the Related Art
Conventional handwritten character recognizing apparatuses such as an optical character reader (OCR) are designed to automatically read and input characters written on an accounting list, etc., eliminating the need to manually find the characters and input them through a keyboard.
FIG. 1 is a block diagram showing the configuration of the conventional handwritten character recognizing apparatus.
In FIG. 1, a form/document 311 is read using a scanner to obtain a multiple-value image of the form/document 311.
A preprocessing unit 312 binarizes the multiple-value image, removes noise, and corrects the position of the form/document 311.
Then, a character detecting unit 313 detects each character according to information about preliminarily defined ruled lines and positional information about a character.
A character recognizing unit 314 recognizes each character and outputs a character code. The character is recognized by collating each feature of an unknown character pattern detected by the character detecting unit 313 with the feature of each character category preliminarily entered in a recognizing dictionary 315.
For example, a 2-dimensional character pattern is converted into a feature vector in a feature space representing the features of the character, and the distance between feature vectors in that space is computed as the similarity between the unknown character pattern and each character category preliminarily entered in the recognizing dictionary 315. The character category whose feature vector is at the shortest distance from the feature vector of the unknown character pattern is recognized as corresponding to the unknown character pattern.
A threshold is set for the distance between two feature vectors to avoid mistaking a non-character such as a deletion line, noise, or a symbol for a character and outputting a character code for it. If the distance between the two feature vectors is larger than the threshold, a reject code is output on the determination that the unknown character pattern has no corresponding character category preliminarily entered in the recognizing dictionary 315, or that the unknown character pattern is a non-character.
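The matching and rejection described above can be illustrated with a minimal sketch. The Euclidean distance, the two-element feature vectors, the threshold value, and the names `recognize` and `REJECT` are assumptions made for illustration only; the text does not specify the distance measure or the feature extraction used by the character recognizing unit 314.

```python
import math

REJECT = "REJECT"  # hypothetical reject code


def euclidean(a, b):
    """Distance between two feature vectors of equal length (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(feature, dictionary, threshold):
    """Return the closest category, or REJECT if even the closest is too far."""
    best_category, best_distance = None, float("inf")
    for category, reference in dictionary.items():
        d = euclidean(feature, reference)
        if d < best_distance:
            best_category, best_distance = category, d
    if best_distance > threshold:
        return REJECT  # no entered category is close enough
    return best_category


# Toy recognizing dictionary: one reference feature vector per category.
dictionary = {"0": [0.0, 1.0], "1": [1.0, 0.0]}
print(recognize([0.1, 0.9], dictionary, threshold=0.5))  # near category "0"
print(recognize([5.0, 5.0], dictionary, threshold=0.5))  # far from every category
```

In this sketch a larger threshold accepts more distorted characters but also lets more non-characters through, which is the trade-off the threshold is meant to control.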
The recognizing dictionary 315 also contains the features of the character categories of high-quality characters, obscure characters, and deformed characters. A high-quality character recognizing dictionary 315 is referred to for high-quality characters. An obscure character recognizing dictionary 315 is referred to for obscure characters. A deformed-character recognizing dictionary 315 is referred to for deformed characters. Thus, differences in the quality of the characters in the form/document 311 can be handled.
FIG. 2 shows the configuration of the character recognizing apparatus for recognizing a character with a deletion line.
The character recognizing apparatus shown in FIG. 2 comprises an image input unit 491 for inputting an original image containing a character and detecting or preprocessing a character from the input image, and an identifying unit 492 for identifying a character by extracting the feature of the character and comparing the extracted feature with the feature of the standard pattern stored in the recognizing dictionary.
When a character mistakenly entered in a form is removed with a deletion line, for example, six or more horizontal lines are entered on the character. It is determined that the character provided with six or more horizontal lines cannot be identified, and the character is rejected by the identifying unit 492 because it does not match any standard pattern stored in the recognizing dictionary.
However, the handwritten character recognizing apparatus shown in FIG. 1 processes every detected character in the same way, whether it is an obscure character, a deformed character, or a high-quality character, using the same recognizing dictionary 315.
Accordingly, there has been a problem that information about obscure characters entered in the recognizing dictionary 315 adversely affects the high-quality character recognizing process and prevents high-quality characters from being successfully read.
In addition to obscure and deformed states, there are various environments for characters. For example, a character may touch its character box. When a single recognizing dictionary 315 is referred to in these various environments, the environments affect each other, causing a problem that the recognizing process cannot be performed with enhanced precision.
When the character recognizing apparatus shown in FIG. 2 is used, six or more horizontal lines must be drawn over an entered character to delete it. This is a heavy burden on the user, and the rule is therefore not always observed. As a result, a character with an obvious deletion line may lie at a small distance from a standard pattern stored in the recognizing dictionary and fail to be clearly distinguished from a character without a deletion line. Thus, the character to be deleted is not rejected and is mistakenly read.
For example, as indicated by (A) shown in FIG. 3, the ‘0’ to be deleted is not rejected but recognized as ‘8’. As indicated by (B) shown in FIG. 3, the ‘1’ to be deleted is not rejected but recognized as ‘8’. As indicated by (C) shown in FIG. 3, the ‘7’ to be deleted is not rejected but recognized as ‘4’. As indicated by (D) shown in FIG. 3, the ‘6’ to be deleted is not rejected but recognized as ‘6’.
The present invention aims at providing a pattern recognizing apparatus and method capable of appropriately recognizing a character with high precision depending on the environment of the character.
According to a feature of the present invention, an input pattern is recognized by extracting a first predetermined feature from the input pattern and extracting a second predetermined feature from the input pattern from which the first predetermined feature has been extracted.
As a result, a recognizing process can be performed depending on each environment of a character.
According to another feature of the present invention, a pattern is recognized by extracting the state of a process object from an input image and selecting, for each process object, a recognizing process suitable for its state.
Thus, a pattern recognizing process can be performed appropriately for each state on the input image having various states, thereby realizing the recognizing process with high precision.
According to another feature of the present invention, a state of a process object is extracted from an input image, a pattern recognizing process exclusively for the first state is performed on the process object in the first state, and a pattern recognizing process exclusively for the second state is performed on the process object in the second state.
Thus, the recognizing process on the process object in the first state does not interfere with the recognizing process on the process object in the second state, so that both recognizing processes can be performed with high precision.
According to another feature of the present invention, recognizing dictionaries are appropriately selected for an input image in various states.
For example, even if obscure characters, deformed characters, and high-quality characters are mixed in the input image, the recognizing process can be performed with high precision by using an obscure character recognizing dictionary for obscure characters, a deformed-character recognizing dictionary for deformed characters, and a high-quality character recognizing dictionary for high-quality characters.
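The dictionary selection described above might be sketched as follows. The state labels, the toy feature vectors, the squared Euclidean distance, and the function name `recognize_with_state` are hypothetical, and the sketch assumes that the state of each character has already been extracted by an earlier step that is not shown.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors (assumed measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# One recognizing dictionary per character state (toy values, for illustration).
DICTIONARIES = {
    "high_quality": {"1": [1.0, 0.0], "7": [1.0, 1.0]},
    "obscure": {"1": [0.6, 0.0], "7": [0.6, 0.6]},
}


def recognize_with_state(feature, state):
    """Match the feature only against the dictionary chosen for this state."""
    dictionary = DICTIONARIES[state]
    return min(dictionary, key=lambda c: squared_distance(feature, dictionary[c]))


# The same faint pattern is matched differently depending on the dictionary,
# because obscure-character references never reach full stroke strength.
faint = [0.6, 0.35]
print(recognize_with_state(faint, "high_quality"))
print(recognize_with_state(faint, "obscure"))
```

Because each state has its own dictionary, the references tuned for obscure characters are never compared against high-quality characters, and vice versa.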
According to another feature of the present invention, identification functions are appropriately selected for an input image in various states.
The recognizing process can be performed with high precision by, for example, recognizing a character using a city block distance on a character written in a one-character box, and recognizing a character using a discriminant function on a character written in a free-pitch box in consideration of the character detection reliability.
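As a rough illustration of selecting an identification function by state, the sketch below dispatches on a box type. The city block distance is as named in the text; the linear form of the discriminant function, its weights, the `min_reliability` cutoff, and all function and key names are assumptions, since the text does not define them here.

```python
def city_block(a, b):
    """City block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def linear_score(feature, weights, bias):
    """Hypothetical linear discriminant: a larger score means a better match."""
    return sum(w * x for w, x in zip(weights, feature)) + bias


def identify(feature, box_type, models, detection_reliability=1.0,
             min_reliability=0.5):
    """Pick the identification function according to the box type."""
    if box_type == "one_character":
        dictionary = models["dictionary"]
        return min(dictionary, key=lambda c: city_block(feature, dictionary[c]))
    if box_type == "free_pitch":
        # Assumed policy: give no answer when the character was detected unreliably.
        if detection_reliability < min_reliability:
            return None
        discriminants = models["discriminants"]
        return max(discriminants,
                   key=lambda c: linear_score(feature, *discriminants[c]))
    raise ValueError(f"unknown box type: {box_type}")


# Toy models: reference vectors for the distance, (weights, bias) for the discriminant.
models = {
    "dictionary": {"0": [0.0, 1.0], "1": [1.0, 0.0]},
    "discriminants": {"0": ([-1.0, 1.0], 0.0), "1": ([1.0, -1.0], 0.0)},
}
print(identify([0.9, 0.2], "one_character", models))
print(identify([0.9, 0.2], "free_pitch", models, detection_reliability=0.8))
print(identify([0.9, 0.2], "free_pitch", models, detection_reliability=0.2))
```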
According to another feature of the present invention, knowledge is appropriately selected for an input image in various states.
The recognizing process can be performed with high precision by, for example, dividing a character into character segments to establish a correspondence between an unknown character and a character category when the unknown character is considerably deformed and has no correspondence with any character category stored in the recognizing dictionary; computing the detection reliability using a discriminant function generated based on a learning pattern when a character is detected from a character string; and evaluating the recognition reliability of a box-touching character using the reliability obtained through a learning pattern when the box-touching character is recognized.
According to another feature of the present invention, when a plurality of recognizing processes are called for a specified process object, the recognizing processes are performed in order of priority until the reliability of the recognizing process reaches a predetermined value.
Thus, the reliability of the recognizing process can be enhanced and the precision of the process can be successfully improved.
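The priority-ordered processing described above can be sketched as a simple loop. The recognizer stubs, their reliability values, and the name `recognize_by_priority` are hypothetical; the sketch assumes each recognizing process returns a category together with a reliability between 0 and 1.

```python
def recognize_by_priority(pattern, recognizers, required_reliability):
    """Run recognizers in priority order until one is reliable enough.

    Returns (category, reliability, number_of_recognizers_tried).
    """
    best_category, best_reliability, tried = None, 0.0, 0
    for recognizer in recognizers:  # list is assumed to be sorted by priority
        tried += 1
        category, reliability = recognizer(pattern)
        if reliability > best_reliability:
            best_category, best_reliability = category, reliability
        if best_reliability >= required_reliability:
            break  # predetermined reliability reached; skip the rest
    return best_category, best_reliability, tried


# Stub recognizing processes standing in for real ones (illustrative only).
def quick(pattern):
    return "3", 0.6


def careful(pattern):
    return "8", 0.95


def fallback(pattern):
    return "8", 0.99


result = recognize_by_priority("pattern", [quick, careful, fallback], 0.9)
print(result)  # the third recognizer is never called
```

If no recognizer reaches the required reliability, this sketch returns the most reliable result seen; whether to reject instead is a design choice the text leaves open.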
According to another feature of the present invention, a non-character is extracted from an input image and a non-character recognizing process and a character recognizing process are performed separately on the extracted non-character.
As a result, the recognizing process can be performed with high precision, with fewer characters mistaken for non-characters and fewer non-characters mistaken for characters.
According to another feature of the present invention, the first predetermined feature is extracted from an input pattern, and the input pattern is recognized by extracting the second predetermined feature from the input pattern from which the first predetermined feature has not been extracted.
Thus, a character with a deletion line can be distinguished from a character without a deletion line, and only the character without a deletion line can be recognized. Therefore, it is possible to prevent a character with a deletion line from being mistakenly recognized as any other character.
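A minimal sketch of this two-stage flow is given below, treating the presence of a deletion line as the first predetermined feature. The detection rule (a row whose longest run of black pixels covers most of the pattern width), the `min_ratio` value, and all names are assumptions for illustration; the text at this point does not state how a deletion line is detected.

```python
def to_image(rows):
    """Convert rows of '#' and '.' into a binary image (1 = black pixel)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def longest_run(row):
    """Length of the longest run of consecutive black pixels in a row."""
    best = current = 0
    for pixel in row:
        current = current + 1 if pixel else 0
        best = max(best, current)
    return best


def has_deletion_line(image, min_ratio=0.9):
    """Assumed first feature: some row is almost entirely one horizontal stroke."""
    width = len(image[0])
    return any(longest_run(row) >= min_ratio * width for row in image)


def recognize_unless_deleted(image, recognize):
    """Extract the second feature (recognize) only when the first is absent."""
    if has_deletion_line(image):
        return "REJECT"  # hypothetical reject code
    return recognize(image)


plain_one = to_image(["..#..", "..#..", "..#..", "..#..", "..#.."])
struck_one = to_image(["..#..", "..#..", "#####", "..#..", "..#.."])


def stub_recognizer(image):
    return "1"  # stand-in for a real character recognizing process


print(recognize_unless_deleted(plain_one, stub_recognizer))
print(recognize_unless_deleted(struck_one, stub_recognizer))
```

A rule this simple would also flag characters that legitimately contain a full-width horizontal stroke, so a practical detector would need additional evidence.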
According to another feature of the present invention, the first predetermined feature is extracted from an input pattern, a portion contributing to the first predetermined feature is removed from the input pattern from which the first predetermined feature has been extracted, and the input pattern is recognized based on the pattern from which the portion contributing to the first predetermined feature has been removed.
Therefore, only a deletion line can be removed from the character with the deletion line when the character is recognized, thereby improving the precision in recognizing the character.
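The removal step might look like the following sketch, which erases the pixels of a detected horizontal line while keeping pixels where a vertical stroke of the character crosses it. Both the detection rule (a near full-width run of black pixels) and the preservation rule (keep a pixel that has black pixels directly above and below) are assumptions, as are the function names and the `min_ratio` value.

```python
def to_image(rows):
    """Convert rows of '#' and '.' into a binary image (1 = black pixel)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def longest_run(row):
    """Length of the longest run of consecutive black pixels in a row."""
    best = current = 0
    for pixel in row:
        current = current + 1 if pixel else 0
        best = max(best, current)
    return best


def remove_deletion_line(image, min_ratio=0.9):
    """Return a copy of the image with assumed deletion-line pixels cleared."""
    height, width = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    for y, row in enumerate(image):
        if longest_run(row) < min_ratio * width:
            continue  # this row is not treated as part of a deletion line
        for x in range(width):
            above = image[y - 1][x] if y > 0 else 0
            below = image[y + 1][x] if y < height - 1 else 0
            if not (above and below):
                cleaned[y][x] = 0  # pixel belongs only to the line; erase it
    return cleaned


plain_one = to_image(["..#..", "..#..", "..#..", "..#..", "..#.."])
struck_one = to_image(["..#..", "..#..", "#####", "..#..", "..#.."])

restored = remove_deletion_line(struck_one)
print(restored == plain_one)
```

The restored pattern can then be passed to the ordinary character recognizing process in place of the original input pattern.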