1. Field of the Invention
The present invention relates to image processing. More specifically, it relates to extracting characters by separating them from noise in an image, as preprocessing for character recognition.
2. Description of Related Art
Conventionally, two methods are known for removing noise from a character image: one removes noise by repeatedly expanding and reducing the image, and the other searches for isolated points and removes each isolated point that is judged to be noise.
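The following is a minimal Python sketch of the two conventional methods. It assumes details the text does not specify: a binary image stored as a list of rows (1 = black), a 3×3 square neighbourhood, and reduction performed before expansion (a morphological opening), which is the order that deletes small black specks. An isolated point is assumed to be a black pixel with no black 8-neighbour. All function names are hypothetical.

```python
from typing import Iterator, List

Image = List[List[int]]  # 1 = black pixel, 0 = white pixel


def _window(img: Image, y: int, x: int) -> Iterator[int]:
    """Yield the 3x3 neighbourhood of (y, x); pixels outside the image count as white."""
    h, w = len(img), len(img[0])
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy, xx = y + dy, x + dx
            yield img[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0


def reduce_image(img: Image) -> Image:
    """Reduction (erosion): keep a black pixel only if its whole 3x3 window is black."""
    return [[int(all(_window(img, y, x))) for x in range(len(img[0]))]
            for y in range(len(img))]


def expand_image(img: Image) -> Image:
    """Expansion (dilation): a pixel becomes black if any pixel in its 3x3 window is black."""
    return [[int(any(_window(img, y, x))) for x in range(len(img[0]))]
            for y in range(len(img))]


def remove_noise_by_reduce_expand(img: Image, times: int = 1) -> Image:
    """Reduce `times` times, then expand `times` times (assumed order: an opening)."""
    for _ in range(times):
        img = reduce_image(img)
    for _ in range(times):
        img = expand_image(img)
    return img


def remove_isolated_points(img: Image) -> Image:
    """Delete every black pixel that has no black pixel among its 8 neighbours."""
    out = [row[:] for row in img]
    for y in range(len(img)):
        for x in range(len(img[0])):
            # The window includes the pixel itself, so a sum of 1 means "isolated".
            if img[y][x] and sum(_window(img, y, x)) == 1:
                out[y][x] = 0
    return out
```

With a 3×3 neighbourhood, the opening also deletes any stroke thinner than three pixels, so this kind of noise removal can damage thin parts of characters as well as noise.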
Japanese Patent Laid-Open No. H5-210761(1993) discloses a technique that, when the result of character recognition processing is insufficient, projects a cut-out pattern in the lateral direction, identifies a noise portion from the projected image, removes the noise portion from the cut-out pattern, and then performs character recognition again.
A technique disclosed in Japanese Patent Laid-Open No. H7-49926(1995) performs discrimination processing on a character image cut out by a character cutout component, acquires a discrimination result and its reliability, and determines whether or not to output the result. When it decides that the result cannot be output, it performs noise removal with a noise removing component and then executes the discrimination processing again. During noise removal, the noise removing component detects isolated black pixels and, according to their size and their distance from other black pixels, marks those isolated black pixels that are likely to be noise for deletion.
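A size-and-distance test of the kind described above can be sketched as follows. This is an illustration only, not the method of the cited publication: the grouping of black pixels into 8-connected components, the Euclidean distance measure, and the thresholds `max_size` and `min_distance` are all hypothetical choices.

```python
import math
from collections import deque
from typing import List, Set, Tuple

Image = List[List[int]]  # 1 = black pixel, 0 = white pixel
Pixel = Tuple[int, int]


def black_components(img: Image) -> List[List[Pixel]]:
    """Group black pixels into 8-connected components using breadth-first search."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for sy in range(h):
        for sx in range(w):
            if not img[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue, pixels = deque([(sy, sx)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny in range(max(y - 1, 0), min(y + 2, h)):
                    for nx in range(max(x - 1, 0), min(x + 2, w)):
                        if img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def isolated_noise_targets(img: Image, max_size: int = 2,
                           min_distance: float = 3.0) -> Set[Pixel]:
    """Mark small black groups lying far from all other black pixels for deletion."""
    components = black_components(img)
    targets: Set[Pixel] = set()
    for i, comp in enumerate(components):
        if len(comp) > max_size:
            continue  # too large to be treated as an isolated point
        others = [p for j, c in enumerate(components) if j != i for p in c]
        # Euclidean distance to the nearest black pixel of any other component.
        gap = min((math.dist(p, q) for p in comp for q in others), default=math.inf)
        if gap >= min_distance:
            targets.update(comp)
    return targets
```

The distance search here is brute force: every small component is compared with every other black pixel, so its cost grows quickly with the number of black pixels in the region searched.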
Japanese Patent Laid-Open No. 2002-157550 discloses a technique that decides a reference value in accordance with the height of a line, removes as noise the black pixel concatenations whose magnitude is equal to or less than the reference value, cuts out characters, and performs character recognition on the cut-out image.
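The line-height rule above can be illustrated with a short sketch. The text does not say how the reference value is derived or how the magnitude of a black pixel concatenation is measured, so this code assumes, hypothetically, a reference value of `ratio * line_height` and a magnitude equal to the longer side of the component's bounding box.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]  # 1 = black pixel, 0 = white pixel


def black_components(img: Image) -> List[List[Tuple[int, int]]]:
    """Group black pixels into 8-connected components (black pixel concatenations)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for sy in range(h):
        for sx in range(w):
            if not img[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue, pixels = deque([(sy, sx)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny in range(max(y - 1, 0), min(y + 2, h)):
                    for nx in range(max(x - 1, 0), min(x + 2, w)):
                        if img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def remove_small_components(img: Image, line_height: int, ratio: float = 0.25) -> Image:
    """Erase components whose bounding-box magnitude is <= ratio * line_height."""
    reference = ratio * line_height  # hypothetical reference value from the line height
    out = [row[:] for row in img]
    for pixels in black_components(img):
        ys = [y for y, _ in pixels]
        xs = [x for _, x in pixels]
        magnitude = max(max(ys) - min(ys) + 1, max(xs) - min(xs) + 1)
        if magnitude <= reference:
            for y, x in pixels:
                out[y][x] = 0
    return out
```

Because the threshold scales with the line height, the same rule removes proportionally larger specks in large text than in small text.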
Meanwhile, character recognition accuracy depends heavily on whether the enclosing rectangle of each character is extracted correctly. For example, as denoted by 1.1 in FIG. 1, noise near a character is likely to be judged part of that character, and hence a character rectangle 12 is extracted that encloses both a character image 10 of a Japanese katakana character or syllabic writing  and noise 11. In this case, the features used for character recognition are obtained from a feature region that includes the border noise, as denoted by 1.2.
A problem arises when the features obtained from the image including the noise, denoted by 1.2 in FIG. 1, are compared with character features stored in a character recognition dictionary: those of a katakana character  denoted by 1.3, and those of another Japanese katakana character or syllabic writing  denoted by 1.4. In some cases, this comparison judges the features of the input image to be closer to those of the katakana character , and that katakana character  is then output as the recognition result.
Although FIG. 1 shows a Japanese example, this problem is not limited to Japanese katakana; it also occurs with alphabetic characters. For example, as shown in FIG. 13, when noise 1302 is present near a character image 1301 of "o", a character rectangle 1303 that also encloses the noise may be extracted. In this case, a character rectangle 1304 is divided into an n×m lattice (3×3, for example), and the black pixel distribution in each partial image produced by the division is obtained as the character features. When these features are compared with the character features 1305 and 1306 stored in the character recognition dictionary, they are in some cases judged closer to the features 1306 of the character "q" than to the features 1305 of the character "o". If such a decision is made, the character "q" is erroneously output as the recognition result.
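The lattice features described above can be sketched as follows, assuming (since the text does not specify these) that each feature is the fraction of black pixels in one lattice cell and that the dictionary entry with the smallest Euclidean distance is chosen. The names `mesh_features` and `nearest_character` and the dictionary layout are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Image = List[List[int]]  # 1 = black pixel, 0 = white pixel


def mesh_features(img: Image, n: int = 3, m: int = 3) -> List[float]:
    """Split the character rectangle into an n x m lattice and return the
    black-pixel ratio of each cell, row by row (assumes height >= n, width >= m)."""
    h, w = len(img), len(img[0])
    features = []
    for i in range(n):
        y0, y1 = i * h // n, (i + 1) * h // n
        for j in range(m):
            x0, x1 = j * w // m, (j + 1) * w // m
            area = (y1 - y0) * (x1 - x0)
            black = sum(img[y][x] for y in range(y0, y1) for x in range(x0, x1))
            features.append(black / area)
    return features


def nearest_character(features: Sequence[float],
                      dictionary: Dict[str, Sequence[float]]) -> str:
    """Return the dictionary character whose features are closest (Euclidean)."""
    return min(dictionary, key=lambda ch: math.dist(features, dictionary[ch]))
```

Under these assumptions, a noise speck that stretches the rectangle also shifts the lattice lines relative to the character, so black pixels fall into different cells and the feature vector can end up nearer to another character's entry.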
In addition, when isolated points are detected within an image, performing connected-pixel detection and projection processing with the entire character region image as the target of the isolated point search increases the processing load and the processing time.
An object of the present invention is to provide image processing capable of improving a postprocessing rate of the character rectangle extraction and of improving character recognition accuracy.