1. Field of the Invention
The present invention relates to an image-processing device and an image-processing method for character recognition.
2. Description of the Related Art
In a conventional method for clipping a character-recognition target from an image that includes characters, a user specifies a rectangular area containing the target. To improve the accuracy of character recognition, such a method requires more accurate designation of the character area and elimination of noise.
For example, in one conventional method, rectangular areas each enclosing a region of uniform density are automatically extracted from an image. This method determines that a rectangular area smaller than a predetermined size, such as an isolated point, contains no character, and therefore excludes that area from the character-recognition area.
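The size-based exclusion described above can be illustrated with a minimal sketch. It is not the implementation of any cited method. It assumes the image has already been reduced to a binary grid (1 for foreground, 0 for background), treats each 4-connected foreground region as a "region of uniform density", and uses hypothetical names (`bounding_boxes`, `drop_small_boxes`) and hypothetical minimum sizes chosen only for illustration.

```python
from collections import deque


def bounding_boxes(binary):
    """Return (top, left, bottom, right) boxes of 4-connected foreground regions.

    `binary` is a list of rows holding 0 (background) or 1 (foreground).
    This input format is an assumption made for the sketch.
    """
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search over one connected region,
            # growing its bounding box as pixels are visited.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def drop_small_boxes(boxes, min_height=2, min_width=2):
    """Keep only boxes at least min_height x min_width (hypothetical limits)."""
    return [
        b for b in boxes
        if (b[2] - b[0] + 1) >= min_height and (b[3] - b[1] + 1) >= min_width
    ]


image = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],  # the lone pixel at column 4 is an isolated point
    [1, 1, 0, 0, 0],
]
candidates = drop_small_boxes(bounding_boxes(image))
```

In this example the isolated pixel is discarded and only the 3 x 2 region is kept as a character candidate. A fixed size limit of this kind would just as readily discard a small but genuine character component, such as a dot or a punctuation mark.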
Known methods for eliminating noise, on the other hand, include a method in which a binarization threshold is set, based on the density distribution (such as the intensity distribution) of the image, so that the pixels of a character are discriminated from the other pixels. They also include a method in which an upper threshold and a lower threshold for the character are set based on the density distribution, and the pixels falling outside this range are eliminated.
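The two threshold-based approaches described above can be sketched as follows. The text does not say how the thresholds are derived from the density distribution, so this sketch substitutes Otsu's method, a well-known histogram-based technique, for the single binarization threshold, and takes the upper and lower thresholds as given inputs. The 8-bit grayscale range and the function names (`otsu_threshold`, `keep_in_band`) are assumptions made for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Pick a threshold t that maximizes the between-class variance.

    Pixels with value <= t form one class and the rest form the other.
    Otsu's method is used here as a stand-in; the source names no algorithm.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of those pixel values
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        score = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def keep_in_band(pixels, lower, upper):
    """Return a mask that is True where lower <= pixel <= upper."""
    return [lower <= p <= upper for p in pixels]


# Hypothetical data: dark character pixels (10) on a bright background (200).
pixels = [10] * 5 + [200] * 5
t = otsu_threshold(pixels)
is_character = [p <= t for p in pixels]

# Band method: pixels outside the assumed character range [40, 130] are eliminated.
mask = keep_in_band([5, 50, 120, 250], lower=40, upper=130)
```

Both functions depend on the histogram showing a clear separation between character and background values. When the two overlap, no threshold or band isolates the characters cleanly.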
Recently, as the performance of information processing devices has improved, color images are increasingly subjected to the character-recognition process. Methods for eliminating noise that focus on color have therefore been proposed.
For example, Japanese Patent Laid-open No. Hei 6-203204 discloses a method in which characters are clipped either by allowing a user to select a color from color samples in, for example, a ledger sheet, or by printing a specified color in a marked area designated in advance for that color. Furthermore, in Japanese Patent Laid-open No. Hei 5-28314, three colors (RGB) are used as dropout colors, and characters are extracted using an image from which the RGB colors have been dropped out.
However, when character recognition is performed on a photographic image of a billboard or a magazine, the background of the characters often includes a picture, an illustration, or a pattern, which makes it difficult to extract character information. In particular, when there is little difference in contrast or shade between the characters and the background, it is difficult to set a threshold based on the density distribution and to extract accurate character information using that threshold.
Furthermore, in a method for determining a character area based on the size of an area, areas that include noise lack sufficient regularity, so the size of an area is not a reliable criterion for extracting a character area. For example, a character that should not be connected to the target characters may be connected to them, or a character that should be connected to the target characters may be separated from them. That is, the size of the character area differs depending on the quality of the image. Accordingly, a character may be erroneously excluded from the character area as noise.
Furthermore, with the methods for clipping a character area disclosed in Japanese Patent Laid-open Nos. Hei 6-203204 and Hei 5-28314, it is difficult to clip a character area accurately from images other than ledger sheets. Moreover, these methods are troublesome because, for example, a user needs to specify a color from color samples.