1. Field of the Invention
The present invention relates to a technique for extracting a character area from a captured image.
2. Description of the Related Art
Recognition of print can be automated by capturing an image of characters printed on a commodity or product with an image acquisition device, for example, a two-dimensional image acquisition device using a CCD, CMOS, or the like, and performing a character recognizing process in an image processing apparatus.
To perform the character recognizing process with high precision, a character extracting process performed in the image processing apparatus as a pre-process of the character recognizing process is important.
The character extracting process is a process of determining a character area included in a captured image. In a case where a captured image includes a character string made of a plurality of characters, the character area corresponding to each character in the character string has to be determined.
One method of extracting the characters of a character string utilizes projection data of an image. Specifically, waveform data obtained by integrating pixel values of a captured image in an extraction direction is generated and analyzed. The method utilizes the fact that the pixel integration value of a character part is larger than that of a background part (in a case where the characters are black, the pixel values are inverted so that the pixel integration value of the character part becomes large), and an area in which the pixel integration value exceeds a predetermined threshold is recognized as a character area.
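The projection method described above can be illustrated by a minimal sketch. The function names `projection_profile` and `extract_areas`, the tiny binary image, and the threshold value are hypothetical and chosen only for illustration; bright characters on a dark background are assumed.

```python
def projection_profile(image):
    """Integrate pixel values along the extraction direction Y for each X."""
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def extract_areas(profile, threshold):
    """Return (start, end) ranges in X, end exclusive, where profile > threshold."""
    areas = []
    start = None
    for x, value in enumerate(profile):
        if value > threshold and start is None:
            start = x                      # entering a candidate character area
        elif value <= threshold and start is not None:
            areas.append((start, x))       # leaving the area
            start = None
    if start is not None:                  # area runs to the right edge
        areas.append((start, len(profile)))
    return areas


# Hypothetical image: 1 = character pixel, 0 = background pixel.
image = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
]
profile = projection_profile(image)
areas = extract_areas(profile, threshold=1)
print(profile)
print(areas)
```

In this sketch, each returned pair marks one candidate character area along the character string direction X.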
FIG. 11 is a diagram showing an image 90 of a medium on which characters “AB450” are printed and waveform data 91 generated from the image 90. The waveform data 91 are obtained by integrating pixel values in a character extracting direction Y at each coordinate position in a character string direction X of the image 90. For easier explanation, the waveform data 91 and the image 90 including the characters “AB450” are shown aligned in the character string direction X. It is understood from the figure that the pixel integration values of the character parts are larger than the pixel integration values of the background part.
Noise 95 is present between the character “B” and the character “4”. The part containing the noise 95 also has a pixel integration value larger than that of the background part.
Therefore, to exclude the noise 95 from the character area, the threshold has to be set to a value larger than the pixel integration value of the area of the noise 95. By setting a threshold 92 at the position shown in FIG. 11, which is larger than the pixel integration value of the area of the noise 95, the noise 95 can be excluded from the character area. However, when the character area is determined by using the threshold 92 set in this manner, the center region of a character having a small pixel integration value there, such as the character “0”, is excluded from the character area. That is, the area corresponding to the character “0” is erroneously recognized as two character areas.
To accurately extract the character “0”, the threshold has to be set lower than the above-described value. For example, when a threshold 93 is set at the position shown in FIG. 12, the character “0” can be extracted accurately. However, when the threshold 93 is used, the noise 95 is also recognized as a character area. As described above, the method of adjusting the threshold cannot satisfy both the purpose of reliably excluding noise and the purpose of extracting the whole character area including an area having a low pixel integration value.
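The trade-off between the two thresholds can be illustrated with a small sketch. The profile values below are hypothetical and are not taken from FIG. 11 or FIG. 12; they merely assume one ordinary character, one noise peak, and one character whose center has a pixel integration value lower than that of the noise.

```python
def extract_areas(profile, threshold):
    """Return (start, end) ranges in X, end exclusive, where profile > threshold."""
    areas = []
    start = None
    for x, value in enumerate(profile):
        if value > threshold and start is None:
            start = x
        elif value <= threshold and start is not None:
            areas.append((start, x))
            start = None
    if start is not None:
        areas.append((start, len(profile)))
    return areas


# Hypothetical profile: a character at X = 1..2, noise at X = 4 (value 4),
# and a ring-shaped character at X = 6..8 whose center (X = 7) has value 3.
profile = [0, 9, 9, 0, 4, 0, 9, 3, 9, 0]

high = extract_areas(profile, threshold=5)  # above the noise value
low = extract_areas(profile, threshold=2)   # below the center value

print(high)  # noise excluded, but the ring-shaped character is split in two
print(low)   # ring-shaped character whole, but the noise is extracted as well
```

Because the center value (3) is lower than the noise value (4) in this assumed profile, no single threshold both keeps the ring-shaped character whole and excludes the noise.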
There is also a method of providing a threshold for the width of the extracted character area. That is, when the width of an area extracted as a character area is narrower than a predetermined width, the area is determined to be noise and is excluded from the character area. However, noise or dirt that has only a small density difference from the background but extends over a wide area, close in size to a character, cannot be eliminated by this method.
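The width-threshold method and its limitation can be sketched as follows. The function name `filter_by_width`, the area list, and the minimum width are hypothetical values chosen for illustration.

```python
def filter_by_width(areas, min_width):
    """Keep only areas at least min_width wide; narrower ones are treated as noise."""
    return [(start, end) for (start, end) in areas if end - start >= min_width]


# Hypothetical extracted areas (end exclusive):
#   (1, 3)   a character
#   (4, 5)   narrow noise, one pixel wide
#   (6, 9)   a character
#   (11, 14) wide dirt, about as wide as a character
areas = [(1, 3), (4, 5), (6, 9), (11, 14)]
kept = filter_by_width(areas, min_width=2)
print(kept)
```

In this sketch the narrow noise is removed, whereas the wide dirt remains because its width is indistinguishable from that of a character.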
Japanese Patent Publication No. 2,872,768 discloses a method of setting a search start point and a search end point in an image, integrating pixel values of pixels on a path connecting the start and end points, and finding the path for which the integration value is the minimum. According to the method, although a character area can be extracted accurately, a search start point, a search end point, and a search area connecting the points have to be set in advance. That is, the method can be executed only on the condition that a character boundary area can be predicted to some extent.
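A minimum-integration-value path search of this general kind can be sketched by dynamic programming. This is not the procedure of the cited publication, whose details are not given here; the function name `min_cost_path`, the restriction that the path descends one row per step with a sideways shift of at most one column, and the example image are all assumptions made for illustration.

```python
def min_cost_path(image, start_x, end_x):
    """Find the cheapest path from (row 0, start_x) to (last row, end_x).

    Assumption: the path moves down one row per step and shifts at most one
    column sideways. Returns (integration value, list of X per row).
    """
    height, width = len(image), len(image[0])
    inf = float("inf")
    cost = [[inf] * width for _ in range(height)]
    prev = [[None] * width for _ in range(height)]
    cost[0][start_x] = image[0][start_x]
    for y in range(1, height):
        for x in range(width):
            for px in (x - 1, x, x + 1):
                if 0 <= px < width and cost[y - 1][px] + image[y][x] < cost[y][x]:
                    cost[y][x] = cost[y - 1][px] + image[y][x]
                    prev[y][x] = px
    if cost[height - 1][end_x] == inf:
        raise ValueError("end point is not reachable from the start point")
    # Walk back from the end point to recover the path.
    path = [end_x]
    for y in range(height - 1, 0, -1):
        path.append(prev[y][path[-1]])
    path.reverse()
    return cost[height - 1][end_x], path


# Hypothetical image: high values are character pixels, 0 is background.
image = [
    [9, 0, 9, 9],
    [9, 9, 0, 9],
    [9, 0, 9, 9],
]
total, path = min_cost_path(image, start_x=1, end_x=1)
print(total, path)
```

As in the description above, the sketch requires the start point and the end point to be given in advance, which presupposes that the approximate position of the character boundary is already known.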