1. Field of the Invention
The present invention relates to a technique for extracting a character area from a captured image.
2. Description of the Related Art
By capturing an image of characters printed on a commodity or product with an image acquisition device, for example, a two-dimensional image acquisition device using a CCD, CMOS, or the like, and performing a character recognizing process in an image processing apparatus, the process of recognizing the print can be automated.
To perform the character recognizing process with high precision, a character extracting process as a pre-process of the character recognizing process is important in the image processing apparatus.
The character extracting process is a process of determining a character area included in a captured image. In a case where a captured image includes a character string made of a plurality of characters, the character area corresponding to each character in the character string has to be determined from the character string.
One of the character extracting methods uses projection data of an image. Specifically, waveform data obtained by integrating pixel values of a captured image in an extraction direction is generated and analyzed. The method relies on the fact that the pixel integration value of a character part is larger than that of a background part (in a case where a character is black, it is sufficient to make the pixel integration value of the character part large by reversal), and an area in which the pixel integration value exceeds a predetermined threshold is recognized as a character area.
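The projection approach described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the image is a small hypothetical grid that has already been reversed so that character pixels are bright, and the function names `projection_profile` and `character_areas` and the threshold value are illustrative choices rather than anything taken from a particular apparatus.

```python
from typing import List, Tuple


def projection_profile(image: List[List[int]]) -> List[int]:
    """Integrate pixel values along the extraction direction (down each column)."""
    return [sum(column) for column in zip(*image)]


def character_areas(profile: List[int], threshold: int) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, where profile > threshold."""
    areas = []
    start = None
    for x, value in enumerate(profile):
        if value > threshold and start is None:
            start = x                      # a character area begins
        elif value <= threshold and start is not None:
            areas.append((start, x))       # the character area ends
            start = None
    if start is not None:                  # area running to the image edge
        areas.append((start, len(profile)))
    return areas


# Hypothetical 4x10 image, already reversed so character pixels are bright.
image = [
    [0, 9, 9, 0, 0, 0, 9, 9, 9, 0],
    [0, 9, 0, 0, 0, 0, 9, 0, 9, 0],
    [0, 9, 0, 0, 0, 0, 9, 0, 9, 0],
    [0, 9, 9, 0, 0, 0, 9, 9, 9, 0],
]
profile = projection_profile(image)
areas = character_areas(profile, threshold=10)
print(profile)  # [0, 36, 18, 0, 0, 0, 36, 18, 36, 0]
print(areas)    # [(1, 3), (6, 9)]
```

In this toy case the two runs of columns whose integrated value exceeds the threshold correspond to the two character areas.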
In general, in an image captured by an image acquisition device, the light amount at the peripheral portion of the image is lower than at the central portion owing to the lens characteristics of the image acquisition device. As a result, the image may have inconsistencies in intensity.
FIG. 7 is a diagram showing an image 90 of a medium on which characters “T258789” are printed and waveform data 91 generated from the image 90. The waveform data 91 is data obtained by integrating pixel values along a character extracting direction Y at each coordinate position in a character string direction X of the image 90. For easier explanation, the waveform data 91 and the image 90 including the characters “T258789” are shown aligned in the character string direction X. It is understood from the figure that the pixel integration values of the character portions are larger than the pixel integration values of the background part.
Since a character is black in this case, it is sufficient to make the pixel integration value of the character part large by reversal. FIG. 7 also shows a case in which the captured image 90 has a center portion darker than the remaining portion owing to an effect of illumination. Therefore, the pixel integration values of both end portions are relatively lower than those of the other portions in the waveform data 91.
For example, in a case where a threshold 92 is set at the level shown in the figure and a character area is determined according to whether the waveform data exceeds the threshold 92, it is not possible to extract the characters from the image precisely. In more detail, since the central portion (the longitudinal portion) of the character “T” shown in FIG. 7 has larger integrated pixel values, that portion is recognized as a character area. However, the lateral portion of the character “T” has smaller integrated pixel values than the longitudinal portion, and the light amount provided from an illumination device (not shown) is lower at the lateral portion than in the other areas. The corresponding portion of the waveform data 91 therefore falls below the threshold 92, and it is difficult to recognize the lateral portion as a character area.
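The failure described for the character “T” can be reproduced with a toy example. The sketch below assumes a hypothetical 5x7 reversed image of a “T” and models the reduced light amount as a uniform 50% reduction of the pixel values; the gain, the threshold, and the helper names are assumptions made purely for illustration.

```python
def profile_of(image):
    """Integrate pixel values down each column."""
    return [sum(column) for column in zip(*image)]


def areas_above(profile, threshold):
    """Return (start, end) ranges, end exclusive, where profile > threshold."""
    areas, start = [], None
    # A trailing sentinel equal to the threshold closes any open run.
    for x, value in enumerate(profile + [threshold]):
        if value > threshold and start is None:
            start = x
        elif value <= threshold and start is not None:
            areas.append((start, x))
            start = None
    return areas


# Hypothetical reversed image of a "T": a lateral bar on top of a longitudinal stem.
letter_t = [
    [0, 10, 10, 10, 10, 10, 0],
    [0,  0,  0, 10,  0,  0, 0],
    [0,  0,  0, 10,  0,  0, 0],
    [0,  0,  0, 10,  0,  0, 0],
    [0,  0,  0, 10,  0,  0, 0],
]
# Assumed model of a dimly lit region: every pixel value drops to 50%.
dim_t = [[pixel * 50 // 100 for pixel in row] for row in letter_t]

threshold = 8
full_light = areas_above(profile_of(letter_t), threshold)
low_light = areas_above(profile_of(dim_t), threshold)
print(full_light)  # [(1, 6)]  the whole "T" is extracted
print(low_light)   # [(3, 4)]  only the stem exceeds the threshold
```

With the same threshold, the columns that contain only the lateral bar drop below the threshold once the light amount falls, so only the stem is recognized.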
To solve such a problem, a shading correction is generally applied. Specifically, the correction compensates for the intensity difference between the central portion of the image and the other portions of the image. However, such a shading correction process requires a long processing time, so the total processing time for extracting character areas becomes longer.
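As a rough illustration of what a shading correction involves, the sketch below rescales every pixel by a reference image of a blank background. This is one simple form of correction chosen for the example, not necessarily the one contemplated here; the `reference` image, the 8-bit clamp, and the function name `shading_correct` are all assumptions.

```python
def shading_correct(image, reference, max_value=255):
    """Rescale each pixel so that the reference background becomes uniform.

    `reference` is a hypothetical image of a blank background captured under
    the same illumination; brighter reference pixels mean more light there.
    """
    target = max(max(row) for row in reference)   # brightest background level
    corrected = []
    for image_row, reference_row in zip(image, reference):
        row = []
        for pixel, ref in zip(image_row, reference_row):
            if ref > 0:
                row.append(min(max_value, pixel * target // ref))
            else:
                row.append(pixel)                 # no light information; keep as is
        corrected.append(row)
    return corrected


# Hypothetical 2x3 example: the left and right columns receive half the light.
reference = [[100, 200, 100], [100, 200, 100]]
image = [[50, 200, 50], [40, 100, 40]]
corrected = shading_correct(image, reference)
print(corrected)  # [[100, 200, 100], [80, 100, 80]]
```

Even in this simple form, every pixel of the image has to be visited and rescaled before any character extraction starts, which is consistent with the added processing time noted above.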
Another problem is that when the distance between characters is narrow, it is difficult to recognize the character areas. In the case shown in FIG. 7, the character “8” and the character “9” are so close to each other that they partially overlap. In this case, the integrated pixel values become larger at the boundary area between the character “8” and the character “9”. Therefore, when the threshold 92 is set at the position shown in the figure, the waveform data 91 (integrated pixel values) of the boundary area is above the threshold 92. As a result, the character “8” and the character “9” are recognized as a single combined character area.
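The merging of two neighboring characters can likewise be shown on a toy image. In the sketch below, two hypothetical box-shaped characters touch in one column, so the integrated value at the boundary stays above the threshold; the image, the threshold, and the helper names are illustrative assumptions.

```python
def profile_of(image):
    """Integrate pixel values down each column."""
    return [sum(column) for column in zip(*image)]


def areas_above(profile, threshold):
    """Return (start, end) ranges, end exclusive, where profile > threshold."""
    areas, start = [], None
    # A trailing sentinel equal to the threshold closes any open run.
    for x, value in enumerate(profile + [threshold]):
        if value > threshold and start is None:
            start = x
        elif value <= threshold and start is not None:
            areas.append((start, x))
            start = None
    return areas


# Hypothetical reversed image: two box-shaped characters that touch in column 4.
touching = [
    [0, 9, 9, 9, 0, 9, 9, 9, 0],
    [0, 9, 0, 9, 9, 9, 0, 9, 0],
    [0, 9, 0, 9, 9, 9, 0, 9, 0],
    [0, 9, 9, 9, 0, 9, 9, 9, 0],
]
profile = profile_of(touching)
merged = areas_above(profile, threshold=10)
print(profile)  # [0, 36, 18, 36, 18, 36, 18, 36, 0]
print(merged)   # [(1, 8)]  both characters form one area
```

Because the boundary column sums to 18, which is above the threshold of 10, the two characters are reported as a single area.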
To solve such a problem, especially to separate two characters from each other, Japanese Patent Publication No. 2,872,768 discloses a method of setting a search start point and a search end point in an image, integrating the pixel values of the pixels on a path connecting the start and end points, and finding the path for which the integration value is the minimum. The method allows the image processing device to extract each character area accurately even if the characters are close to each other. However, the method has a problem in that the process of choosing the path takes a long time. Moreover, the method requires the following items to be preset: the search start point, the search end point, and the path connection between them. That is, the method requires the boundary area of the characters to be roughly known in advance.
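The idea of searching for a minimum-integration path between a preset start point and end point can be illustrated with a generic dynamic-programming sketch. This is not the procedure of the cited publication; it is a simplified stand-in that assumes the path runs from the top row to the bottom row and moves at most one column sideways per row. The function name `min_sum_path` and the toy image are hypothetical.

```python
def min_sum_path(image, start_col, end_col):
    """Find the cheapest path from (row 0, start_col) to (last row, end_col).

    Assumption for this sketch: each step moves down one row and at most one
    column to the left or right. Returns (total value, column index per row).
    """
    height, width = len(image), len(image[0])
    inf = float("inf")
    cost = [[inf] * width for _ in range(height)]
    prev = [[None] * width for _ in range(height)]
    cost[0][start_col] = image[0][start_col]
    for y in range(1, height):
        for x in range(width):
            for px in (x - 1, x, x + 1):          # candidate predecessors
                if 0 <= px < width:
                    candidate = cost[y - 1][px] + image[y][x]
                    if candidate < cost[y][x]:
                        cost[y][x] = candidate
                        prev[y][x] = px
    if cost[height - 1][end_col] == inf:
        raise ValueError("end point is not reachable from the start point")
    path = [end_col]
    for y in range(height - 1, 0, -1):            # walk back to the start point
        path.append(prev[y][path[-1]])
    path.reverse()
    return cost[height - 1][end_col], path


# Hypothetical reversed image: the gap between two characters bends at row 1.
image = [
    [9, 9, 0, 9, 9],
    [9, 0, 9, 9, 9],
    [9, 9, 0, 9, 9],
    [9, 9, 0, 9, 9],
]
total, path = min_sum_path(image, start_col=2, end_col=2)
print(total)  # 0
print(path)   # [2, 1, 2, 2]
```

Even this restricted search visits every pixel of the search region several times and needs the start and end columns supplied in advance, which mirrors the two drawbacks noted above.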