1. Field of the Invention
The present invention relates to a technology for detecting a character area in an image.
2. Description of the Related Art
In digital image processing apparatuses such as scanners and copiers, the input image is divided into areas of different types, such as character areas and picture areas. Image processing and compression suited to each type can then be applied to each area, which improves both image quality and compression efficiency.
One known technique for identifying character areas in a digital image relies on the fact that characters often have high density and a large edge amount. Edges are extracted from the digital image data, and a high-density area lying between edges is taken as a character area.
The conventional technology disclosed in Japanese Patent Application Laid-Open No. 2001-52186 is briefly explained with reference to FIG. 26. In this technology, edges are extracted from an image while the image is scanned sequentially, one line at a time, and edge groups are generated, each formed of a start point and an end point. The start point is a pixel determined to be a switching point from a non-character area to a character area, and the end point is a pixel determined to be a switching point from a character area to a non-character area. Then, for each generated edge group, either the number of consecutive pixels having a predetermined density or more, or the total number of such pixels, is calculated. Based on the result, it is determined whether the pixels in the edge group belong to a character area. In this way, character areas can be extracted with high accuracy.
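The scan-line procedure above can be sketched in Python as follows. This is a minimal illustration under assumptions made here, not the method of the cited publication. Density is taken as a value from 0 to 255 in which larger means denser. An edge is a jump of at least `EDGE_THRESHOLD` between horizontally adjacent pixels: a rising edge marks a start point and a falling edge marks an end point. An edge group is accepted as a character area when it contains at least `MIN_DENSE_PIXELS` pixels of density `DENSITY_THRESHOLD` or more, which is the total-count variant. The function names, the threshold values, and the policy of discarding a start point that is not closed before the next start point are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical parameters, chosen for illustration only.
EDGE_THRESHOLD = 60      # minimum density jump between neighbours treated as an edge
DENSITY_THRESHOLD = 128  # minimum density of a "dense" pixel
MIN_DENSE_PIXELS = 3     # dense pixels required to accept an edge group


def find_edge_groups(row: List[int]) -> List[Tuple[int, int]]:
    """Pair start points (rising edges) with end points (falling edges).

    Returns half-open intervals [start, end), where `end` is the first
    non-character pixel. Assumed policy: a start point that is still open
    when the next start point appears is overwritten (discarded).
    """
    groups: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i in range(1, len(row)):
        diff = row[i] - row[i - 1]
        if diff >= EDGE_THRESHOLD:
            # Non-character -> character switching point (start point).
            start = i
        elif diff <= -EDGE_THRESHOLD and start is not None:
            # Character -> non-character switching point (end point).
            groups.append((start, i))
            start = None
    return groups


def classify_line(row: List[int]) -> List[bool]:
    """Mark pixels inside accepted edge groups as character pixels."""
    is_char = [False] * len(row)
    for start, end in find_edge_groups(row):
        # Secondary check: count dense pixels inside the edge group.
        dense = sum(1 for v in row[start:end] if v >= DENSITY_THRESHOLD)
        if dense >= MIN_DENSE_PIXELS:
            for i in range(start, end):
                is_char[i] = True
    return is_char
```

For instance, `classify_line([10, 10, 200, 210, 205, 10, 10])` marks indices 2 to 4 as character pixels. Suppose instead that the first character fades out gradually, as in `[10, 10, 200, 210, 205, 160, 110, 60, 10, 10, 200, 210, 205, 10, 10]`. No single step down reaches `EDGE_THRESHOLD`, so the unmatched start point at index 2 is replaced by the next one at index 10, and only the second character is extracted. This mirrors the missing-end-point failure described for FIG. 27. Replacing the total count with the length of the longest run of dense pixels would give the run-length variant that the publication also allows.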
In another conventional technology, disclosed in Japanese Patent Application Laid-Open No. 2006-5680, a character area is divided into the edges of a character and the character inner space (the inside of a character). The edges of the character are extracted first, and the character inner space is then determined.
However, the former technology implicitly assumes during line scanning that the left end of a character area (the start-point side) and its right end (the end-point side) satisfy the edge conditions for the start point and the end point, respectively. In actual images, the pixels that should serve as the start point and the end point do not always satisfy these edge conditions, so character areas cannot be extracted accurately.
For example, as shown in FIG. 27, when the transition from a non-character area to a character area has a large edge amount but the transition from the character area back to a non-character area has a small edge amount, the start point is detected but the end point is not. The start point of the next character may then be found before the end point corresponding to the first start point. In that case, an area that should be extracted as a character area is not extracted as one.
A portion in which no edge is extracted is typically one in which the background changes gradually into a character. Consequently, character areas cannot be output stably for images containing gradation.
In this technology, character areas are obtained in two stages. A primary selection picks character area candidates on the premise that a character area in an image is always bounded by edges, and a secondary selection then narrows the candidates using the density information of the image data. As a result, an area is not determined to be a character area if even one of its sides fails to satisfy the edge condition.
Although edges provide useful information for determining character areas, a surprisingly large number of boundaries between characters and non-characters are not extracted as edges. The secondary selection can tolerate some mixing of character and non-character areas, but when many non-character areas lie between edges, it becomes difficult to identify the character areas.
Therefore, there is a need to avoid the failure shown in FIG. 27 and to detect character areas appropriately from density information, without depending on edge groups, as shown in FIG. 28.
In the latter technology, unlike the former, edge groups need not be found. As in the former technology, however, the character determination result still depends on edges, so character areas cannot be extracted accurately.
Moreover, embedded devices in general are subject to constraints on memory, processing time, and other resources, so memory-saving processing is always desirable. The former technology, however, requires edge group information for both the target line and the previous line to be retained in memory.