Generally, when performing prescribed processing by extracting information such as characters from a document, binarization is applied to the image representing the document in order to differentiate the character areas from other areas corresponding to the background.
For example, Japanese Laid-open Patent Publication No. 2007-28362 proposes a check reading apparatus which extracts only character areas from image data acquired from a document such as a check. The proposed check reading apparatus creates a histogram representing the density distribution in the image data acquired by scanning the document, sets a binarization threshold between a crest appearing in a high-density region and a crest appearing in a low-density region, and performs binarization using the thus set binarization threshold. However, in a document where the luminance of the background area varies from position to position, for example, where the background differs from one area to another or changes gradually in gradation, the high-density region and the low-density region sometimes cannot be distinctly separated from each other in the histogram of the image data density distribution, and the binarization threshold then cannot be set properly.
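The crest-based threshold setting described above can be illustrated with a minimal sketch. It is not taken from the cited publication: the function names, the 0 to 255 level scale (low values taken as dark), the `min_separation` parameter, and the tie-breaking rule are all assumptions made for illustration.

```python
def level_histogram(pixels, levels=256):
    """Count how many pixels fall on each level (assumed 0 = dark, 255 = light)."""
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    return hist


def valley_threshold(hist, min_separation=32):
    """Return a threshold lying between the two dominant crests of hist.

    Returns None when no second crest exists at least min_separation
    levels away from the first one. min_separation is a hypothetical
    tuning parameter, not something the cited publication specifies.
    """
    if min_separation < 2:
        raise ValueError("min_separation must be at least 2")
    # First crest: the most populated level.
    first = max(range(len(hist)), key=hist.__getitem__)
    # Second crest: the most populated level far enough from the first.
    candidates = [i for i in range(len(hist))
                  if abs(i - first) >= min_separation and hist[i] > 0]
    if not candidates:
        return None
    second = max(candidates, key=hist.__getitem__)
    lo, hi = sorted((first, second))
    mid = (lo + hi) // 2
    # Threshold: the least populated level strictly between the crests;
    # ties are broken in favour of the level closest to the midpoint.
    return min(range(lo + 1, hi), key=lambda i: (hist[i], abs(i - mid)))


def binarize(pixels, threshold):
    """Mark pixels darker than the threshold as character pixels (1)."""
    return [1 if value < threshold else 0 for value in pixels]
```

The sketch gives a usable threshold only when the histogram really has two separated crests. When the background luminance drifts across the page, the crests merge and the selected valley no longer separates characters from background, which is the limitation noted above.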
In view of this, Japanese Patent No. 4077094 proposes a color document image recognition apparatus that uses a different binarization threshold for each sub-region within the document image. The proposed color document image recognition apparatus applies edge detection to the grayscale image acquired from the document image, and extracts each sub-region based on the connected components of the detected edge pixels. The color document image recognition apparatus then determines a binarization threshold for each sub-region and performs binarization. In this case, the color document image recognition apparatus treats all regions other than the extracted sub-regions as the background.
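The sub-region approach can be sketched in simplified form as follows. This is not the implementation of the cited patent: the gradient-based edge detector, the `edge_strength` parameter, the use of bounding boxes as sub-regions, and the midpoint threshold rule are all assumptions chosen to keep the example short.

```python
from collections import deque


def edge_mask(gray, edge_strength=40):
    """Mark pixels whose right/down differences sum to at least edge_strength."""
    h, w = len(gray), len(gray[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            right = gray[y][min(x + 1, w - 1)]
            down = gray[min(y + 1, h - 1)][x]
            if abs(gray[y][x] - right) + abs(gray[y][x] - down) >= edge_strength:
                mask[y][x] = True
    return mask


def edge_regions(mask):
    """Return bounding boxes (top, left, bottom, right) of 8-connected edge components."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:  # breadth-first flood fill over edge pixels
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                    for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def binarize_by_region(gray, edge_strength=40):
    """Binarize each sub-region with its own threshold; the rest stays background (0)."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for top, left, bottom, right in edge_regions(edge_mask(gray, edge_strength)):
        values = [gray[y][x]
                  for y in range(top, bottom + 1)
                  for x in range(left, right + 1)]
        # Assumed rule: midpoint between darkest and lightest level in the box.
        threshold = (min(values) + max(values)) / 2
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                if gray[y][x] < threshold:
                    out[y][x] = 1
    return out
```

Because every pixel outside the extracted boxes is forced to background, any character whose edges fall below `edge_strength` is lost entirely in this sketch.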
On the other hand, in Japanese Laid-open Patent Publication No. 06-113139, there is proposed an image binarizing apparatus which divides an input image into blocks and determines a different binarization threshold for each block. The proposed image binarizing apparatus divides the input image into a plurality of sub-images. Then, the image binarizing apparatus generates a histogram of the lightness levels of a sub-image of interest and its eight neighboring sub-images, enters the generated histogram data into a neural network trained in advance, and binarizes the sub-image of interest by using the output value of the neural network as the binarization threshold.
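The data flow of the block-based approach can be sketched as follows. The trained neural network of the cited publication is not reproduced here. A hypothetical stand-in, `midrange_level`, takes its place so that the example runs, and any callable that maps a histogram to a threshold could be passed instead. The function names and the `block_size` parameter are likewise assumptions.

```python
def neighbourhood_histogram(blocks, block_row, block_col, levels=256):
    """Histogram of a block together with its (up to) eight neighbouring blocks."""
    hist = [0] * levels
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            # Blocks beyond the image border are simply absent.
            for value in blocks.get((block_row + dr, block_col + dc), ()):
                hist[value] += 1
    return hist


def midrange_level(hist):
    """Hypothetical stand-in for the trained neural network.

    Returns the midpoint between the darkest and lightest occupied levels.
    This is NOT the estimator of the cited publication.
    """
    occupied = [level for level, count in enumerate(hist) if count]
    return (occupied[0] + occupied[-1]) / 2


def binarize_blocks(gray, block_size, predict_threshold=midrange_level):
    """Binarize each block with a threshold predicted from its 3x3 block neighbourhood."""
    height, width = len(gray), len(gray[0])
    # Group pixel values by the block they belong to.
    blocks = {}
    for y in range(height):
        for x in range(width):
            key = (y // block_size, x // block_size)
            blocks.setdefault(key, []).append(gray[y][x])
    # One threshold per block, derived from the neighbourhood histogram.
    thresholds = {key: predict_threshold(neighbourhood_histogram(blocks, *key))
                  for key in blocks}
    return [[1 if gray[y][x] < thresholds[(y // block_size, x // block_size)] else 0
             for x in range(width)]
            for y in range(height)]
```

Feeding the estimator the neighbouring blocks as well as the block of interest gives it local context, so a block containing only background is less likely to be split by a threshold placed inside its own narrow range of levels.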