1. Field of the Invention
The present invention relates to an image processing apparatus.
2. Description of the Related Art
In recent years, paper form and document recognition techniques using a non-contact type image input device (OHR: Over Head Reader) have received widespread attention.
FIG. 1 shows the external view of an OHR.
The non-contact type image input device (OHR) is a stand-type image input device, shown in FIG. 1, which uses a line or an area CCD as an image pickup device. In comparison with a contact type image input device such as a conventional image scanner, the OHR allows a user to work more comfortably, for example, when filling in a paper form or a document while its image is being input, or when inputting an image while looking at a paper form or a document.
However, an image captured by an OHR (OHR image) suffers degradation in image quality, such as density unevenness, a shadow, image deformation, etc., in comparison with an image captured by a scanner (scanner image).
FIG. 2 shows a grayscale scanner image, whereas FIG. 3 shows a grayscale OHR image. It can be seen from the OHR image shown in FIG. 3 that, although no shadow exists, the degree of density unevenness is slightly higher and character lines are more blurred.
FIG. 4 shows an OHR image with a shadow. Therefore, a binarization technique that overcomes density unevenness, a shadow, etc. is essential to use an OHR.
To enable a recognition process with high accuracy for an OHR image, a binarization method that obtains a stable line pattern despite a shadow or density unevenness is required. Binarization using a single predetermined threshold is insufficient, and local binarization such as Niblack binarization must be introduced. For Niblack local binarization, refer to the following document.
Ø. D. Trier, A. K. Jain: “Goal-Directed Evaluation of Binarization Methods,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 17, No. 12, pp. 1191–1201, 1995
Niblack local binarization is a method that binarizes each pixel by using a threshold value T=E+kσ defined for that pixel (E and σ are respectively the density average and the density standard deviation in the neighborhood of the target pixel, and k is a constant of approximately −0.4 to 0.4). A square area of N×N pixels, which centers on the target pixel, is used as the neighborhood of the target pixel (N=7 or so is frequently used).
However, if Niblack local binarization is applied unchanged, spotted black-and-white noise occurs in a background area, because all of the pixels in the neighborhood of a target pixel in such an area have uniform density. FIG. 5 exemplifies an OHR image, whereas FIG. 6 shows a binary image that is obtained by performing Niblack local binarization for the OHR image shown in FIG. 5.
As can be seen from FIG. 6, spotted black-and-white noise occurs in a background area (in the case of N=7 and k=0.1).
Therefore, the spotted black-and-white background noise is eliminated as follows: among the 4-concatenated components of black pixels of a binary image, a component whose outline pixels have an average edge intensity equal to or smaller than a predetermined value is determined to be background noise, and that concatenated component is removed. The 4-concatenated component of black pixels is a maximum set of black pixels, which is obtained by sequentially concatenating black pixels that are adjacent left, right, up, or down. There is also an 8-concatenated component, which contains the 4-concatenated component. This is a maximum set of black pixels, which is obtained by sequentially concatenating black pixels that are adjacent in the four diagonal directions as well as left, right, up, and down. Here, the 8-concatenated component may also be used. If the simple term “concatenated component” appears hereinafter, it indicates a 4- or 8-concatenated component. The outline pixels of a concatenated component are the black pixels included in the concatenated component for which a white background pixel exists immediately above, below, to the left, or to the right. The average edge intensity of outline pixels is the average of the edge intensities of the outline pixels. The edge intensity of an outline pixel is an edge intensity obtained with a Sobel edge filter, etc.
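The definitions of concatenated components and outline pixels given above can be sketched as follows. This is only an illustrative sketch in Python, not the implementation of any particular apparatus. The function names connected_components and outline_pixels, the representation of a binary image as a list of rows holding 1 (black) or 0 (white), and the treatment of positions outside the image as white background are all assumptions made for illustration.

```python
from collections import deque

# Neighbor offsets (dy, dx): left/right/up/down, and the same plus diagonals.
STEPS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
STEPS_8 = STEPS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def connected_components(binary, connectivity=4):
    """Return the 4- or 8-concatenated components of black pixels (value 1).

    Each component is a list of (y, x) coordinates.
    """
    steps = STEPS_4 if connectivity == 4 else STEPS_8
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one maximal set of black pixels.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy, dx in steps:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(component)
    return components


def outline_pixels(binary, component):
    """Return the black pixels of a component that touch a white pixel
    above, below, left, or right.

    Assumption: positions outside the image count as white background.
    """
    h, w = len(binary), len(binary[0])
    outline = []
    for y, x in component:
        for dy, dx in STEPS_4:
            ny, nx = y + dy, x + dx
            outside = not (0 <= ny < h and 0 <= nx < w)
            if outside or binary[ny][nx] == 0:
                outline.append((y, x))
                break
    return outline
```

For example, two black blocks that touch only at a corner form two 4-concatenated components but a single 8-concatenated component, and a solid 3×3 block has eight outline pixels because only its central pixel has no white neighbor.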
FIGS. 7A and 7B explain the concept of Niblack local binarization.
As shown in FIG. 7A, for example, a square area of 7×7 pixels centering on each of the pixels of a grayscale image obtained from a color or a black-and-white image is taken as a process target. Assume that the average of the densities of the pixels within the square area is E, and the standard deviation from the average E of the densities of the pixels within the square area is σ. A threshold value T for determining whether a pixel to be binarized is black or white is obtained by the equation T=E+kσ. Also assume that the density of the pixel to be binarized is g. This pixel is made black if g≤T, and made white if g>T. According to such determination results, a binary image is sequentially obtained by providing the black or white data resulting from binarization as the density data of each target pixel (FIG. 7B).
FIG. 8 is a flowchart showing the flow of the Niblack local binarization process. Firstly, in step S1, a pixel to be processed is selected. In step S2, the densities of the pixels within a square area centering on the selected pixel are obtained. In step S3, an average E and a standard deviation σ of the densities of the pixels within the square area are calculated. In step S4, a threshold value T is obtained with the equation T=E+kσ (k is typically a value between −0.4 and 0.4). Then, in step S5, it is determined whether or not the density of the selected pixel is equal to or smaller than the threshold value T. If the result of the determination made in step S5 is “YES”, the selected pixel is made black. If the result of the determination made in step S5 is “NO”, the selected pixel is made white. In step S8, it is determined whether or not all pixels of the image to be binarized have been processed. If a pixel yet to be processed is left, the process goes back to step S1 and the subsequent operations are repeated. If it is determined that all the pixels have been processed, the process is terminated.
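The process described above can be sketched as follows. This is a minimal illustrative sketch in Python under stated assumptions, not a definitive implementation. The function name niblack_binarize, the representation of the grayscale image as a list of rows of density values, the use of the population standard deviation, and the clipping of the square area at the image border are assumptions. A pixel is made black (1) when its density g satisfies g≤T, and white (0) otherwise.

```python
import math


def niblack_binarize(gray, n=7, k=0.1):
    """Binarize a grayscale image with the per-pixel threshold T = E + k*sigma.

    gray: list of rows of density values (a low value is dark).
    Returns an image of the same size holding 1 (black) or 0 (white).
    Assumption: near the border, the N x N area is clipped to the image.
    """
    h, w = len(gray), len(gray[0])
    r = n // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):                      # select a pixel to be processed
        for x in range(w):
            # Densities of the pixels within the square area.
            window = [gray[j][i]
                      for j in range(max(0, y - r), min(h, y + r + 1))
                      for i in range(max(0, x - r), min(w, x + r + 1))]
            # Average E and standard deviation sigma of those densities.
            e = sum(window) / len(window)
            sigma = math.sqrt(sum((v - e) ** 2 for v in window) / len(window))
            # Threshold value T.
            t = e + k * sigma
            # Black if g <= T, white otherwise.
            out[y][x] = 1 if gray[y][x] <= t else 0
    return out


# Hypothetical 9x9 test image: light background (200) with a dark
# vertical stroke (50) in column 4.
image = [[50 if x == 4 else 200 for x in range(9)] for _ in range(9)]
result = niblack_binarize(image, n=7, k=0.1)
```

In this hypothetical image, the stroke pixel at row 4, column 4 becomes black and the nearby background pixel at row 4, column 2 becomes white. The background pixel at row 4, column 0, whose square area contains only pixels of density 200, has σ=0 and T=E, and therefore also becomes black. This is consistent with the background noise that arises when all of the pixels in the neighborhood have uniform density.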
FIG. 9 explains a Sobel edge filter and edge intensity.
If a concatenated component shown in (1) of FIG. 9 exists, a shaded portion corresponds to outline pixels. For each of the outline pixels, a square area of 3×3 pixels, which is shown in (4) and centers on the pixel to be processed, is taken, and the densities of the pixels within the square area are multiplied by the corresponding coefficients of the filters shown in (2) and (3) of FIG. 9 and added. Suppose that a vector component generated by the filter in (2) is Sx, and a vector component generated by the filter in (3) is Sy. In this case, a vector S=(Sx, Sy) is obtained for the central pixel of the square area as shown in (4). The length of this vector, namely, √(Sx²+Sy²), is the Sobel edge intensity of the target pixel.
The average edge intensity is an intensity acquired by obtaining such edge intensities for all of the outline pixels of the concatenated component in (1), and by averaging the obtained edge intensities.
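The calculation of the edge intensity and the average edge intensity can be sketched as follows. This is an illustrative sketch in Python only. The filter coefficients of (2) and (3) of FIG. 9 are not reproduced in this text, so the commonly used 3×3 Sobel coefficients are assumed here. The function names sobel_intensity and average_edge_intensity, and the replication of border pixels when the 3×3 area extends beyond the image, are also assumptions.

```python
import math

# Commonly used Sobel coefficients (assumed; FIG. 9 is not reproduced here).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def sobel_intensity(gray, y, x):
    """Return sqrt(Sx^2 + Sy^2) for the pixel at (y, x).

    Assumption: coordinates outside the image are clamped to the border.
    """
    h, w = len(gray), len(gray[0])
    sx = sy = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy = min(max(y + dy, 0), h - 1)
            xx = min(max(x + dx, 0), w - 1)
            v = gray[yy][xx]
            # Multiply each density by the filter coefficient and add.
            sx += SOBEL_X[dy + 1][dx + 1] * v
            sy += SOBEL_Y[dy + 1][dx + 1] * v
    return math.hypot(sx, sy)


def average_edge_intensity(gray, outline):
    """Average the Sobel edge intensities over a list of (y, x) outline pixels."""
    return sum(sobel_intensity(gray, y, x) for y, x in outline) / len(outline)


# Hypothetical image with a vertical step edge between columns 1 and 2.
step = [[0, 0, 10, 10] for _ in range(4)]
print(sobel_intensity(step, 1, 1))                          # 40.0
print(average_edge_intensity(step, [(1, 1), (1, 0)]))       # 20.0
```

In the hypothetical step image, the pixel at row 1, column 1 lies next to the edge and yields Sx=40 and Sy=0, whereas the pixel at row 1, column 0 sees only uniform density and yields an intensity of 0, so that the average over the two pixels is 20.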
FIG. 10 shows a result of eliminating background noise by removing 4-concatenated components whose average edge intensities are equal to or smaller than 4.
As described above, a binary image of relatively high quality can be obtained with a conventional technique from an image having relatively good contrast shown in FIG. 5.
The background noise elimination used by conventional techniques operates stably and properly for a character having good contrast, but not for an image having poor contrast between a background and a character, that is, an image including an extremely faint character as shown in FIG. 11.
A result of executing Niblack local binarization (k=0.1) for the grayscale image of FIG. 11 is shown in FIG. 12.
If the background noise elimination is executed by removing concatenated components with an average edge intensity of 4 or smaller, in a similar manner as in the case of FIG. 10, the background noise can be eliminated, but the extremely faint character string is also removed together with it.
If, instead, only concatenated components with an average edge intensity of 2 or smaller are removed so as to preserve the line patterns of the extremely faint character string, the result is a binary image from which the background noise cannot be eliminated completely.
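The trade-off between the two settings can be illustrated with the following sketch in Python. It is only an illustration under assumptions: the function name remove_weak_components is hypothetical, the concatenated components and their average edge intensities are assumed to have been obtained beforehand, and the intensity values in the example are invented for illustration and are not measurements from FIG. 11 or FIG. 12.

```python
def remove_weak_components(binary, components, avg_edge, limit):
    """Return a copy of `binary` in which every component whose average
    edge intensity is equal to or smaller than `limit` is made white (0).

    components: list of components, each a list of (y, x) coordinates.
    avg_edge:   average edge intensity of each component, in the same order.
    """
    out = [row[:] for row in binary]
    for component, intensity in zip(components, avg_edge):
        if intensity <= limit:
            for y, x in component:
                out[y][x] = 0
    return out


# Hypothetical 1x8 binary image containing four components:
# two noise spots, one faint character stroke, one clear character stroke.
binary = [[1, 0, 1, 0, 1, 0, 1, 1]]
components = [[(0, 0)], [(0, 2)], [(0, 4)], [(0, 6), (0, 7)]]
avg_edge = [1.5, 2.5, 3.0, 20.0]   # invented values for illustration only

print(remove_weak_components(binary, components, avg_edge, 4))
# [[0, 0, 0, 0, 0, 0, 1, 1]]  noise removed, but the faint stroke is lost
print(remove_weak_components(binary, components, avg_edge, 2))
# [[0, 0, 1, 0, 1, 0, 1, 1]]  faint stroke kept, but one noise spot remains
```

With these invented values, no single limit separates the faint stroke from the noise, because one noise component has an average edge intensity close to that of the faint stroke.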
As described above, the conventional techniques using the background noise elimination that adopts the local binarization and an average edge intensity have a problem in that an extremely faint character string cannot be satisfactorily extracted without including background noise.