Document scanners digitize paper documents using a linear charge-coupled device (CCD) array to capture an image of the document. The CCD array provides grayscale information, eight bits per pixel, to an image processing board. A typical grayscale image 10 is shown in FIG. 1. Grayscale images are relatively large, require significant random access memory to process, and require large amounts of disk space to store. Image thresholding is often used to reduce the number of bits per pixel to one bit, black or white, which significantly reduces the size of the image file. FIG. 2 shows a black and white image 12 after conversion of the grayscale image 10 shown in FIG. 1.
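The size reduction obtained by thresholding can be illustrated with a minimal sketch. The function name `threshold_to_bits`, the fixed threshold of 128, and the convention that a set bit means black are assumptions made for illustration only; they are not taken from any scanner implementation.

```python
def threshold_to_bits(gray, width, height, threshold=128):
    """Pack an 8-bit grayscale image into 1 bit per pixel.

    gray is a flat, row-major sequence of values in 0..255.
    Assumed convention: a pixel darker than the threshold becomes
    black and is stored as a set bit, most significant bit first.
    """
    row_bytes = (width + 7) // 8          # each row is padded to a whole byte
    packed = bytearray(row_bytes * height)
    for y in range(height):
        for x in range(width):
            if gray[y * width + x] < threshold:
                packed[y * row_bytes + x // 8] |= 0x80 >> (x % 8)
    return bytes(packed)


# One row of 16 pixels alternating black (0) and white (255).
row = [0, 255] * 8
bits = threshold_to_bits(row, width=16, height=1)
```

In this example the 16 grayscale pixels occupy 16 bytes, while the packed black and white row occupies 2 bytes, an eightfold reduction.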
One method of converting a grayscale image to a black and white image is described in U.S. Pat. No. 5,583,659, "Multi-Windowing Technique for Thresholding an Image Using Local Image Properties." This type of adaptive threshold processing algorithm works well, but requires two input settings: one for the fixed threshold level and one for the character contrast level. Having a scanner operator adjust these parameters is undesirable, since most scanner operators are not properly trained to make these adjustments. As a result, the scanner setup remains at the factory-set defaults.
Another problem with prior art thresholding is that a single threshold value is set for the entire document. This may be a problem if various segments of the document have backgrounds with different intensity values. For example, a document may have one segment which has a light gray background 14, shown in FIG. 1, and another segment which has a dark gray background 16. Selecting one threshold value for the entire document may produce uneven results when converting the document to black and white, since information in some segments of the document may not be recognizable because of insufficient contrast between the foreground and background.
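The effect of a single threshold on segments with different backgrounds can be shown with a short sketch. The pixel values and the threshold of 128 are hypothetical and chosen only to illustrate the problem; they are not measured from FIG. 1.

```python
def binarize(pixels, threshold):
    """Map each grayscale value to 0 (black) or 1 (white)."""
    return [0 if p < threshold else 1 for p in pixels]


# Hypothetical scan lines: one dark character stroke in the middle of each.
light_segment = [200, 200, 80, 200, 200]   # text on a light gray background
dark_segment = [90, 90, 20, 90, 90]        # text on a dark gray background

GLOBAL_THRESHOLD = 128                     # assumed single document-wide value

light_out = binarize(light_segment, GLOBAL_THRESHOLD)  # [1, 1, 0, 1, 1]
dark_out = binarize(dark_segment, GLOBAL_THRESHOLD)    # [0, 0, 0, 0, 0]
```

With these values the stroke survives in the light segment, but the dark segment becomes entirely black, so the character can no longer be distinguished from its background.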
A number of thresholding methods are described in the prior art. Some of the methods compute a global threshold, which is used for the entire image. In these cases, the threshold value is not adapted to local variations in the background. The histogram is often used to determine the value of the threshold, but to compute the histogram, the entire image must be buffered and the histogram computed before the image data can be thresholded. See P. K. Sahoo, S. Soltani, A. K. C. Wong and Y. Chen, "A Survey of Thresholding Techniques," Computer Vision, Graphics, and Image Processing, Vol. 41, pp. 233-260, 1988. A number of locally adaptive approaches, similar to the adaptive threshold processing method, use local windows to determine the threshold, but are sometimes sensitive to noise and may fail to track background variations. One technique, described in U.S. Pat. No. 4,468,704, "Adaptive Thresholder," uses the black and white potentials of each pixel, based on maxima and minima, to classify the pixels.
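The buffering requirement of histogram-based global thresholding can be sketched as follows. Otsu's method is used here only as one well-known histogram-based choice; it is not named in the text above, and the function names and sample values are illustrative assumptions.

```python
def otsu_threshold(pixels):
    """Return a global threshold t; pixels <= t form the dark class.

    Pass 1 over the whole buffered image builds the histogram.
    The threshold maximizes the between-class variance (Otsu's method).
    """
    hist = [0] * 256
    for p in pixels:                      # requires every pixel up front
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0                                # pixel count of the dark class
    sum0 = 0                              # intensity sum of the dark class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue                      # no dark pixels yet
        if w1 == 0:
            break                         # no light pixels remain
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize_global(pixels):
    """Pass 2: apply the single threshold to every buffered pixel."""
    t = otsu_threshold(pixels)
    return [0 if p <= t else 1 for p in pixels]


# Hypothetical image: 10 dark text pixels and 30 light background pixels.
image = [20] * 10 + [200] * 30
result = binarize_global(image)
```

Because the threshold depends on the complete histogram, no pixel can be output until the entire image has been read, which is the buffering cost noted above. The single value returned also cannot follow local changes in the background.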