The present invention relates to image processing, and more specifically, to thresholding techniques used in image processing.
As technological advances in digital photography continue to increase the performance of digital cameras while reducing their cost, digital cameras may become widely used as document scanners in general office environments. For example, images from a hardcopy document may be captured by a camera positioned over a desktop and digitized for further processing and display on a computer monitor. This type of scanning promotes a “scan-as-you-read” interface between paper and electronic media and is often referred to as “over-the-desk” scanning. An example of such an over-the-desk scanning system is disclosed by Wellner in U.S. Pat. No. 5,511,148 entitled “Interactive Copying System.”
When using a digital camera to scan documents, the camera images of the documents often need to be converted into high quality binary images for optical character recognition (OCR), which is used to translate the shapes recorded by the camera images into computer text. In general, most OCR software and numerous other image processing algorithms, such as page segmentation and skew detection algorithms, require binary images as input or can perform significantly faster using binary images. The presence of lighting variations, varying contrast between foreground and background regions of an image, bleed through (from text on the reverse side of a document), noise, blur, and low-resolution grey-scale images are factors that adversely affect the quality of binary images. When grey-scale images are not binarized correctly, OCR algorithms (as well as other image processing algorithms) become less effective.
Unfortunately, scanning with a digital camera sometimes produces camera images having a non-uniform grey-level background as a result of lighting variations. FIG. 1 illustrates an example of a camera image 100 recorded in an environment having lighting gradients. One common source of lighting variations is shadows cast on the document to be scanned. Camera image 100 illustrates that the foreground (e.g., text) and background regions may have similar grey-levels in the same portions of camera image 100 (e.g., upper right-hand corner and lower left-hand corner) such that it is difficult to differentiate between foreground and background regions.
A binary image may be produced from a grey-scale image by segmenting the grey-level image into a foreground region and a background region using thresholding techniques. When applying a thresholding technique, a threshold grey-level value for each point (or pixel) of an image is used to determine whether the pixel represents a foreground grey-level or a background grey-level. All foreground grey-level values are assigned one binary value and all background grey-level values are assigned the other binary value to generate a binary image.
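As an illustration only, the thresholding step described above can be sketched in Python for the simplest case of a single threshold value. The function name binarize_global, the sample grey-levels, and the convention that dark pixels are foreground (dark text on a light page) are assumptions made for this sketch and are not taken from the description.

```python
def binarize_global(image, threshold):
    """Return a binary image: 1 for foreground, 0 for background.

    Assumption for this sketch: foreground (text) is darker than the
    background, so grey-levels below the threshold count as foreground.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


# Hypothetical 2 x 3 grey-scale image (0 = black, 255 = white).
grey = [
    [200, 198, 60],
    [205, 55, 202],
]
binary = binarize_global(grey, 128)
```

With the sample values above, the two dark pixels are assigned one binary value and the four light pixels the other.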
When the background region of an image is uneven as a result of poor or non-uniform illumination conditions, a fixed (or global) grey-level threshold will not segment the image correctly. FIG. 2 illustrates an example of grey-scale camera image 100 binarized using a global threshold value. A large dark region 200 indicates many background pixels that were misclassified as foreground pixels. As a result, it will be very difficult to accurately OCR the binary image shown in FIG. 2.
Adaptive thresholding techniques, which use more than one threshold value, often provide better thresholding results than global thresholding techniques for images with non-uniform background grey-levels. FIG. 3 illustrates an example of grey-scale image 100 binarized using an adaptive thresholding technique. Although fewer background pixels are misclassified as foreground pixels in the binary image shown in FIG. 3 as compared to the binary image shown in FIG. 2, the misclassified pixels are still likely to cause OCR errors.
Some adaptive thresholding techniques use local average threshold values. For example, local average threshold values may be calculated based on a sample mean and a standard deviation within a small neighborhood (or window) of pixels as described in “An Introduction to Digital Image Processing”, W. Niblack, pp. 113-116, Prentice Hall (1986). Alternatively, local average threshold values may be calculated by averaging the grey-scale values of neighboring edges as described in “Enhancement of Document Images from Cameras,” M. J. Taylor et al., SPIE, vol. 3305, pp. 230, (1998).
Unfortunately, these local average thresholding techniques often amplify noise (on the boundaries of text) and are prone to misclassify large background areas as text. They are also sensitive to the scale (or window size) over which the average and variance measures are determined.
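The mean-and-standard-deviation variant of local average thresholding can be sketched as follows. This is a minimal sketch, not the cited implementation: the threshold at each pixel is taken as the window mean plus k times the window standard deviation, and the function names, the window radius, and the weight k = -0.2 are illustrative assumptions. Dark pixels are again assumed to be foreground.

```python
from statistics import mean, pstdev


def niblack_threshold(image, row, col, radius=1, k=-0.2):
    """Local threshold: window mean plus k times window standard deviation.

    radius and k are illustrative values; the window is clipped at the
    image borders.
    """
    height, width = len(image), len(image[0])
    window = [
        image[r][c]
        for r in range(max(0, row - radius), min(height, row + radius + 1))
        for c in range(max(0, col - radius), min(width, col + radius + 1))
    ]
    return mean(window) + k * pstdev(window)


def binarize_local(image, radius=1, k=-0.2):
    """Mark a pixel as foreground (1) when it is darker than its local threshold."""
    return [
        [
            1 if image[r][c] < niblack_threshold(image, r, c, radius, k) else 0
            for c in range(len(image[0]))
        ]
        for r in range(len(image))
    ]
```

The radius argument is the scale (window size) to which such techniques are sensitive: a window smaller than the stroke width or much larger than the lighting gradient changes which pixels fall below the local threshold.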
Other adaptive thresholding techniques interpolate a threshold surface based on high-gradient locations (i.e., local maxima of gradient pixels). This threshold surface, which is constructed with an iterative interpolation scheme, is used to threshold an image. Examples of these techniques are discussed in “A New Method for Image Segmentation,” Comput. Vision, Graph., Image Process., vol. 46, pp. 82-95 (1989) and “Adaptive Thresholding by Variational Method,” IEEE Transactions on Image Processing, vol. 7, no. 3, pp. 468-473 (1998). These techniques often require edge detection techniques, thinning algorithms, and/or post-processing to remove “ghost” objects.
Although known adaptive thresholding techniques tend to provide higher quality binary images than global thresholding techniques, adaptive thresholding techniques do not fully address the problems (e.g., lighting variations, blur, and low resolution) associated with camera images. Thus, it would be advantageous to provide a thresholding technique that generates high quality binary images regardless of the hardware (e.g., video cameras, scanners, etc.) used to capture the images, while operating independently of the resolution of the image and of the font type and size of the text. Furthermore, it would be advantageous to provide thresholding techniques that increase the reliability and robustness of OCR algorithms, page segmentation algorithms, de-skewing algorithms, and other image processing techniques that use binary images as input.
It is an object of the present invention to generate a background image of a pixmap image, which can be used in various image enhancement techniques.
A system, method, and article of manufacture of the present invention for processing a pixmap image are described. A background image of the pixmap image is generated by computing a block average image of the pixmap image, a block variance image of the pixmap image, and a variance threshold surface. The variance threshold surface is used to threshold the block variance image in order to segment the block average image into foreground and background regions. A background image of the pixmap image is then generated based upon the segmented foreground and background regions. In a preferred embodiment of the present invention, the background image of the pixmap image is generated by replacing all pixels in the foreground region with interpolated background pixels.
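By way of illustration only, the sequence of steps summarized above may be sketched in Python as follows. The sketch makes several assumptions that are not stated in this summary: the variance threshold surface is reduced to a single constant (variance_threshold), the block size is arbitrary, the interpolation of a foreground block is a plain average of its adjacent background blocks, and the result is kept at block resolution rather than expanded back to pixel resolution. All function and parameter names are hypothetical.

```python
from statistics import mean, pvariance


def block_stats(image, block):
    """Return (block average image, block variance image) over block x block tiles."""
    height, width = len(image), len(image[0])
    averages, variances = [], []
    for top in range(0, height, block):
        avg_row, var_row = [], []
        for left in range(0, width, block):
            tile = [
                image[r][c]
                for r in range(top, min(top + block, height))
                for c in range(left, min(left + block, width))
            ]
            avg_row.append(mean(tile))
            var_row.append(pvariance(tile))
        averages.append(avg_row)
        variances.append(var_row)
    return averages, variances


def estimate_background(image, block=2, variance_threshold=100.0):
    """Estimate a block-resolution background image.

    Assumption: low-variance blocks are background and high-variance
    blocks contain foreground; a constant stands in for the variance
    threshold surface.
    """
    averages, variances = block_stats(image, block)
    is_background = [[v <= variance_threshold for v in row] for row in variances]
    known = [
        avg
        for avg_row, mask_row in zip(averages, is_background)
        for avg, flag in zip(avg_row, mask_row)
        if flag
    ]
    # Fallback when a foreground block has no background neighbor at all.
    fallback = mean(known) if known else mean([a for row in averages for a in row])
    background = []
    for i, row in enumerate(averages):
        out_row = []
        for j, avg in enumerate(row):
            if is_background[i][j]:
                out_row.append(avg)
                continue
            # Assumed interpolation: average of adjacent background blocks.
            neighbors = [
                averages[y][x]
                for y in range(max(0, i - 1), min(len(averages), i + 2))
                for x in range(max(0, j - 1), min(len(row), j + 2))
                if is_background[y][x]
            ]
            out_row.append(mean(neighbors) if neighbors else fallback)
        background.append(out_row)
    return background


# Hypothetical 4 x 4 page: uniform light background with one dark pixel.
page = [
    [200, 200, 200, 200],
    [200, 200, 200, 200],
    [200, 20, 200, 200],
    [200, 200, 200, 200],
]
background = estimate_background(page)
```

In this example the block containing the dark pixel has a high variance, is treated as foreground, and is filled in from its three background neighbors, so the estimated background is uniform.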
For various embodiments of the present invention, the background image of the pixmap image is used to perform additional image processing on the pixmap image. For example, the background image is used to generate a background threshold surface, which is used to binarize the pixmap image by thresholding the pixmap image into foreground and background regions.
For alternative embodiments of the present invention, the background image is used to produce an image having a more uniform background grey (or color) level by normalizing a pixmap image. For example, an operation using the background image is performed on the pixmap image. The operation may include subtracting the background image from the pixmap image, dividing the pixmap image by the background image, or other operations based on the background image.
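The two normalizing operations named above, subtraction and division, may be sketched as follows. This is a minimal sketch under stated assumptions: the background image has the same dimensions as the pixmap image, grey-levels lie in the range 0 to 255, and the results are re-scaled so that a pixel equal to its background maps to white. The re-scaling, the clipping, and the function names are choices made for this example only.

```python
def normalize_subtract(image, background, white=255):
    """Subtract the background, offset so that background maps to white."""
    return [
        [max(0, min(white, pixel - bg + white)) for pixel, bg in zip(row, bg_row)]
        for row, bg_row in zip(image, background)
    ]


def normalize_divide(image, background, white=255):
    """Divide by the background, scaled so that background maps to white."""
    return [
        [
            min(white, round(white * pixel / bg)) if bg > 0 else white
            for pixel, bg in zip(row, bg_row)
        ]
        for row, bg_row in zip(image, background)
    ]


# Hypothetical image: the top row is lit half as brightly as the bottom row,
# and the right-hand column holds text at one fifth of the local background.
image = [[100, 20], [200, 40]]
background = [[100, 100], [200, 200]]

subtracted = normalize_subtract(image, background)
divided = normalize_divide(image, background)
```

With these sample values, both operations map the background pixels to the same level. Division also maps the two text pixels to the same level despite the different illumination, whereas subtraction leaves them at different levels.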
In yet other embodiments of the present invention, the background image is used as input or parameter values for other image processing algorithms such as grey-scale character recognition algorithms.
Other objects, features, and advantages of the present invention will be apparent from the accompanying drawings and from the detailed description that follows below.