Techniques are known for removing noise from digital representations of data images obtained by digitally scanning documents and the like. The scanned documents are processed to identify objects within the scanned images that are, in turn, used to mask out the noise. For example, U.S. Pat. No. 7,016,536 discloses a method and apparatus for removing noise by building objects from reduced resolution representations of the scanned image and including the identified objects in a mask that is logically ANDed with the de-skewed representation of the scanned document. Objects identified as picture objects are included in a mask and logically ANDed with the de-skewed representation to eliminate all other objects, while objects marked as data objects are added to the representation to provide a de-skewed, de-speckled representation of the scanned document.
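By way of illustration only, the masking step described above can be sketched as follows. This is a minimal sketch and not the method of the cited patent: it assumes a binary image stored as a list of rows with 1 for foreground pixels, and it assumes each identified object is given as a bounding box `(top, left, bottom, right)` with inclusive coordinates. The function names `build_mask` and `apply_mask` are hypothetical.

```python
def build_mask(height, width, boxes):
    """Return a binary mask that is 1 inside every bounding box, 0 elsewhere.

    Each box is an assumed (top, left, bottom, right) tuple, inclusive.
    """
    mask = [[0] * width for _ in range(height)]
    for top, left, bottom, right in boxes:
        for r in range(top, bottom + 1):
            for c in range(left, right + 1):
                mask[r][c] = 1
    return mask


def apply_mask(image, mask):
    """Logically AND two equally sized binary images, pixel by pixel."""
    return [
        [pixel & keep for pixel, keep in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


# A 4x5 binary image: a 2x2 object plus three isolated speckles.
image = [
    [1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0],
]

# One identified object covering rows 1-2, columns 1-2.
mask = build_mask(4, 5, [(1, 1, 2, 2)])
cleaned = apply_mask(image, mask)
```

Pixels outside every identified object are cleared by the AND, so the isolated speckles in the corners are removed while the object itself is left unchanged.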
Speckles in noisy images may have a significantly adverse impact on the OCR process, as the speckles may be mistaken for text characters or images. Typical methods for reducing speckle are not well suited to preserving edge structure and are therefore of limited help. Other techniques, such as a speckle-model-based Quadratic Volterra Filter (QVF), have been proposed to solve the problem of smoothing speckle noise in digital images while preserving important edge information. However, QVFs are very difficult to design due to the large number of independent coefficients. Zaman et al., in an article entitled “A Comparison of Adaptive Filters for Edge-preserving Smoothing of Speckle Noise,” IEEE (1993), V-77-V-80, disclose improved results using a “Proportional Weight Coefficient” method, in which the quadratic coefficients are grouped into different types depending on the inter-pixel distance of the pixel pair involved and a weight is assigned to each type, each weight being distributed proportionally among the coefficients in the corresponding type. Zaman et al. also disclose speckle-specific edge detection using a modified ratio of averages (MROA) detector and a ratio and gradient of averages (RGOA) detector. Other known techniques for removing speckle include using a rational filter as a speckle smoothing operator that uses local estimates of the standard deviation to mean ratio to remove speckle (see Ramponi et al. in an article entitled “Smoothing Speckled Images Using an Adaptive Rational Operator,” IEEE Signal Processing Letters, Vol. 4, No. 3, March 1997) or using an adaptive shrinkage function for determining wavelet coefficients which represent the speckle noise (see Pizurica et al. in an article entitled “Despeckling SAR Images Using Wavelets and a New Class of Adaptive Shrinkage Estimators,” IEEE (2001)).
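By way of illustration only, the local standard deviation to mean ratio mentioned above can be estimated as sketched below. This is not the rational filter of Ramponi et al.; it computes only the local statistic over a small window, under the assumptions that the image is a grayscale list of rows and that a square window of hypothetical radius `radius` is used. The function name `local_std_mean_ratio` is likewise hypothetical.

```python
from statistics import mean, pstdev


def local_std_mean_ratio(image, row, col, radius=1):
    """Estimate the standard deviation to mean ratio around (row, col).

    The window is clipped at the image borders. A zero mean yields 0.0
    to avoid division by zero.
    """
    height, width = len(image), len(image[0])
    window = [
        image[r][c]
        for r in range(max(0, row - radius), min(height, row + radius + 1))
        for c in range(max(0, col - radius), min(width, col + radius + 1))
    ]
    window_mean = mean(window)
    return pstdev(window) / window_mean if window_mean else 0.0


# A perfectly flat region and a region containing a vertical edge.
flat = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
edge = [[10, 10, 100], [10, 10, 100], [10, 10, 100]]

flat_ratio = local_std_mean_ratio(flat, 1, 1)
edge_ratio = local_std_mean_ratio(edge, 1, 1)
```

The ratio is zero in the flat region and larger in the window containing the edge, which is the kind of local information an adaptive smoothing operator can use to smooth less strongly near edges.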
However, further techniques are desired that allow for the removal of speckle from binarized images while preserving the edge information so as to facilitate better OCR results from binarized documents.
A number of binarization techniques for minimizing noise are known in the art (see, for example, an article by Trier et al. entitled “Goal-Directed Evaluation of Binarization Methods,” IEEE (1995)). Such preprocessing of the image improves the accuracy of the optical character recognition. However, poor-quality documents, such as receipts, pose a special challenge for image processing. Despite these known binarization methods, the problem remains that if relatively strong noise is present in the original image, de-noising that relies only on edge binarization is often not enough to improve the OCR results.
It is desirable to evaluate the noise level of the image and to perform despeckling as necessary to further reduce the noise in the image, so that the impact of speckles on the OCR process is minimized. The present invention addresses such needs in the art.