The present invention relates to binarization programs, and is particularly directed to a method of parameterizing a threshold curve for a binarization program for use in an image-based document processing system such as an image-based check processing system.
In known check processing applications in which gray scale image data is obtained from scanning a bank check, two or more binarization programs may be applied to the same gray scale image data to extract a corresponding number of binary images of the check. The extracted binary images are then usually compared to identify the binary image of the best image quality. A disadvantage of this approach is that computational costs are relatively high, since every binarization program must be run on every check. It would be desirable to optimize the parameters of a binarization program, either on a per-image basis or for a class of images, such that only the one binarization program need be run.
In accordance with one aspect of the present invention, a method of processing a document comprises the steps of (a) scanning a document to obtain gray scale image data associated with the document, (b) generating a two-dimensional histogram based upon the gray scale image data obtained in step (a), (c) applying a clustering algorithm to the two-dimensional histogram to determine a set of cluster center parameters associated with a first cluster of pixels and a set of cluster center parameters associated with a second cluster of pixels, (d) parameterizing a threshold curve associated with a binarization program based upon the set of cluster center parameters associated with the first cluster of pixels and the set of cluster center parameters associated with the second cluster of pixels, and (e) applying the binarization program to the gray scale image data associated with the document using the parameterized threshold curve obtained in step (d) to provide binarized image data representative of a binary image of the document.
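Steps (b) and (c) can be illustrated with a minimal Python sketch. The text above does not state which two quantities form the axes of the two-dimensional histogram, so the sketch assumes, purely for illustration, that one axis is the pixel gray level and the other is the mean of the pixel's 3x3 neighbourhood. The function names (`local_mean_3x3`, `histogram_2d`, `weighted_kmeans_2`), the bin count, and the initialisation of the cluster centers are likewise hypothetical choices, not details taken from the invention.

```python
import numpy as np

def local_mean_3x3(gray):
    """Mean of each pixel's 3x3 neighbourhood (image edges are replicated)."""
    g = gray.astype(np.float64)
    padded = np.pad(g, 1, mode="edge")
    h, w = g.shape
    total = np.zeros_like(g)
    for dy in range(3):
        for dx in range(3):
            total += padded[dy:dy + h, dx:dx + w]
    return total / 9.0

def histogram_2d(gray, bins=64):
    """Step (b). ASSUMED axes: x = pixel gray level, y = 3x3 local mean."""
    x = gray.astype(np.float64).ravel()
    y = local_mean_3x3(gray).ravel()
    return np.histogram2d(x, y, bins=bins, range=[[0, 256], [0, 256]])

def weighted_kmeans_2(hist, xedges, yedges, iters=50):
    """Step (c). Two-cluster k-means over occupied bins, weighted by bin count.

    Returns a (2, 2) array of cluster centers; row 0 starts from the darkest
    occupied bin and row 1 from the brightest (an illustrative initialisation).
    """
    xc = 0.5 * (xedges[:-1] + xedges[1:])
    yc = 0.5 * (yedges[:-1] + yedges[1:])
    gx, gy = np.meshgrid(xc, yc, indexing="ij")
    keep = hist.ravel() > 0
    pts = np.column_stack([gx.ravel()[keep], gy.ravel()[keep]])
    wts = hist.ravel()[keep]
    order = np.argsort(pts.sum(axis=1))
    centers = np.array([pts[order[0]], pts[order[-1]]], dtype=np.float64)
    for _ in range(iters):
        # Assign every occupied bin to its nearest center.
        dist = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        updated = centers.copy()
        for k in range(2):
            members = labels == k
            if members.any():
                updated[k] = np.average(pts[members], axis=0, weights=wts[members])
        if np.allclose(updated, centers):
            break
        centers = updated
    return centers
```

Clustering the occupied histogram bins, weighted by their counts, rather than the individual pixels keeps the cost of the clustering step largely independent of image size, which is one plausible reason for building the histogram first.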
Preferably, step (d) includes the steps of (d-1) calculating an average value associated with the first and second clusters of pixels, and (d-2) parameterizing at least a portion of the threshold curve based upon the average value calculated in step (d-1). The clustering algorithm preferably includes a k-means clustering algorithm. One cluster of pixels is representative of the background of the document and the other cluster of pixels is representative of the foreground of the document. In the two-dimensional histogram, one cluster of pixels is located above the other cluster of pixels; the upper cluster is representative of the background of the document and the lower cluster is representative of the foreground of the document.
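Steps (d-1), (d-2), and (e) might be sketched as follows. The text does not give the form of the threshold curve, so the piecewise curve below is a hypothetical stand-in: it is flat at the averaged pixel level for dark neighbourhoods and proportional to the local mean for bright ones. The ordering of the cluster center parameters as (pixel gray level, local mean), the sample center values, and the convention that 1 marks foreground are also assumptions made only for this example.

```python
import numpy as np

def average_of_centers(center_a, center_b):
    """Step (d-1): average of the two cluster centers, component by component."""
    return 0.5 * (np.asarray(center_a, dtype=np.float64)
                  + np.asarray(center_b, dtype=np.float64))

def threshold_curve(local_mean, mid):
    """Step (d-2): HYPOTHETICAL curve T(m) parameterized by the average `mid`.

    mid = (mid_pixel, mid_local). For m <= mid_local the curve is flat at
    mid_pixel; above that it rises as mid_pixel * m / mid_local, so the two
    pieces meet at m = mid_local.
    """
    mid_pixel, mid_local = mid
    mid_local = max(float(mid_local), 1e-9)  # guard against division by zero
    m = np.asarray(local_mean, dtype=np.float64)
    return np.where(m <= mid_local, mid_pixel, mid_pixel * m / mid_local)

def binarize(gray, local_mean, mid):
    """Step (e): 1 where the pixel lies below the curve (assumed foreground)."""
    below = np.asarray(gray, dtype=np.float64) < threshold_curve(local_mean, mid)
    return below.astype(np.uint8)

# Illustrative cluster centers only, given as (pixel gray level, local mean).
foreground_center = (40.0, 45.0)
background_center = (200.0, 195.0)
mid = average_of_centers(foreground_center, background_center)
binary = binarize(np.array([[40, 200]]), np.array([[60.0, 190.0]]), mid)
```

Because the curve is derived from the cluster centers of the image at hand, a single binarization program can adapt to each image, or to a class of images sharing similar centers, instead of several programs being run and their outputs compared.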
In accordance with another aspect of the present invention, an apparatus for processing a document comprises means for scanning the document to obtain gray scale image data associated with the document. Means is provided for generating a two-dimensional histogram based upon the gray scale image data. Means is provided for applying a clustering algorithm to the two-dimensional histogram to determine a set of cluster center parameters associated with a first cluster of pixels and a set of cluster center parameters associated with a second cluster of pixels. Means is provided for parameterizing a threshold curve associated with a binarization program based upon the set of cluster center parameters associated with the first cluster of pixels and the set of cluster center parameters associated with the second cluster of pixels. Means is provided for applying the binarization program to the gray scale image data associated with the document using the parameterized threshold curve to provide binarized image data representative of a binary image of the document.
Preferably, the means for parameterizing a threshold curve includes means for calculating an average value associated with the first and second clusters of pixels and means for parameterizing at least a portion of the threshold curve based upon the calculated average value. The clustering algorithm preferably includes a k-means clustering algorithm. One cluster of pixels is representative of the background of the document and the other cluster of pixels is representative of the foreground of the document. In the two-dimensional histogram, one cluster of pixels is located above the other cluster of pixels; the upper cluster is representative of the background of the document and the lower cluster is representative of the foreground of the document.