In the art of optical character recognition, an image classifier is a functional unit that attempts to match sample images against a set of referent images or templates. Although most character images are sampled in grayscale, which results in multiple data bits per image pixel, image classifiers are generally limited to binary (bi-level) input data. Analyzing grayscale data is substantially more complicated, and requires time-consuming, sophisticated techniques. Thus, although some grayscale classifiers exist, most readily available image classifiers accept only binary input data. A variety of binary classifiers for optical character recognition are known in the art, such as the system described in U.S. Pat. No. 5,539,840 to Krtolica et al. for "Multifont Optical Character Recognition Using a Box Connectivity Approach," which is incorporated herein by reference.
Because the sample image comprises grayscale data, but the image classifier accepts only binary data, the sample image must first be converted from grayscale into black and white. This conversion is normally performed by a process called thresholding or binarization, which involves selecting a median gray level (usually called a "binarization threshold" or simply "threshold") and changing the value of each image pixel to either zero or one, depending on whether the original gray level of the pixel is greater than or less than the threshold. In conventional systems, binarization of the sample image is generally performed once, using a single threshold, after which the binary output data is provided to the image classifier.
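The thresholding operation described above can be sketched as follows. The pixel values, the 128 threshold, and the convention that gray levels above the threshold map to foreground ("1") are illustrative assumptions for this sketch, not details taken from any particular system:

```python
def binarize(gray, threshold):
    """Map each grayscale pixel to 1 (foreground) if its value exceeds
    the threshold, else 0 (background)."""
    return [[1 if px > threshold else 0 for px in row] for row in gray]

# Illustrative 2x3 grayscale sample (8-bit values, 0-255).
sample = [[200, 90, 210],
          [130, 135, 60]]

print(binarize(sample, 128))  # [[1, 0, 1], [1, 1, 0]]
```

Note that the entire conversion is controlled by the single `threshold` parameter; every pixel's binary value, and hence the shape ultimately seen by the classifier, depends on that one choice.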
As conventionally implemented, however, thresholding often dramatically reduces recognition accuracy. When an image is thresholded, much useful information about the image is lost. For example, an eight-bit grayscale image contains eight times as much data as the same image after thresholding. Such data assist the human eye in recognizing the image, but are lost to conventional image recognition systems because of thresholding.
In addition, thresholding introduces harmful noise into the image. Slight deviations in the image's gray levels are often manifest after thresholding in the form of jagged edges, stray pixels, gaps, and other artifacts that reduce recognition accuracy. Moreover, after thresholding, the sample image is typically normalized to the size of the referent images. However, normalizing binary data generally compounds the noise, reducing recognition accuracy to an even greater degree. What is needed, then, is a method and system for providing binary data to a binary image classifier while retaining as much information as possible about the original grayscale image and reducing the noise associated with the processes of thresholding and normalization.
As noted earlier, in conventional systems, thresholding is normally performed as a separate step from image classification. Thus, in such systems, thresholding is merely a simplification or quantizing step. However, as shown in FIG. 1, thresholding is central to classification and is not so easily separable therefrom. For example, matrix (a) of FIG. 1 represents a grayscale image sampled at eight bits (256 gray levels) per pixel. If the binarization threshold ("T") is selected to be 128, matrix (b) illustrates the resulting binary image, which would be interpreted by a binary image classifier as the letter "U." If, however, the threshold is selected to be 140, matrix (c) illustrates the resulting binary image, which would be interpreted to be the letter "L." Both interpretations are valid. However, in each case, the selection of the binarization threshold determines which pixels are in the foreground ("1") and which pixels are in the background ("0"). Thus, the thresholding step effectively determines the classification of the image.
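The sensitivity illustrated by FIG. 1 can be reproduced with a small hypothetical example. The gray levels below are invented for illustration and are not those of FIG. 1; the point is that any pixel whose gray level falls between two candidate thresholds switches between foreground and background, changing the pattern the classifier sees:

```python
def binarize(gray, threshold):
    """Map pixels above the threshold to foreground (1), else background (0)."""
    return [[1 if px > threshold else 0 for px in row] for row in gray]

# Hypothetical column of pixels whose middle gray level (135) lies
# between the two candidate thresholds of 128 and 140.
ambiguous = [[150],
             [135],
             [120]]

print(binarize(ambiguous, 128))  # [[1], [1], [0]]
print(binarize(ambiguous, 140))  # [[1], [0], [0]]
```

The middle pixel flips between the two binarizations, just as the ambiguous pixels in FIG. 1 determine whether the classifier sees a "U" or an "L."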
The foregoing situation often occurs where there is poor contrast between the foreground and background, and where the foreground or background gray levels are not uniform throughout the sampled image. The human eye can easily compensate for these anomalies. However, a conventional image recognition system that separately thresholds the image before classification will frequently produce inaccurate results. Indeed, as shown above, an arbitrary selection of either threshold will often eliminate valid, and possibly correct, interpretations of the character image.
Conventionally, a binary image classifier cannot detect such alternative interpretations based on different thresholds since the thresholding step is performed separately from classification. If thresholding could be performed with a foreknowledge of the referent images, then a number of possible interpretations of the sample image, based on different thresholds, could be determined. Moreover, only those interpretations having an acceptable "distance" from the binarized sample image could be selected.
What is needed, then, is a method and system for integrating the thresholding and classification steps such that a number of interpretations of the image are found using different thresholds. Moreover, what is needed is a method and system for selecting an interpretation wherein the distance between the binarized sample and the referent image is minimized. Hereafter, this process is called "classification-driven thresholding." What is also needed is a method and system for performing classification-driven thresholding in an efficient manner, without having to resort to exhaustive comparison of all possible thresholded images with the set of referent images. Finally, what is needed is a method and system for disambiguating a candidate set by selecting a preferred interpretation of the character image.
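The idea of classification-driven thresholding can be expressed as a naive sketch: binarize the sample at each candidate threshold, compare the result against every referent image, and keep the interpretation whose distance is smallest. The templates, candidate thresholds, and Hamming distance metric below are illustrative assumptions, and this brute-force search is exactly the kind of exhaustive comparison an efficient method would need to avoid:

```python
def binarize(gray, threshold):
    """Binarize a flattened grayscale image at the given threshold."""
    return [1 if px > threshold else 0 for px in gray]

def hamming(a, b):
    """Number of pixel positions at which two binary images differ."""
    return sum(x != y for x, y in zip(a, b))

def classify(gray, templates, thresholds):
    """Return the (label, threshold, distance) triple minimizing the
    distance between the binarized sample and a referent template."""
    best = None
    for t in thresholds:
        binary = binarize(gray, t)
        for label, template in templates.items():
            d = hamming(binary, template)
            if best is None or d < best[2]:
                best = (label, t, d)
    return best

# Toy flattened 2x2 referents and a sample straddling the thresholds.
templates = {"A": [1, 0, 1, 0], "B": [1, 1, 0, 0]}
sample = [200, 130, 180, 60]

print(classify(sample, templates, thresholds=[128, 150]))
# ('A', 150, 0)
```

At threshold 128 the sample binarizes to [1, 1, 1, 0], one pixel away from either template; at threshold 150 it binarizes to [1, 0, 1, 0], an exact match for "A". The minimum-distance interpretation thus selects both a classification and the threshold that produced it.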