The capture, storage, and transmission of digital images and digital video have become widespread. Image binarization is the task of converting a color or grayscale image to a bilevel image consisting of just two colors, e.g., black and white. The color or grayscale image is often represented by a set of multilevel pixel values, e.g., pixel values which cover a wide range of values corresponding to different grayscale or color levels. Such a set of pixel values is sometimes called a set of multivalent pixel values, while a set of pixel values where the pixel values only assume one of two possible values is sometimes called a set of bivalent pixel values.
Many image processing operations involving binarization deal with documents including text, e.g., documents which are to be subjected to optical character recognition and/or other processing operations. The documents may be scanned to produce color or grayscale sets of image data which may then need to be subjected to binarization prior to subsequent processing, e.g., form and/or text recognition processing.
Often an image to be processed includes “foreground” text and diagrams against a “background” which may be uniform or may show various types of non-uniformity. Some information loss is inherent in binarization, but an objective of the binarization process is usually to preserve the text and line elements of the image.
As noted above, binarization is frequently a preliminary step in tasks such as optical character recognition or image compression. In the simplest cases, an image can be effectively binarized by a direct thresholding algorithm, e.g., setting gray values higher than a fixed threshold to white and gray values lower than the threshold to black. The particular threshold value for an image might be chosen by analyzing the gray-level statistics of the entire image and then using the determined threshold throughout the image.
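As a rough illustration of direct thresholding with a threshold chosen from whole-image statistics, the following Python sketch uses Otsu's method, a standard histogram-based rule that the text above does not itself specify. The function names and the representation of an image as a list of rows of integer gray levels in 0..255 are assumptions made for illustration.

```python
def otsu_threshold(image):
    """Return a global threshold t chosen by Otsu's method.

    `image` is a list of rows of integer gray levels in 0..255.
    Pixels with value < t are treated as black, the rest as white.
    """
    # Histogram of gray levels over the entire image.
    hist = [0] * 256
    for row in image:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value < t
    sum0 = 0  # sum of the gray levels of those pixels
    for t in range(1, 256):
        w0 += hist[t - 1]
        sum0 += (t - 1) * hist[t - 1]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize_global(image, t):
    """Map gray levels below t to 0 (black) and all others to 255 (white)."""
    return [[0 if v < t else 255 for v in row] for row in image]


image = [[30, 30, 200, 200], [30, 40, 210, 200]]
t = otsu_threshold(image)
print(binarize_global(image, t))  # [[0, 0, 255, 255], [0, 0, 255, 255]]
```

The same threshold t is applied to every pixel, which is exactly the property that breaks down for the images discussed next.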
In practice, direct thresholding often fails to separate foreground from background effectively in the presence of one or more complicating image characteristics, e.g., a varying background. A digital image captured from an original hard-copy document may show shading of both foreground and background regions as a result of the image capture process, even if the original consisted purely of black and white. Even more challenging is the case of an image with sharply bounded background regions of different colors. FIG. 1 shows an image 100 with smooth brightness variations in both foreground and background as well as sharp variations in background, where the foreground is defined to be the text in the image and the background is everything else.
FIG. 2 shows an image 200 representing the result of binarizing the image 100 shown in FIG. 1 using a uniform threshold, e.g., a fixed pixel-value threshold used to assign each pixel either a foreground or a background value. Note that on the left side of image 200 some text has been converted to white rather than black, indicating that for those areas the chosen threshold level is too low, whereas in other areas, e.g., on the right side, background has been converted to black, indicating that for those areas the threshold is too high. From FIG. 2, it should be appreciated that in many cases using the same fixed threshold throughout an image results in errors during the binarization process.
No single threshold level can effectively binarize the image 100 shown in FIG. 1 because the text on the left is lighter than the background on the right: any threshold high enough to map the left-hand text to black also maps the darker right-hand background to black. This may not be apparent at a glance, owing to the human visual system's high effectiveness at distinguishing foreground from background.
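The argument can be checked with a few hypothetical gray levels. The sketch below assumes, purely for illustration and not as values measured from FIG. 1, left-hand text at 120 on a 200 background and right-hand text at 60 on a 100 background, and searches every possible threshold for one that sends all text to black and all background to white.

```python
def separates(fg, bg, t):
    """True if threshold t maps every foreground level to black (v < t)
    and every background level to white (v >= t)."""
    return all(v < t for v in fg) and all(v >= t for v in bg)


# Hypothetical gray levels (0 = black, 255 = white).
left_text, left_background = 120, 200
right_text, right_background = 60, 100

foreground = [left_text, right_text]
background = [left_background, right_background]

# Try every possible global threshold.
global_ok = [t for t in range(257) if separates(foreground, background, t)]
print(global_ok)  # [] : no single threshold works

# Each side on its own can be thresholded correctly.
left_ok = [t for t in range(257) if separates([left_text], [left_background], t)]
right_ok = [t for t in range(257) if separates([right_text], [right_background], t)]
print(min(left_ok), max(left_ok))    # 121 200
print(min(right_ok), max(right_ok))  # 61 100
```

A valid global threshold would have to exceed the lightest text (120) while not exceeding the darkest background (100), which is impossible; the two halves, however, have disjoint but individually valid threshold ranges.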
Use of local thresholds for different portions of an image may have advantages over a single threshold, but such an approach also has problems. FIG. 3 shows an image 300 resulting from a binarization process applied to the image 100, wherein the image 100 has been divided into subregions and a separate threshold value is chosen for each subregion. This approach effectively handles smooth shading. However, the sharp background edges generate unwanted artifacts in the binarized image: the binarization approach used to generate the image 300 cannot discriminate between the relevant sharp edges separating foreground from background and the irrelevant sharp edges separating one background region from another.
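A minimal Python sketch of subregion thresholding, assuming square tiles and, as an illustrative choice not taken from the text, the mean gray level of each tile as that tile's threshold. The function name `binarize_by_tiles` and the tile size parameter are hypothetical.

```python
def binarize_by_tiles(image, tile):
    """Binarize `image` with one threshold per tile x tile block.

    Assumption for illustration: each block's threshold is the mean gray
    level of the block. Pixels below it become 0 (black), others 255 (white).
    """
    h, w = len(image), len(image[0])
    out = [[255] * w for _ in range(h)]
    for r0 in range(0, h, tile):
        for c0 in range(0, w, tile):
            rows = range(r0, min(r0 + tile, h))
            cols = range(c0, min(c0 + tile, w))
            block = [image[r][c] for r in rows for c in cols]
            t = sum(block) / len(block)  # local threshold for this block
            for r in rows:
                for c in cols:
                    out[r][c] = 0 if image[r][c] < t else 255
    return out


# Left half: text (120) on a light background (200).
# Right half: text (60) on a darker background (100).
image = [
    [200, 120, 100, 60],
    [200, 200, 100, 100],
]
print(binarize_by_tiles(image, 2))  # [[255, 0, 255, 0], [255, 255, 255, 255]]
print(binarize_by_tiles(image, 4))  # [[255, 0, 0, 0], [255, 255, 0, 0]]
```

With 2 x 2 tiles each tile lies within one background region and both text pixels are recovered. With a single 4 x 4 tile straddling the background edge, the darker right-hand background falls below the tile mean (135) and is turned black, which is the kind of artifact attributed above to sharp background edges.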
Clearly, an effective background-extraction operator can facilitate image binarization. Such an operator must be nonlinear. To see this, consider applying the operator to an image consisting of a single nonzero-valued pixel against a zero-valued background. The output image (the background) is then uniformly zero. Since any image is a linear combination of such single-nonzero-pixel images, a linear operator would yield zero when applied to any image.
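Written out, with $\delta_p$ denoting the image that equals 1 at pixel $p$ and 0 elsewhere, and $B$ a hypothetical linear background operator satisfying $B(\delta_p) = 0$ for every pixel $p$, as argued above:

```latex
\[
I = \sum_{p} I(p)\,\delta_p
\quad\Longrightarrow\quad
B(I) = \sum_{p} I(p)\,B(\delta_p) = \sum_{p} I(p)\cdot 0 = 0 .
\]
```

The second step uses only additivity and homogeneity of $B$, so any operator that returns a nonzero background for some image cannot be linear.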
Morphological operators have been used as a tool for extracting various features (such as backgrounds) from both grayscale and binary images. Among the simplest morphological operators are dilation and erosion: the dilation (erosion) of a grayscale image by a flat structuring element can be defined as the image whose value at each pixel is the maximum (minimum) pixel value over a neighborhood of fixed size and shape (the structuring element) translated to that pixel. Many neighborhood shapes are employed for various special purposes, but the humble square neighborhood is one of the most popular choices owing to its simplicity and amenability to rapid computation. The opening operator is defined as an erosion followed by a dilation; the closing operator is defined as a dilation followed by an erosion, the same structuring element being used for both steps. In intuitive terms, the closing (opening) operator erases dark (bright) image features narrower than the scale of the structuring element. The alternating sequential filter (ASF) is defined as an iterated sequence of opening operators $\omega$ and closing operators $\kappa$ applied to an image $I$:

$$\mathrm{ASF}(I) = \kappa_{e_n}\,\omega_{e_n}\,\kappa_{e_{n-1}}\,\omega_{e_{n-1}} \cdots \kappa_{e_1}\,\omega_{e_1}(I).$$

Here $e_1, e_2, \ldots, e_n$ denote the sequence of structuring elements, generally taken to have successively increasing sizes, and $\omega_{e_i}$ and $\kappa_{e_i}$ denote the opening and closing with structuring element $e_i$. The iterated opening and closing operations erase both bright and dark narrow features, thus extracting the background.
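The following pure-Python sketch implements grayscale dilation, erosion, opening, closing, and the ASF for flat square structuring elements of odd side k. Clipping the window at the image border and representing an image as a list of rows are assumptions made for illustration; practical implementations would normally rely on optimized library routines.

```python
def _window_filter(image, k, pick):
    """Apply `pick` (max or min) over a flat k x k square window centred on
    each pixel. k is assumed odd; the window is clipped at the image border."""
    h, w = len(image), len(image[0])
    r = k // 2
    return [
        [
            pick(
                image[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            )
            for j in range(w)
        ]
        for i in range(h)
    ]


def dilate(image, k):
    return _window_filter(image, k, max)  # local maximum


def erode(image, k):
    return _window_filter(image, k, min)  # local minimum


def opening(image, k):
    return dilate(erode(image, k), k)  # erases narrow bright features


def closing(image, k):
    return erode(dilate(image, k), k)  # erases narrow dark features


def asf(image, sizes):
    """ASF(I) = kappa_{e_n} omega_{e_n} ... kappa_{e_1} omega_{e_1}(I):
    for each size in turn (smallest first), open and then close."""
    out = image
    for k in sizes:
        out = closing(opening(out, k), k)
    return out


# A one-pixel-wide dark stroke (50) on a light background (200), next to a
# wide, darker background region (100) separated by a sharp edge.
row = [200, 200, 200, 50, 200, 200, 200, 100, 100, 100]
image = [list(row) for _ in range(3)]
print(asf(image, [3])[0])
# [200, 200, 200, 200, 200, 200, 200, 100, 100, 100]
```

In the example the narrow dark stroke is erased while the sharp step from 200 to 100 survives, which is what makes the ASF usable as a background extractor. In this naive form each operator costs on the order of k² comparisons per pixel, and the ASF applies four such operators for every size in the sequence.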
One particular, computationally intensive approach to image binarization uses an alternating sequential filter (ASF) to extract the image background and then defines the foreground as those pixels deviating significantly from the background. The ASF method is described in: M. Cumplido, P. Montolio, and A. Gasull, “Morphological preprocessing and binarization for ocr systems,” in Mathematical Morphology and Its Applications to Signal Processing, pp. 393-401 (1996). The ASF-based method is highly effective at binarizing images with smooth and/or sharp variations in background color, but it achieves this effectiveness at a high computational cost, due largely to the iterated alternating sequential filter. In view of the above discussion, it should be appreciated that there remains a need for methods of image binarization which are effective but less computationally complex than methods using an alternating sequential filter.
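The final step of such a background-based method, marking as foreground the pixels that deviate significantly from the extracted background, might be sketched as follows. The absolute-difference test, the function name, and the threshold `delta` are illustrative assumptions rather than details taken from the cited paper; the background image is assumed to have been produced beforehand, e.g., by an ASF.

```python
def binarize_against_background(image, background, delta):
    """Return a bilevel image: 0 (black, foreground) where a pixel differs
    from the estimated background by more than `delta` gray levels, else
    255 (white).

    `background` is assumed to come from a background estimate such as an
    ASF; `delta` is a hypothetical tuning parameter.
    """
    return [
        [0 if abs(v - b) > delta else 255 for v, b in zip(img_row, bg_row)]
        for img_row, bg_row in zip(image, background)
    ]


# A row with a dark stroke (50) and a sharp background step from 200 to 100,
# paired with a background estimate in which the stroke has been erased.
image = [[200, 200, 200, 50, 200, 200, 200, 100, 100, 100]]
background = [[200, 200, 200, 200, 200, 200, 200, 100, 100, 100]]
print(binarize_against_background(image, background, 40))
# [[255, 255, 255, 0, 255, 255, 255, 255, 255, 255]]
```

Because the comparison is made against a background that already follows the sharp step, the step itself produces no artifact, unlike the tile-based thresholding of FIG. 3.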