Image binarization refers to the process of converting an image whose pixel values may assume multiple levels into an image whose pixel values can each be one of only two values, e.g., a first value corresponding to foreground and a second value corresponding to background. Image binarization can be used to convert a grayscale or a color image to a black and white image. Frequently, binarization is used as a pre-processing step in document image processing. For example, barcodes are typically captured as grayscale or color images that may require binarization as a first step before the barcode can be decoded to ascertain its value. Similarly, optical character recognition (OCR) algorithms may require binarization as a first step.
One approach to image binarization is to choose a single threshold value and classify each pixel of the image whose value is above this threshold as white and each of the other pixels as black. Such a method works well when the characteristics of the image are consistent across the image. When they are not, because the nature of the contents differs, the density of the characteristics differs, the local lighting differs, or there are shadows or defects, etc., it is better to determine the threshold according to local statistics and adaptively change it for each pixel.
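The single-threshold approach described above can be illustrated with a minimal Python sketch. The function name `binarize_global`, the sample pixel values, and the threshold of 128 are assumptions chosen for illustration only; white is encoded as 1 and black as 0.

```python
def binarize_global(image, threshold):
    """Classify pixels above the threshold as white (1), all others as black (0)."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


# Hypothetical 8-bit grayscale sample: dark marks (low values) on a light page.
image = [
    [200, 210, 40],
    [190, 35, 220],
]
binary = binarize_global(image, 128)
```

Because the same threshold is applied everywhere, a shadow that darkens part of the page below 128 would turn that whole region black, which is the weakness that motivates locally adaptive thresholds.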
Adaptive image binarization methods may use a sliding window, centered on the pixel to be binarized, and moved along the image from upper left to lower right, for example. At each fixed position of the window, the pixels within the window are evaluated and analyzed to generate a local threshold to use to binarize that particular pixel at the center of the window. Then the window is moved by one pixel and the threshold computed using the contents of just this new window to binarize the pixel at the center of this new window. This continues for all pixels in the image. In this way, every pixel is converted to either black or white according to the statistics of its window and excluding everything outside that window.
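The sliding-window procedure can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the names `binarize_adaptive` and `window_mean` are hypothetical, the window is clipped at the image border (the text does not specify border handling), and the plain window mean stands in for whatever local threshold rule is actually used.

```python
def binarize_adaptive(image, half, local_threshold):
    """Binarize each pixel using a threshold computed only from its own window.

    half: window half-size, giving a (2*half + 1) x (2*half + 1) window.
    local_threshold: function mapping a list of window values to a threshold.
    """
    height, width = len(image), len(image[0])
    output = [[0] * width for _ in range(height)]
    for y in range(height):          # top to bottom
        for x in range(width):       # left to right
            # Assumption: the window is clipped where it extends past the border.
            window = [
                image[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            t = local_threshold(window)
            output[y][x] = 1 if image[y][x] > t else 0
    return output


def window_mean(values):
    """Placeholder threshold rule (an assumption): the mean of the window."""
    return sum(values) / len(values)


# Hypothetical sample: one dark pixel surrounded by lighter pixels.
image = [
    [100, 100, 100],
    [100, 50, 100],
    [100, 100, 100],
]
result = binarize_adaptive(image, 1, window_mean)
```

Recomputing the window contents from scratch at every pixel, as done here for clarity, costs time proportional to the window area per pixel.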
The statistics used by known methods for determining thresholds for image binarization are the mean and standard deviation. Related to, and sharing some of the computation with, the standard deviation is the variance. For a window centered at pixel P and containing M×N pixels having graylevel values I[P], the mean m, the variance v, and the standard deviation s are defined by
$$ m = \frac{1}{MN}\sum\sum I[P], \qquad v = \frac{1}{MN}\sum\sum \left(I[P] - m\right)^{2}, \qquad s = \sqrt{v} $$
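As a worked illustration of these definitions, the following Python sketch computes m, v, and s for a small window. The function name `window_stats` and the sample values are assumptions; the window is passed as a list of rows, so the number of values equals M×N.

```python
import math


def window_stats(window):
    """Return (mean, variance, standard deviation) of a 2-D window of graylevels."""
    values = [p for row in window for p in row]
    count = len(values)                              # M * N
    m = sum(values) / count                          # mean
    v = sum((p - m) ** 2 for p in values) / count    # variance about the mean
    s = math.sqrt(v)                                 # standard deviation
    return m, v, s


# Hypothetical 2x2 window.
m, v, s = window_stats([[10, 20], [30, 40]])
```

For this sample the mean is 25 and the variance is 125; the standard deviation is the square root of the variance, which is the only step that requires a square root.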
A well-known binarization method that automatically adapts the threshold value pixel-by-pixel calculates a threshold t for binarizing each pixel P by using a formula due to Sauvola et al. [J. Sauvola, T. Seppanen, S. Haapakoski, and M. Pietikäinen. “Adaptive Document Binarization,” Proc. International Conference on Document Analysis and Recognition, volume 1, pages 147-152, 1997],
$$ t = m\left(1 + k\left(\frac{s}{R} - 1\right)\right) = m\left((1 - k) + k\,\frac{s}{R}\right) $$
where m is the mean of the graylevel values in a window centered at the pixel under consideration, s is the standard deviation of the graylevel values of that window, R is the maximum possible value for the standard deviation (e.g., R=128 for an 8-bit grayscale image), and k is a positive number less than one, typically between 0.2 and 0.5. The same window size is used for every pixel in the image.
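The Sauvola threshold can be evaluated directly from the window statistics, as in this minimal Python sketch. The function names and the sample values of m and s are assumptions; the defaults k=0.2 and R=128 are taken from the ranges stated above. Both algebraic forms of the formula are included to show that they agree.

```python
def sauvola_threshold(m, s, k=0.2, R=128.0):
    """Sauvola threshold in its original form: t = m * (1 + k * (s / R - 1))."""
    return m * (1.0 + k * (s / R - 1.0))


def sauvola_threshold_expanded(m, s, k=0.2, R=128.0):
    """Equivalent expanded form: t = m * ((1 - k) + k * (s / R))."""
    return m * ((1.0 - k) + k * (s / R))


# Hypothetical window statistics: mean 100, standard deviation 64.
t = sauvola_threshold(100.0, 64.0)
t_expanded = sauvola_threshold_expanded(100.0, 64.0)
```

With these values the threshold is approximately 90, i.e., below the window mean. In a perfectly flat window (s = 0) the threshold drops to m(1 − k), and it rises toward m as the local contrast s approaches R.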
Performance can be improved by modifying the above equation to subtract the minimum graylevel value found in the whole image from the mean of each window and changing the image-independent constant R to the image-dependent R′, where R′ is the maximum value of all standard deviations computed for all windows in the image. The resulting formula is originally due to Wolf et al. [Christian Wolf, Jean-Michel Jolion, Francoise Chassaing. “Text Localization, Enhancement and Binarization in Multimedia Documents,” Proc. International Conference on Pattern Recognition, volume 4, pp. 1037-1040, 2002.] and reformulated herein to better show the relationship to the Sauvola equation,
$$ t = (m - Z)\left((1 - k) + k\,\frac{s}{R'}\right) + Z $$
where Z is the minimum graylevel value over the whole image and R′ is the maximum value of the standard deviations of all windows of the chosen size in the image.
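A minimal Python sketch of the reformulated Wolf threshold follows. The function name `wolf_threshold`, the parameter name `R_prime` (standing for R′), and the sample values are assumptions. The sketch also assumes R′ is greater than zero, i.e., that at least one window in the image is not perfectly uniform.

```python
def wolf_threshold(m, s, Z, R_prime, k=0.2):
    """Wolf threshold: t = (m - Z) * ((1 - k) + k * (s / R')) + Z.

    Z: minimum graylevel over the whole image.
    R_prime: maximum standard deviation over all windows (assumed > 0).
    """
    return (m - Z) * ((1.0 - k) + k * (s / R_prime)) + Z


# Hypothetical values: window mean 100, window std 30, image minimum 20, R' = 60.
t = wolf_threshold(100.0, 30.0, Z=20.0, R_prime=60.0)

# With Z = 0 and R' = R the expression reduces to the Sauvola form m * ((1 - k) + k * s / R).
t_reduced = wolf_threshold(100.0, 64.0, Z=0.0, R_prime=128.0)
```

Note that Z and R′ are properties of the whole image, so an implementation along these lines would need one pass to gather them before any pixel can be thresholded.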
A significant problem with the above-described procedure is the inability of a single window size used over the whole image to set a correct threshold for all pixels in the image. An overly large window size does not allow for local variations in foreground and background intensity to be taken into account and will miss local detail. But a window size that is too small may have all of its pixels, or almost all of its pixels, be foreground pixels, if the features in the foreground include objects as large as, or almost as large as, the window size. The threshold so computed may cause foreground pixels to be falsely classified as a mixture of background and foreground pixels, or even entirely as background. An effect of this is that the larger features in the foreground end up with holes or other spurious errors in the binarized image. Therefore, a single window size for the whole image is unsatisfactory for many images.
A second problem with the above procedure is the computational complexity of the square root required for computing the standard deviation. That makes it difficult to do binarization on devices with limited computational capability or to do more processing to improve the binarization without requiring more resources.
Accordingly, it should be appreciated that there is a need for improved adaptive image binarization methods. It would be beneficial to have available a means for automatically adapting which pixels are included in the computation of the local threshold. It would be further beneficial to keep the complexity of computing each threshold low, since that computation has to be done for each and every pixel.
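The small-window failure can be seen in a short worked example. The sketch below is an illustration under assumptions: the foreground is taken to be dark, the window is a hypothetical 3×3 window lying entirely inside a uniform dark stroke of graylevel 40, and the Sauvola formula is applied with k=0.2 and R=128.

```python
def sauvola_threshold(m, s, k=0.2, R=128.0):
    """Sauvola threshold in expanded form: t = m * ((1 - k) + k * (s / R))."""
    return m * ((1.0 - k) + k * (s / R))


# Hypothetical 3x3 window that lies entirely inside a dark stroke.
window = [40] * 9
m = sum(window) / len(window)                          # mean of the window
v = sum((p - m) ** 2 for p in window) / len(window)    # variance is zero here
s = v ** 0.5
t = sauvola_threshold(m, s)                            # threshold falls below the mean

# The center pixel equals the mean, so it lies above the threshold
# and is classified as white (background) despite being foreground.
center_is_white = window[4] > t
```

Because the window contains no background pixels, the standard deviation is zero and the threshold is m(1 − k), which is below every pixel in the window. The stroke interior is therefore marked as background, producing the kind of hole described above.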