1. Field of the Invention
This invention relates to a method and related apparatus for binarizing document images.
2. Description of Related Art
An initial step in document image processing is binarization, i.e., converting a multi-bit image (e.g., an 8-bit image) into a 1-bit image. A document image is an image generated from a hardcopy document, for example by scanning or photographing the document, where the document typically contains text and may also contain images and graphics. Binarization is typically performed before other processing such as OCR (optical character recognition).
In some conventional binarization methods, a global binarization threshold is used: pixels with values greater than the threshold are set to 1 and pixels with values lower than the threshold are set to 0. There are various existing methods for determining the binarization threshold; one example is Otsu's method. However, when these methods are applied to real document images, the results are often affected by image quality as well as image content. For example, many document images have a background color and embedded images, for which Otsu's method does not work well. Furthermore, some binarization methods are slow.
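As an illustration of global thresholding, the following is a minimal sketch of Otsu's method in plain Python. It assumes 8-bit grayscale pixel values supplied as a flat list of integers; the function names `otsu_threshold` and `binarize_global` are hypothetical and chosen only for this example. The threshold is selected by maximizing the between-class variance of the two pixel groups it separates.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Assumes `pixels` is a non-empty flat list of ints in [0, levels).
    Pixels with value <= t form one class, pixels with value > t the other.
    """
    # Build the gray-level histogram.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of values of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue        # lower class still empty
        if w1 == 0:
            break           # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize_global(pixels, threshold):
    """Set pixels greater than the threshold to 1 and all others to 0."""
    return [1 if p > threshold else 0 for p in pixels]


# Usage: a toy "image" with two well-separated gray levels.
sample = [10] * 50 + [200] * 50
t = otsu_threshold(sample)
binary = binarize_global(sample, t)
```

Because a single threshold is applied to every pixel, a sketch like this behaves well only when the histogram is roughly bimodal, which is consistent with the limitation noted above for documents with colored backgrounds or embedded images.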
Some existing methods employ local threshold binarization, with a separation between the foreground and the background. Many such methods can handle images output from cameras, which tend to be of low quality. However, they have a high computational cost. Some existing methods employ adaptive local thresholding based on a verification-based multi-threshold probing scheme.
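For contrast with global thresholding, the following is a minimal sketch of one generic form of local thresholding, in which each pixel is compared against the mean of its neighborhood. This is not the verification-based multi-threshold probing scheme mentioned above, nor any specific prior-art method; the function name `local_mean_binarize` and the parameters `radius` and `offset` are assumptions made for illustration. An integral image (summed-area table) is used so that each window sum costs a constant number of operations.

```python
def local_mean_binarize(img, radius=1, offset=0):
    """Binarize a 2D list of gray values using a local mean threshold.

    A pixel is set to 1 if its value is greater than
    (mean of its (2*radius+1)-square neighborhood) - offset, else 0.
    Windows are clipped at the image borders.
    """
    h, w = len(img), len(img[0])

    # Integral image with one extra row and column of zeros:
    # integ[y][x] holds the sum of img[0:y][0:x].
    integ = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            integ[y + 1][x + 1] = integ[y][x + 1] + row_sum

    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            area = (y1 - y0) * (x1 - x0)
            window_sum = (integ[y1][x1] - integ[y0][x1]
                          - integ[y1][x0] + integ[y0][x0])
            # pixel > mean - offset, rewritten without division.
            if img[y][x] * area > window_sum - offset * area:
                out[y][x] = 1
    return out


# Usage: a single bright pixel on a dark background.
image = [
    [10, 10, 10],
    [10, 200, 10],
    [10, 10, 10],
]
result = local_mean_binarize(image, radius=1)
```

Even with the integral image, a per-pixel threshold requires more memory and arithmetic than a single global threshold, which gives a rough sense of why local methods are generally more expensive to compute.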