1. Field of the Invention
This invention relates to document image processing, and in particular, it relates to separation (segmentation) of foreground text and background graphics or image.
2. Description of Related Art
Some document images contain both foreground (typically, text) and background (typically, graphics or image) content. Examples include a PowerPoint document that has a “theme” graphic as background and text as foreground, a table or spreadsheet with shaded table cells, a check with a background image, etc. Sometimes, the background may result from undesirable artifacts introduced during image acquisition, such as uneven lighting conditions when a document image is generated by photographing a hardcopy document. Typically, the background graphics or image is slowly varying as compared to the foreground text. Color document images can have various complex foreground and background conditions. For various purposes, such as document binarization, OCR (optical character recognition), printing, etc., it is often desirable to automatically separate the foreground text from the background image or graphics.
Existing methods for color document image binarization usually convert the color image into a grayscale image and then apply a global or local (adaptive) thresholding method to obtain a binary output, with the goal of excluding the background image or graphics from the binarized document.
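By way of illustration only, the conventional grayscale-then-threshold approach described above may be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation taken from any particular existing method: the function names (to_gray, global_threshold, local_threshold), the Rec. 601 luma weights, the fixed global threshold of 128, and the mean-minus-offset local rule are all assumptions chosen for brevity.

```python
def to_gray(rgb):
    """Convert rows of (R, G, B) tuples to gray levels.

    Assumption: Rec. 601 luma weights; other conversions are possible.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb]


def global_threshold(gray, t=128):
    """Mark a pixel as foreground (1) if it is darker than one fixed threshold."""
    return [[1 if v < t else 0 for v in row] for row in gray]


def local_threshold(gray, radius=1, offset=10):
    """Mark a pixel as foreground (1) if it is darker than the mean of its
    neighborhood by more than `offset` (a simple adaptive rule, assumed here).
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clip the (2*radius + 1)-sized window at the image borders.
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            window = [gray[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            mean = sum(window) / len(window)
            out[y][x] = 1 if gray[y][x] < mean - offset else 0
    return out


# Hypothetical gray-level test image: a light region (200) next to a shaded
# region (90), each containing one darker "text" pixel.
gray = [[200, 200, 200, 90, 90, 90] for _ in range(3)]
gray[1][1] = 100  # text pixel on the light background
gray[1][4] = 20   # text pixel on the shaded background

global_out = global_threshold(gray)
local_out = local_threshold(gray)
```

In this hypothetical image, the single global threshold marks the shaded background pixel at row 0, column 5 as foreground, whereas the local rule does not, while both mark the two text pixels as foreground. The local rule may still respond to the boundary between the light and shaded regions, so neither variant is guaranteed to exclude the background.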