1. Field of the Invention
This application relates in general to image processing and in particular to identifying regions of black text, image, and white space in input images.
2. Discussion of Related Art
Embodiments of the present invention provide a solution for improving image segmentation, which is the process of categorizing the pixels of an input document as white space, black text, or image content. Segmentation can improve document image processing in several ways, essentially because the choice of an appropriate image processing algorithm for a particular image region depends on the region's contents. Applying an algorithm to a document image region that contains the wrong content type can actually decrease document image quality.
When performing color copying, for example, one can get much higher quality results if one can determine what parts of the original document are white space, black text or black line art, and image content. Text looks better when printed only in black ink or black toner, as there are no colored halos around the edges. In contrast, printed photographs generally look better when all available inks or toner colors are used. There are also performance advantages to printing according to particular algorithms; for example, if a page or part of a page is black text only, it can often be printed faster. Scanned input may also be compressed more effectively if regions of black text and white space have already been recognized. This application is written in terms of a copy system, though the present invention is not limited to that use.
Segmentation is unfortunately not an easy task. The input document image arriving for processing from the scanner is typically noisy. Scanned text often has color fringes or halos, and may not be very dark. Text also comes in many different fonts and sizes. White areas may have nonwhite speckles from a variety of causes. Images may be, for example, screened halftones, photographs, or colored artwork, and may need to be further processed accordingly.
At present, typical segmentation tools break the input document image into rectangular blocks, and calculate various parameters for each block. The parameters might include smoothness, color level, average luminance, minimum luminance, and maximum luminance. Some of these parameters may be averaged vertically or in a neighborhood of the block. These aggregated values are additional parameters for the block.
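The block-based parameter calculation described above can be illustrated with a minimal Python sketch. It is not taken from any particular segmentation tool: the function names, the dictionary keys, and the choice of a 3x3 block neighborhood are assumptions made for illustration, and only the luminance parameters are shown (smoothness and color level are omitted).

```python
from statistics import mean


def block_parameters(luma, block_size):
    """Split a grayscale image into square blocks and compute parameters.

    `luma` is a list of rows of 0-255 luminance values (an assumed input
    format). Blocks at the right and bottom edges may be smaller.
    """
    rows, cols = len(luma), len(luma[0])
    grid = []
    for top in range(0, rows, block_size):
        grid_row = []
        for left in range(0, cols, block_size):
            pixels = [luma[r][c]
                      for r in range(top, min(top + block_size, rows))
                      for c in range(left, min(left + block_size, cols))]
            grid_row.append({
                "avg_luma": mean(pixels),
                "min_luma": min(pixels),
                "max_luma": max(pixels),
            })
        grid.append(grid_row)
    return grid


def add_neighborhood_average(grid, key="avg_luma"):
    """Add an aggregated parameter to every block.

    The new entry "nbr_<key>" is the mean of `key` over the block's 3x3
    neighborhood, clipped at the edges of the block grid.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    for i in range(n_rows):
        for j in range(n_cols):
            values = [grid[a][b][key]
                      for a in range(max(0, i - 1), min(n_rows, i + 2))
                      for b in range(max(0, j - 1), min(n_cols, j + 2))]
            grid[i][j]["nbr_" + key] = mean(values)
    return grid
```

Each block thus ends up with its own directly measured parameters plus one aggregated parameter derived from its neighbors.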
Then a classification by parameter values takes place, which often involves comparing parameter values to thresholds and combining the results with many AND and OR operations to assemble a logic function that refines the classification. A secondary algorithm run over the initial classification converts islands of text in an image to image content, and vice versa, and performs similar cleanups. Tuning the thresholds is a time-consuming and frustrating job. A set of threshold values is typically chosen by educated guess.
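As a rough illustration of this kind of threshold-based classification and cleanup, consider the following Python sketch. The threshold values, the particular logic function, and the four-neighbor island rule are all hypothetical choices made for this example; they are not drawn from any specific prior tool.

```python
# Hypothetical thresholds on 0-255 luminance; in practice such values
# are chosen by educated guess and then tuned by hand.
WHITE_MIN_LUMA = 230      # every pixel in the block is at least this bright
TEXT_MAX_MIN_LUMA = 60    # darkest pixel must be at least this dark
TEXT_MIN_MAX_LUMA = 200   # brightest pixel suggests a light background
TEXT_MIN_AVG_LUMA = 150   # alternative evidence of a light background


def classify_block(p):
    """Classify one block from its parameter dict (assumed keys:
    min_luma, max_luma, avg_luma) using thresholds joined by AND/OR."""
    if p["min_luma"] >= WHITE_MIN_LUMA:
        return "white"
    is_dark = p["min_luma"] <= TEXT_MAX_MIN_LUMA
    light_background = (p["max_luma"] >= TEXT_MIN_MAX_LUMA
                        or p["avg_luma"] >= TEXT_MIN_AVG_LUMA)
    if is_dark and light_background:
        return "text"
    return "image"


def clean_islands(labels):
    """Secondary pass over a grid of labels: a text block whose in-grid
    4-neighbors are all image becomes image, and vice versa."""
    swap = {"text": "image", "image": "text"}
    n_rows, n_cols = len(labels), len(labels[0])
    cleaned = [row[:] for row in labels]  # read originals, write to a copy
    for i in range(n_rows):
        for j in range(n_cols):
            label = labels[i][j]
            if label not in swap:
                continue
            neighbors = [labels[a][b]
                         for a, b in ((i - 1, j), (i + 1, j),
                                      (i, j - 1), (i, j + 1))
                         if 0 <= a < n_rows and 0 <= b < n_cols]
            if neighbors and all(n == swap[label] for n in neighbors):
                cleaned[i][j] = swap[label]
    return cleaned
```

Even in this small sketch, four interacting thresholds determine the outcome, which suggests why changing one value to correct a misclassified block can alter the classification of others.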
A set of original documents is then run through a segmentation algorithm. A person, the tuner, examines the results of the segmentation algorithm carefully to see where mistakes were made. The tuner then manually determines what threshold values or what logical combination caused the error, a difficult task in itself, and adjusts the threshold or fixes the logic. The tuner then repeats the process. Unfortunately, an adjustment that fixes one problem will often create others. The tuner must try to find threshold values and logic that minimize errors over a wide range of documents.
Previous manual and partially-automated segmentation techniques are tedious, time-consuming, and/or inaccurate. There is therefore a need for improved automated document segmentation.