Image data is often stored in the form of multiple scanlines, each scanline comprising multiple pixels. When processing this type of image data, it is helpful to know the type of image represented by the data. For instance, the image data could represent graphics, text, a halftone, contone, or some other recognized image type. A page of image data could be all one type, or some combination of image types.
It is known in the art to take a page of image data and to separate, or “segment,” the image data into windows of similar image types. For instance, a page of image data may include a halftone picture with accompanying text describing the picture. In order to efficiently process the image data, it is desirable to segment the pictorial area from the text area. Processing of the page of image data can then be efficiently carried out by tailoring the processing to the type of image data being processed, based on the segmentation result.
One common overall method for performing image segmentation is the use of a “mixed-raster content” or MRC representation of image data. There are several variations of MRC representation, as shown for example in FIG. 1. The representation typically comprises three independent planes: foreground (FG), background (BG), and a selector (SEL) plane. The background plane is typically used for storing continuous-tone information such as pictures and/or smoothly varying background colors. The selector plane normally holds the image of text (binary) as well as other edge information (e.g., line art drawings). The foreground plane usually holds the color of the corresponding text and/or line art. The content of each of the planes may be defined appropriately by an implementation of the MRC representation.
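By way of illustration only, the relationship among the three planes can be sketched as a per-pixel selection. The sketch below assumes a common convention that is not stated above: a selector value of 1 takes the pixel from the foreground plane and a value of 0 takes it from the background plane. The function name `compose_mrc` and the list-of-rows layout are hypothetical choices made for the example, and all three planes are assumed to have the same resolution.

```python
def compose_mrc(foreground, background, selector):
    """Recombine three same-sized MRC planes into a single image.

    Assumption for this sketch: selector == 1 picks the foreground pixel,
    selector == 0 picks the background pixel. Each plane is a list of rows.
    """
    return [
        [fg if sel else bg for fg, bg, sel in zip(fg_row, bg_row, sel_row)]
        for fg_row, bg_row, sel_row in zip(foreground, background, selector)
    ]
```

Under that assumed convention, a selector pixel that is wrongly set inside a photograph would pull its output value from the foreground plane rather than from the background plane that holds the pictorial content.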
FIG. 2A is an example separation-plane image before MRC processing, and FIG. 2B is an example initial segmentation result from FIG. 2A. FIG. 2B shows how certain types of original images yield complicated selector plane images. In the FIG. 2B illustration, ideally all selector plane pixels should have the same value and thus be considered part of the same type of image data, because all of the pixels are part of the same photograph. However, the initial segmentation of this single image area contains considerable error, which is caused by relatively large, uniform light or dark areas within the photograph: a standard segmentation algorithm will erroneously recognize those portions of the photograph, shown in black in FIG. 2B, as belonging to another type of image, such as a uniform color area. The mischaracterization of image types can lead to the subsequent employment of different processing methods (some lossless, others lossy) for the same photograph region.
In segmentation of MRC image data to yield a selector plane, as well as in other activities with any kind of image data, one kind of segmentation error is called the “hole” problem. A “hole” in an initial segmentation result (such as a selector plane in the three-layer MRC case) can be defined as a small area associated with a first subset or type of image data surrounded by a greater “island” of pixels associated with a second subset or type of image data, the island in turn being substantially surrounded by a greater area associated with the first subset. As will be described in detail below, the presence of such holes in image data, such as in an MRC image plane, can lead to special problems of misclassification of portions of the image data.
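The hole definition above can be made concrete with a minimal sketch that flags candidate holes in a binary selector plane. This is a simplification offered for illustration, not the method described in this document: it labels 4-connected regions of the first type by flood fill and reports those that are small and do not touch the image border, on the assumption that such a region is enclosed by pixels of the second type. It does not verify that the enclosing island is itself surrounded by the first type. The names `find_holes`, `hole_value`, and `max_hole_size`, and the default size threshold, are hypothetical.

```python
from collections import deque


def find_holes(plane, hole_value=0, max_hole_size=4):
    """Return candidate holes as lists of (row, col) coordinates.

    Assumptions for this sketch: `plane` is a non-empty list of equal-length
    rows, pixels are compared with 4-connectivity, and a "hole" is any
    connected region of `hole_value` that has at most `max_hole_size` pixels
    and does not touch the image border.
    """
    rows, cols = len(plane), len(plane[0])
    seen = [[False] * cols for _ in range(rows)]
    holes = []
    for r in range(rows):
        for c in range(cols):
            if plane[r][c] != hole_value or seen[r][c]:
                continue
            # Breadth-first flood fill over one connected region.
            component = []
            touches_border = False
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                if y in (0, rows - 1) or x in (0, cols - 1):
                    touches_border = True
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and plane[ny][nx] == hole_value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Small, fully enclosed regions are reported as candidate holes.
            if not touches_border and len(component) <= max_hole_size:
                holes.append(component)
    return holes
```

For a 5 by 7 plane of the first type containing a 3 by 5 island of the second type with a single first-type pixel at its center, this sketch would report that one center pixel as the only candidate hole, since the outer first-type region touches the border.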