The present invention is directed to a method for image segmentation to produce a mixed raster content (MRC) image with constant foreground layers. The method extracts uniform text and other uniform color objects that carry detail information. The method includes four primary steps. First, the objects are extracted from the image. Next, the objects are tested for color consistency and other features to decide if they should be chosen for coding to the MRC foreground layers. The objects that are chosen are then clustered in color space. The image is finally segmented such that each foreground layer codes the objects from the same color cluster.
Heretofore, a number of patents and publications have disclosed aspects of image segmentation. The following patent and publication are hereby incorporated by reference in their entirety, and their relevant portions are briefly summarized as follows:
U.S. Pat. No. 5,767,978 to S. Revankar and Z. Fan, for an “IMAGE SEGMENTATION SYSTEM,” issued Jun. 16, 1998, discloses an image rendering system for processing a stream of data in a document processing system, the stream of data including segmentable imaging data for rendering an output image, and the output image capable of being differentially rendered according to a plurality of image classes. The image rendering system includes: a segmentor for dividing the imaging data into a plurality of image regions; a selector for assigning the regions to each image class; and a processor, responsive to the selector, for differentially rendering the output image according to at least one of the plurality of image classes.
In “Background Identification Based Segmentation and Multilayer Tree Based Representation of Document Images,” Proceedings of IEEE International Conference on Image Processing, ICIP, Rochester, N.Y., September 2003, H. Cheng and Z. Fan teach a three-layer segmentation of objects within an image. The segmentation algorithm (BISeg) locates and classifies objects in an image, identifying main and local backgrounds.
MRC (Mixed Raster Content) is a powerful image representation concept for achieving high compression ratios while maintaining high reconstructed image quality. MRC has also been established as a compression standard. Within MRC, a basic three-layer model (contone foreground, contone background, and binary mask) is the most common representation form. It represents a color raster image using a background layer and a mask and foreground layer pair. The foreground and background layers are normally contone bitmaps, while the mask is usually binary. The mask layer describes how to reconstruct the final image from the other two layers. When the mask layer pixel value is 1, the corresponding pixel from the foreground layer is selected for the final image; when it is 0, the corresponding pixel from the background layer is selected.
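The selection rule of the three-layer model can be illustrated with a minimal sketch. It assumes each layer is a list of rows of equal size, with pixels stored as (R, G, B) tuples; the function name reconstruct_three_layer is hypothetical and is not taken from any MRC standard or library.

```python
def reconstruct_three_layer(mask, foreground, background):
    """Rebuild an image from a three-layer MRC representation.

    Assumption: all three layers are lists of rows with identical
    dimensions. A mask value of 1 selects the foreground pixel and
    a mask value of 0 selects the background pixel.
    """
    return [
        [fg if m == 1 else bg for m, fg, bg in zip(mrow, frow, brow)]
        for mrow, frow, brow in zip(mask, foreground, background)
    ]


# Example: a 2x2 image with black foreground on a white background.
BLACK, WHITE = (0, 0, 0), (255, 255, 255)
example_mask = [[1, 0], [0, 1]]
example_fg = [[BLACK, BLACK], [BLACK, BLACK]]
example_bg = [[WHITE, WHITE], [WHITE, WHITE]]
example_image = reconstruct_three_layer(example_mask, example_fg, example_bg)
```

In this example the foreground layer happens to hold a single color, but in the three-layer model it may be an arbitrary contone bitmap.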
However, MRC has the disadvantage that the resulting files, when coded in PDF, may not be printable on some Postscript and PDF printers. This problem can be avoided if the foreground layer is not represented in contone form. As a result, MRC with constant foreground layers has been introduced to deal with the problem. This model contains one background layer, N foreground layers and N mask layers, where N is a non-negative integer. While the background layer can be a contone bitmap, the foreground layers are restricted to be constant colors. Although constructing this model is computationally more difficult than constructing a three-layer model, the resulting PDF file appears to be printable by all Postscript printers.
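The constant-foreground model can be sketched in the same way. The following is an illustration under stated assumptions, not a definitive decoder: each of the N foreground layers is taken to be a (mask, color) pair, where the color is a single (R, G, B) tuple, and where masks overlap the later layer is assumed to paint over the earlier one. The function name reconstruct_constant_foreground is hypothetical.

```python
def reconstruct_constant_foreground(background, layers):
    """Rebuild an image from one background layer and N constant-color layers.

    Assumptions: `background` is a list of rows of (R, G, B) tuples;
    `layers` is a list of (mask, color) pairs whose masks match the
    background dimensions; later layers overwrite earlier ones.
    With N = 0 (an empty list) the result is a copy of the background.
    """
    out = [list(row) for row in background]  # copy so the input is untouched
    for mask, color in layers:
        for y, mask_row in enumerate(mask):
            for x, m in enumerate(mask_row):
                if m == 1:
                    out[y][x] = color
    return out


# Example: a 1x3 white background with one black and one red layer (N = 2).
WHITE, BLACK, RED = (255, 255, 255), (0, 0, 0), (255, 0, 0)
page = reconstruct_constant_foreground(
    [[WHITE, WHITE, WHITE]],
    [([[1, 0, 0]], BLACK), ([[0, 1, 0]], RED)],
)
```

Because each foreground layer stores only one color plus a binary mask, no contone foreground bitmap needs to be rendered.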
In accordance with the present invention, there is provided a method for the segmentation of a digital image for representation in a mixed raster content form with a constant foreground, comprising the steps of: extracting uniform color objects from the image; testing at least some of the extracted objects for color consistency to decide if the extracted objects should be coded to a foreground layer in the mixed raster content form; clustering, in color space, objects that are chosen for representation in the foreground layer to associate objects in at least one common color cluster; and segmenting the image such that each foreground layer represents objects from the common color cluster.
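The testing, clustering, and segmenting steps recited above can be sketched as follows. This is only an illustrative outline under several assumptions. Object extraction is assumed to have been performed already, each object being supplied as a dictionary of pixel coordinates and pixel colors. The consistency test (maximum per-channel deviation from the mean), the greedy distance-threshold clustering, and the thresholds max_dev and max_dist are stand-ins chosen for brevity; the text above does not specify these particular techniques, and all function names are hypothetical.

```python
from math import dist


def mean_color(colors):
    """Average (R, G, B) of a non-empty list of colors."""
    n = len(colors)
    return tuple(sum(c[i] for c in colors) / n for i in range(3))


def is_color_consistent(colors, max_dev=20):
    """Assumed test: every channel of every pixel lies within max_dev of the mean."""
    m = mean_color(colors)
    return all(abs(c[i] - m[i]) <= max_dev for c in colors for i in range(3))


def cluster_colors(object_means, max_dist=30):
    """Assumed greedy clustering: join the first cluster whose center is near enough."""
    clusters = []  # each cluster: {"center": (r, g, b), "members": [object index, ...]}
    for idx, color in enumerate(object_means):
        for cluster in clusters:
            if dist(color, cluster["center"]) <= max_dist:
                cluster["members"].append(idx)
                cluster["center"] = mean_color(
                    [object_means[j] for j in cluster["members"]]
                )
                break
        else:
            clusters.append({"center": color, "members": [idx]})
    return clusters


def segment_to_layers(objects, width, height):
    """Return one (mask, constant color) foreground layer per color cluster."""
    chosen = [o for o in objects if is_color_consistent(o["colors"])]
    means = [mean_color(o["colors"]) for o in chosen]
    layers = []
    for cluster in cluster_colors(means):
        mask = [[0] * width for _ in range(height)]
        for j in cluster["members"]:
            for x, y in chosen[j]["coords"]:
                mask[y][x] = 1
        layers.append((mask, tuple(round(v) for v in cluster["center"])))
    return layers


# Example: two near-black objects, one red object, one non-uniform object.
example_objects = [
    {"coords": [(0, 0)], "colors": [(0, 0, 0)]},
    {"coords": [(1, 0)], "colors": [(10, 10, 10)]},
    {"coords": [(2, 0)], "colors": [(250, 0, 0)]},
    {"coords": [(3, 0), (3, 1)], "colors": [(0, 0, 0), (200, 200, 200)]},
]
example_layers = segment_to_layers(example_objects, width=4, height=2)
```

In this example the two near-black objects share one foreground layer, the red object receives its own layer, and the non-uniform object fails the consistency test and would therefore remain in the background layer.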
One aspect of the invention is based on the discovery that an MRC image format may be used in a manner such that an image is identified only with a constant or common foreground color, rather than in a more traditional three-layer MRC format. This discovery avoids problems that arise in using three-layer MRC formats on certain printers that are incapable of processing the format. Using the techniques set forth herein, the present invention is able to produce a representation of an image in a modified (constant foreground) format that is printable on a wider range of printing devices. Accordingly, the present invention enables the use of MRC formats, but does so in a manner that enables the use of installed printers to render the image. As a result of the techniques employed in accordance with the present invention, existing Postscript and PDF printing devices may continue to be employed to render MRC formatted image files.