1. Field of the Invention
The present invention relates to an apparatus and method of segmenting an image and/or receiving the segmented image in an image coding and/or decoding system, and more particularly, to an apparatus and method of dividing an image into blocks, defining the respective blocks using cost optimized segmentation and connected component classification to generate a segmentation image, and receiving a signal representing the segmentation image in a mixed raster content based coding and/or decoding system.
2. Description of the Related Art
Mixed raster content (MRC), defined in ITU-T T.44, is a standard for efficient document compression which can dramatically improve the compression/quality tradeoff as compared to traditional lossy image compression algorithms. MRC represents an image as a set of layers. In the most basic mode of MRC, a compound document with text and pictures is separated into three layers: a binary mask layer, a foreground layer and a background layer. The binary mask layer assigns each pixel to either the foreground, labeled as “1”, or the background, labeled as “0”. According to ITU-T T.44, it is recommended that text and line art be classified to the foreground layer, and pictures be classified to the background layer.
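The three-layer model described above can be illustrated by a minimal sketch of how a decoder might recombine the layers. The sketch assumes, for simplicity, that all three layers have the same resolution and hold single grayscale values per pixel; in practice the foreground and background layers may be sub-sampled and hold color values. The function name compose_mrc and the sample values are hypothetical and are not taken from ITU-T T.44.

```python
def compose_mrc(mask, foreground, background):
    """Rebuild an image from three equally sized layers given as lists of rows.

    A mask value of 1 selects the foreground pixel; 0 selects the background pixel.
    """
    return [
        [fg if m == 1 else bg for m, fg, bg in zip(mask_row, fg_row, bg_row)]
        for mask_row, fg_row, bg_row in zip(mask, foreground, background)
    ]


# Toy 2x3 example (illustrative values only).
mask = [[0, 1, 0],
        [0, 1, 1]]
foreground = [[10, 10, 10],
              [10, 10, 10]]        # e.g., a dark text color
background = [[200, 201, 202],
              [203, 204, 205]]     # e.g., picture content

decoded = compose_mrc(mask, foreground, background)
```

In this sketch, the shape of the "1" region in the mask alone determines where the text color appears in the decoded image.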
The procedure to create the binary mask layer is called segmentation. After the segmentation, each layer may be compressed by an appropriate encoder to create an MRC document. For example, the foreground and background layers may be encoded using JPEG or JPEG2000, while the binary mask layer may be encoded using JBIG or JBIG2.
The segmentation is the part of MRC encoding that differentiates text and graphics regions within an image and creates the binary mask layer described above. Typically, the foreground layer contains the colors of text, the background layer contains images and graphics, and the binary mask layer is used to represent the fine detail of text fonts. The quality of the decoded image is heavily dependent on the segmentation algorithm because the binary mask layer defines the shape of characters, and because incorrect segmentation can cause distortion in the decoded image.
Although the segmentation is a critical step in the MRC encoding, the standard ITU-T T.44 does not define a segmentation method. Instead, the standard ITU-T T.44 only defines a structure of an MRC document decoder, so any segmentation algorithm may be independently optimized for best performance.
A variety of attributes are desirable for segmentations used in document compression. For purposes of illustration, binary segmentation is explained, but multi-layer segmentations may also be applicable to document compression. The attributes may be more or less important depending on the requirements of the application. The desirable attributes are listed below.
One of the attributes is segmentation edges along text and graphics boundaries. A good segmentation will contain transitions at the locations of text and graphics edges. An edge in the segmentation allows for accurate and high resolution encoding of text edges even when the foreground and background layers are coded at low resolution and low quality, as is desirable to reduce the total bits per pixel for the encoded document.
Another one of the attributes is spatially smooth segmentation. It is desirable for the segmentation to be spatially smooth for two reasons. First, a smooth segmentation can be encoded more efficiently by a binary image encoder, thereby reducing the total bits per pixel in the encoded document. Second, spurious edges in the segmentation can cause defects in the final decoded document because of inconsistencies between the foreground and background images at the locations where they are seamed together.
Another one of the attributes is image regions reliably classified to the background layer. It is useful to consistently have image regions classified to the background layer since the sub-sampling, data-filling, and coding of the background layer are often optimized for compression of natural images.
Another one of the attributes is text regions reliably classified to the foreground layer. It is useful to consistently have text regions classified to the foreground layer since the sub-sampling, data-filling, and coding of the foreground layer are often optimized for compression of the text-font fill colors.
Another one of the attributes is accurate representation of textual and/or graphic content. In some applications, the segmentation layer is used to analyze the document's content. In these cases, it is useful that the segmentation accurately represent the textual and/or graphic content of the document.
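The spatial smoothness attribute listed above can be illustrated with a rough proxy: counting how often the mask label changes between adjacent pixels. This is only a sketch under the assumption that fewer label transitions loosely correspond to a smoother mask; it is not the cost measure of any particular binary encoder, and the function name count_transitions is hypothetical.

```python
def count_transitions(mask):
    """Count label changes between horizontally and vertically adjacent pixels."""
    rows, cols = len(mask), len(mask[0])
    horizontal = sum(
        mask[r][c] != mask[r][c + 1]
        for r in range(rows)
        for c in range(cols - 1)
    )
    vertical = sum(
        mask[r][c] != mask[r + 1][c]
        for r in range(rows - 1)
        for c in range(cols)
    )
    return horizontal + vertical


# A mask with one clean boundary versus a mask with scattered labels.
smooth = [[0, 0, 1, 1],
          [0, 0, 1, 1]]
noisy = [[0, 1, 0, 1],
         [1, 0, 1, 0]]

smooth_count = count_transitions(smooth)
noisy_count = count_transitions(noisy)
```

Under this proxy, the mask with scattered labels yields a larger count than the mask with a single clean boundary.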
For many MRC applications, it is important that the segmentation contain only text in the foreground plane (i.e., mask pixels which are labeled as “1”), and that all other regions of the document be in the background plane (i.e., mask pixels which are labeled as “0”). In some applications, proper labeling of text and only text as foreground both improves the quality of the decoded document and reduces the bit rate (i.e., the number of bits per pixel of the encoded document). Unfortunately, conventional segmentation methods make errors. These errors can take two forms: text may be erroneously segmented as background, or background may be erroneously segmented as foreground.
FIG. 1 is a view illustrating errors in a binary mask in a conventional image coding apparatus, as an example of misclassifications of the binary mask. Black regions indicate a label of “1” and white regions indicate a label of “0”. In this example, most of the text regions are properly segmented as foreground, but some edges embedded in the picture regions are also segmented as foreground because of overly sensitive edge detection. Notice that the foreground portion of the segmentation can be described by a set of connected components, where each connected component represents a set of adjacent pixels in the mask that are all labeled as foreground (i.e., “1”). Using this property, one approach to reducing errors in the binary mask is to eliminate the connected components which are erroneously classified as foreground.
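The connected component description above can be sketched as follows. The sketch assumes 4-connectivity (pixels are adjacent when they share a side), which this section does not specify. The names connected_components, remove_components, and is_erroneous are hypothetical, and the size-based rule used in the example is only a placeholder for whatever classification decides that a component is erroneous.

```python
from collections import deque


def connected_components(mask):
    """Return the foreground components as lists of (row, col) pixels (4-connectivity assumed)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            component = []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(component)
    return components


def remove_components(mask, is_erroneous):
    """Return a copy of the mask with every component flagged by is_erroneous relabeled as 0."""
    cleaned = [row[:] for row in mask]
    for component in connected_components(mask):
        if is_erroneous(component):
            for y, x in component:
                cleaned[y][x] = 0
    return cleaned


mask = [[1, 1, 0, 0],
        [1, 0, 0, 1],
        [0, 0, 0, 0]]
components = connected_components(mask)

# Placeholder rule for illustration only: treat single-pixel components as erroneous.
cleaned = remove_components(mask, lambda comp: len(comp) == 1)
```

Passing the decision rule in as a function keeps the component extraction separate from the classification of each component.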