JPM, JPEG 2000 Part 6, is a Mixed Raster Content (MRC) file format. MRC is a method for compressing compound images containing both binary text and continuous-tone images. MRC uses a multi-layered imaging model for representing the results of multiple compression algorithms, including ones developed specifically for text and for images. In JPM, images are compressed by decomposing them into layers that are composited with a mask. The layer images produced by the decomposition, and the masks, are compressed with appropriate standardized image compressors, such as JPEG, JPEG 2000, JBIG, and MMR. For more information on JPM and the MRC file format, see “Information Technology—JPEG 2000 Image Coding Standard—Part 6: Compound Image File Format,” ISO/IEC FDIS 15444-6; Boliek & Wu, “JPEG 2000-like Access Using the JPM Compound Document File Format,” ICME 2003 Proceedings, 2003 International Conference on Multimedia and Expo, Vol. 1, 6-9 Jul. 2003; de Queiroz, Buckley & Xu, “Mixed Raster Content (MRC) Model for Compound Image Compression,” Proc. IS&T/SPIE Symp. on Electronic Imaging, Visual Communications and Image Processing, San Jose, Calif., SPIE Vol. 3653, pp. 1106-1117, February 1999.
A JPM file may have any number of composited layers. A JPM encoder creates the data that is stored according to the JPM file format. Such an encoder typically includes a continuous-tone compressor for compressing the continuous-tone background and foreground images and a binary image compressor for compressing the masks.
In the prior art, binarization of images is performed as a pre-processing operation for optical character recognition (OCR). For example, see Leedham et al., “Comparison of Some Thresholding Algorithms for Text/Background Segmentation in Difficult Document Images,” ICDAR 2003: Proceedings of the International Conference on Document Analysis and Recognition, Vol. 2, pp. 859-863, 2003.
Two important functions of the JPM encoder are not standardized: segmentation and data filling. Segmentation refers to how the image is decomposed into foreground and background layers and how the mask is created, while data filling refers to how values are assigned to “don't care” pixels in the foreground and background images (pixels of a layer that are not used when the final image is composited using the mask) so that the foreground and background images compress well.
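As an illustration of the compositing rule and of which pixels become “don't care,” the following minimal Python sketch composites two layers through a binary mask. The function names and the convention that a mask value of 1 selects the foreground are assumptions made for illustration; this is not the API of any JPM codec.

```python
# Minimal MRC compositing sketch (hypothetical helpers, not a JPM codec API).
# Assumption: a mask value of 1 selects the foreground layer, 0 the background.

def composite(foreground, background, mask):
    """Return the composited image: foreground where mask is 1, background elsewhere."""
    return [
        [f if m else b for f, b, m in zip(f_row, b_row, m_row)]
        for f_row, b_row, m_row in zip(foreground, background, mask)
    ]


def dont_care_maps(mask):
    """Return (fg_dont_care, bg_dont_care): True where that layer's pixel is never shown."""
    fg_dont_care = [[not m for m in row] for row in mask]
    bg_dont_care = [[bool(m) for m in row] for row in mask]
    return fg_dont_care, bg_dont_care


foreground = [[0, 0], [0, 0]]
background = [[255, 255], [255, 255]]
mask = [[1, 0], [0, 1]]
print(composite(foreground, background, mask))  # [[0, 255], [255, 0]]

# Changing a "don't care" foreground pixel leaves the composited image unchanged,
# which is why a data filler is free to choose values that compress well.
foreground[0][1] = 123
print(composite(foreground, background, mask))  # [[0, 255], [255, 0]]
```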
A number of segmentation and data filling techniques exist in the prior art. For example, many prior art segmentation methods use K-means clustering in the spatial domain, and many of them are block based. K-means clustering uses the number of times each color occurs, but not the spatial relationship of the colors. Page segmentation has also been performed in scale space based on a morphological transform. For example, see D. P. Mukherjee and S. T. Acton, “Document page segmentation using multiscale clustering,” Proc. IEEE Int. Conf. on Image Processing, Kobe, Japan, Oct. 25-29, 1999.
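The observation that K-means uses only color frequencies can be made concrete with a minimal sketch. It clusters gray levels (a grayscale simplification of color clustering) using nothing but a histogram, so any spatial rearrangement of the same pixels gives the same result. The function name, the evenly spaced initial centers, and the iteration cap are assumptions.

```python
from collections import Counter


def kmeans_gray_levels(pixels, k=2, max_iterations=50):
    """K-means on gray levels, driven only by how often each level occurs.

    Pixel positions are discarded: the histogram is the only input.
    Centers start evenly spaced between the darkest and lightest level (an assumption).
    """
    histogram = Counter(pixels)
    levels = sorted(histogram)
    lo, hi = levels[0], levels[-1]
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(max_iterations):
        sums, counts = [0.0] * k, [0] * k
        for level in levels:
            # Assign every occurrence of this level to its nearest center.
            nearest = min(range(k), key=lambda j: abs(level - centers[j]))
            sums[nearest] += level * histogram[level]
            counts[nearest] += histogram[level]
        new_centers = [sums[j] / counts[j] if counts[j] else centers[j] for j in range(k)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


pixels = [10] * 50 + [20] * 10 + [200] * 30 + [220] * 10
print(kmeans_gray_levels(pixels))  # approximately [11.67, 205.0]
# Reordering the pixels (i.e., changing the image layout) does not change the result.
print(kmeans_gray_levels(pixels[::-1]) == kmeans_gray_levels(pixels))  # True
```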
U.S. Pat. No. 6,633,670 discloses foreground/background segmentation on blocks of two different sizes and uses values from the maximum gradient, instead of iterative K-means, for clustering.
The prior art also includes multi-scale segmentation methods that do not involve multi-resolution transforms, for example, methods that use different block sizes to group pixels and compute probabilities using Markov chains across scales. As another example, in Zhao Jian, et al., “A New Wavelet-Based Document Image Segmentation Scheme,” Journal of Systems Engineering and Electronics, vol. 13, no. 3, 2002, pp. 86-90, multiple block sizes are used together with a single transform level of the critically sampled (non-redundant) Haar transform.
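For reference, one level of a critically sampled 2-D Haar transform can be sketched as follows. Each 2×2 block produces one coefficient in each of four quarter-size subbands, so the coefficient count equals the pixel count; this is what distinguishes it from a redundant (undecimated) transform, and the inverse recovers the input exactly. The 1/4 normalization and the function names are assumptions; this is not the implementation of Zhao Jian et al.

```python
def haar_level(image):
    """One level of the critically sampled 2-D Haar transform on an even-sized image.

    Each 2x2 block (a b / c d) yields one coefficient in each of four quarter-size
    subbands, so the output holds exactly as many values as the input.
    Normalization by 1/4 (block average) is an assumption; orthonormal Haar uses 1/2.
    """
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    ll, col_diff, row_diff, diag = [], [], [], []
    for y in range(0, h, 2):
        rows = ([], [], [], [])
        for x in range(0, w, 2):
            a, b = image[y][x], image[y][x + 1]
            c, d = image[y + 1][x], image[y + 1][x + 1]
            rows[0].append((a + b + c + d) / 4)  # low-pass: block average
            rows[1].append((a - b + c - d) / 4)  # left column minus right column
            rows[2].append((a + b - c - d) / 4)  # top row minus bottom row
            rows[3].append((a - b - c + d) / 4)  # diagonal difference
        for band, row in zip((ll, col_diff, row_diff, diag), rows):
            band.append(row)
    return ll, col_diff, row_diff, diag


def haar_level_inverse(ll, col_diff, row_diff, diag):
    """Exact inverse of haar_level: rebuild each 2x2 block from its four coefficients."""
    image = []
    for s_row, h_row, v_row, d_row in zip(ll, col_diff, row_diff, diag):
        top, bottom = [], []
        for s, hc, v, dg in zip(s_row, h_row, v_row, d_row):
            top += [s + hc + v + dg, s - hc + v - dg]
            bottom += [s + hc - v - dg, s - hc - v + dg]
        image += [top, bottom]
    return image


img = [[1, 2, 3, 4], [5, 6, 7, 9], [0, 8, 2, 2], [4, 4, 4, 1]]
bands = haar_level(img)
print(sum(len(row) for band in bands for row in band))  # 16 coefficients for 16 pixels
print(haar_level_inverse(*bands) == img)  # True
```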
Adaptive filtering, edge-based diffusion, and grayscale morphology are well known. There are also non-MRC compression methods for compound documents. For example, Guo et al., “Compound Image Compression Using Adaptive Wavelet Transform,” Journal of the Institute of Image Electronics Engineers of Japan, vol. 30, no. 2, pp. 138-50, discloses the use of an adaptive wavelet transform (a 5,3 wavelet, and a lifting-based transform for binary data) based on a segmentation.
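Assuming the 5,3 wavelet above is the LeGall 5/3 filter pair familiar from JPEG 2000, it is commonly implemented with lifting. The following sketch shows one level of the reversible integer 5/3 lifting transform, in the form used for lossless coding in JPEG 2000 Part 1, on an even-length 1-D signal with symmetric border extension. It does not reproduce the segmentation-driven adaptivity or the binary-data transform of Guo et al.; the function names are hypothetical.

```python
def lift53_forward(x):
    """One level of the reversible LeGall 5/3 lifting transform on an even-length
    integer signal, with whole-sample symmetric extension at the borders."""
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("signal length must be even and at least 2")
    half = n // 2
    # Predict: each odd sample minus the floored mean of its even neighbors (high-pass).
    d = []
    for i in range(half):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]  # mirror: x[n] = x[n - 2]
        d.append(x[2 * i + 1] - (x[2 * i] + right) // 2)
    # Update: each even sample plus a rounded quarter of neighboring details (low-pass).
    s = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]  # mirror: d[-1] = d[0]
        s.append(x[2 * i] + (left + d[i] + 2) // 4)
    return s, d


def lift53_inverse(s, d):
    """Undo the lifting steps in reverse order; integer arithmetic makes this exact."""
    half = len(s)
    x = [0] * (2 * half)
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - (left + d[i] + 2) // 4
    for i in range(half):
        right = x[2 * i + 2] if 2 * i + 2 < 2 * half else x[2 * i]
        x[2 * i + 1] = d[i] + (x[2 * i] + right) // 2
    return x


print(lift53_forward([1, 5, 3, 8, 2, 9]))  # ([3, 5, 5], [3, 6, 7])
print(lift53_inverse([3, 5, 5], [3, 6, 7]))  # [1, 5, 3, 8, 2, 9]
```

Because each lifting step adds or subtracts an integer computed only from samples that are still available to the decoder, the inverse is lossless for any integer input, including negative values (Python's `//` floors, as the 5/3 definition requires).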
DjVu is a product developed jointly by AT&T and Lizard Tech. In DjVu, foreground/background segmentation is performed using a weighted average of K-means clustering on blocks of varying sizes. Filling is performed iteratively using the wavelet system employed for compression (critically sampled, with longer filters than Haar). For more information on DjVu, see Bottou, et al., “High Quality Document Image Compression with DjVu,” Journal of Electronic Imaging, vol. 7, no. 3, pp. 410-425, SPIE, 1998.
Some of the prior art methods in these areas are designed to capture fine-scale detail such as halftone noise. Xerox has used MRC; for example, for images of colored engravings, a combination of K-means segmentation and adaptive region-based thresholding is followed by artifact cleaning. For more information, see Misic, Buckley & Parker, “Encoding and Processing of Color Engravings (Using MRC),” 2002 IEEE International Conference on Image Processing, Rochester, N.Y., Sep. 22-25, 2002. One disclosed method is a multistep method with an initial mask and a final mask, with the goal of keeping the structure of the printing process (such as halftoning noise) in the mask rather than rejecting it.
In one prior art reference, a block-based segmentation method is disclosed in which a threshold value is chosen for each block to separate the background and the foreground. For more information, see de Queiroz, Fan & Tran, “Optimizing block-thresholding segmentation for MRC compression,” Proc. IEEE Intl. Conf. on Image Processing, ICIP, Vancouver, Canada, Vol. II, pp. 597-600, September 2000. For grayscale N×N blocks, there are at most N² possible values for the threshold. For 8×8 blocks, such as those used by JPEG, the best threshold can therefore be found by searching at most 64 threshold values. See, for example, U.S. Pat. No. 6,373,981. Furthermore, U.S. Pat. No. 6,400,844 discloses the use of a simple threshold on blocks for foreground/background segmentation, together with other classification methods that decide whether to perform foreground/background segmentation on a given block.
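The bounded search can be sketched as follows: every distinct gray level in the block is tried as a threshold, so an N×N block needs at most N² trials. The cost function used here (the squared error of a two-level approximation in which each pixel is replaced by its class mean) and the convention that darker pixels are foreground are assumptions for illustration; the cited works define their own criteria, and the function name is hypothetical.

```python
def best_block_threshold(block):
    """Exhaustive threshold search for one grayscale block.

    An N x N block has at most N*N distinct values, so at most N*N candidates are
    tried. Cost (an assumption): total squared error when every pixel is replaced
    by the mean of its class. Pixels below the threshold form the darker class.
    """
    pixels = [p for row in block for p in row]
    best_t, best_cost = None, float("inf")
    for t in sorted(set(pixels)):
        darker = [p for p in pixels if p < t]    # candidate foreground
        lighter = [p for p in pixels if p >= t]  # candidate background
        cost = 0.0
        for cls in (darker, lighter):
            if cls:
                mean = sum(cls) / len(cls)
                cost += sum((p - mean) ** 2 for p in cls)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


block = [[10, 10, 200, 200] for _ in range(4)]
print(best_block_threshold(block))  # (200, 0.0)
```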
Several fill methods have been employed in the prior art. For example, one fill technique assigns to each “don't care” pixel the average of the non-“don't care” pixels in its four-neighborhood, in multiple passes, until all “don't care” pixels are filled. See de Queiroz, “On data-filling algorithms for MRC layers,” Proc. IEEE Intl. Conf. on Image Processing, ICIP, Vancouver, Canada, Vol. II, pp. 586-589, September 2000. Another prior art fill method performs a fill using a low-pass filter and exaggerating the coefficients for positions that are not “don't care.” Still another prior art fill method performs the fill iteratively using a DCT. There are several related U.S. patents; see, for example, U.S. Pat. Nos. 6,334,001, 6,272,255, and 6,275,620.
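The first of these fill methods can be sketched as follows. Each pass fills every “don't care” pixel that has at least one known 4-neighbor with the average of those neighbors, using only values known at the start of the pass, and passes repeat until nothing is left unfilled. The per-pass update rule and the function name are assumptions; the cited paper may differ in detail.

```python
def fill_four_neighbor(image, dont_care):
    """Fill "don't care" pixels by repeatedly averaging their known 4-neighbors.

    Each pass uses only values known at the start of that pass (an assumption),
    then marks the newly filled pixels as known. Passes repeat until all are known.
    """
    h, w = len(image), len(image[0])
    values = [row[:] for row in image]
    known = [[not dc for dc in row] for row in dont_care]
    if not any(any(row) for row in known):
        raise ValueError("nothing to fill from: every pixel is don't care")
    while not all(all(row) for row in known):
        updates = []
        for y in range(h):
            for x in range(w):
                if known[y][x]:
                    continue
                neighbors = [values[j][i]
                             for j, i in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                             if 0 <= j < h and 0 <= i < w and known[j][i]]
                if neighbors:
                    updates.append((y, x, sum(neighbors) / len(neighbors)))
        for y, x, v in updates:
            values[y][x] = v
            known[y][x] = True
    return values


print(fill_four_neighbor([[100, 0, 20]], [[False, True, False]]))  # [[100, 60.0, 20]]
print(fill_four_neighbor([[100, 0, 0]], [[False, True, True]]))    # [[100, 100.0, 100.0]]
```

The loop always terminates when at least one pixel is known, because on a connected pixel grid every pass fills at least one pixel bordering the known region.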
A segmentation and fill method is disclosed in Mukherjee et al., “JPEG2000-Matched MRC Compression of Compound Documents,” 2002 IEEE International Conference on Image Processing, Rochester, N.Y., Sep. 22-25, 2002. In the disclosed method, processing is performed in stripes, and segmentation is performed on blocks in a single pass. High-contrast blocks are segmented by intensity with K-means into a lighter background and a darker foreground. Low-contrast blocks are assigned based on the neighboring blocks.
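A simplified, single-stripe sketch of this kind of two-pass block approach follows. High-contrast blocks are split into darker and lighter intensity clusters with a 2-means iteration, and each low-contrast block then takes the class of whichever cluster center of a high-contrast 4-neighbor is closer to its mean. The neighbor rule, the contrast test (maximum minus minimum compared with `min_contrast`), the fixed iteration count, the fallback to background, and the function name are all assumptions, since the description above does not specify them.

```python
def segment_blocks(image, n, min_contrast, iterations=10):
    """Two-pass block segmentation; returns a 0/1 mask where 1 marks darker foreground."""
    h, w = len(image), len(image[0])
    block_rows, block_cols = (h + n - 1) // n, (w + n - 1) // n
    mask = [[0] * w for _ in range(h)]
    centers = {}  # (block_row, block_col) -> (dark_center, light_center)

    def coords(br, bc):
        return [(y, x) for y in range(br * n, min(br * n + n, h))
                for x in range(bc * n, min(bc * n + n, w))]

    # Pass 1: split each high-contrast block into two intensity clusters (2-means).
    for br in range(block_rows):
        for bc in range(block_cols):
            vals = [image[y][x] for y, x in coords(br, bc)]
            dark, light = min(vals), max(vals)
            if light == dark or light - dark < min_contrast:
                continue  # low contrast: decided in pass 2
            for _ in range(iterations):
                darker = [v for v in vals if abs(v - dark) <= abs(v - light)]
                lighter = [v for v in vals if abs(v - dark) > abs(v - light)]
                dark, light = sum(darker) / len(darker), sum(lighter) / len(lighter)
            centers[(br, bc)] = (dark, light)
            for y, x in coords(br, bc):
                if abs(image[y][x] - dark) <= abs(image[y][x] - light):
                    mask[y][x] = 1
    # Pass 2: a low-contrast block follows the nearer center of its first
    # high-contrast 4-neighbor; with no such neighbor it stays background.
    for br in range(block_rows):
        for bc in range(block_cols):
            if (br, bc) in centers:
                continue
            neighbors = [(br - 1, bc), (br + 1, bc), (br, bc - 1), (br, bc + 1)]
            ref = next((centers[k] for k in neighbors if k in centers), None)
            if ref is None:
                continue
            pts = coords(br, bc)
            mean = sum(image[y][x] for y, x in pts) / len(pts)
            if abs(mean - ref[0]) < abs(mean - ref[1]):
                for y, x in pts:
                    mask[y][x] = 1
    return mask


page = [[20, 20, 220, 220, 30, 30, 30, 30] for _ in range(4)]
print(segment_blocks(page, 4, 50)[0])  # [1, 1, 0, 0, 1, 1, 1, 1]
```

Starting the centers at the block minimum and maximum guarantees that both clusters stay non-empty during the 2-means iteration whenever the block has any contrast at all.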
LuraDocument, a JPM-related product from LuraTech (http://www.luratech.com/, Algo Vision LuraTech GmbH) that performs foreground/background segmentation and fill, is similar to JPM but not standardized. It is described in EPO patent no. EP 1 104 916 A1 (in German) and in Thierschmann et al., “A Scalable DSP Architecture For High-Speed Color Document Compression,” Document Recognition and Retrieval VIII, Kantor, et al., Editors, Proceedings of SPIE Vol. 4307 (2001), San Jose, Calif., January 2001. LuraDocument creates an adaptive threshold for binarization. At a reduced resolution of 75 dpi, 3×3 minimum and 3×3 maximum filters are applied and their difference is computed. If the difference is less than a fixed parameter, the threshold is initially “don't care”; otherwise, it is initially the average of the minimum and maximum. The initial threshold values that are not “don't care” are filtered with a 3×3 averaging filter and then further propagated with a 5×5 averaging filter. “Don't care” threshold values are assigned using a 7×7 filter that averages all values that are not “don't care”. The threshold values are then interpolated to the full image resolution with bilinear interpolation and compared with the original image to generate a binary image.
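The threshold computation described above can be sketched in Python as follows. Several details are interpretations rather than facts from the description: “propagated” is read as letting any position with a known value in its 5×5 window become known, the 7×7 pass is repeated until no “don't care” values remain, windows are clipped at the image border, bilinear interpolation uses pixel-center alignment, and a binary value of 1 marks pixels darker than the threshold. The 75 dpi downsampling itself is not shown, and all function names are hypothetical.

```python
def window(img, y, x, r):
    """Values of the (2r+1) x (2r+1) window centered at (y, x), clipped at the border."""
    h, w = len(img), len(img[0])
    return [img[j][i]
            for j in range(max(0, y - r), min(h, y + r + 1))
            for i in range(max(0, x - r), min(w, x + r + 1))]


def masked_mean(t, r, keep_dont_care=False):
    """Average the known (non-None) values in each window.

    keep_dont_care=True smooths known positions only; otherwise any position whose
    window holds at least one known value becomes known (propagation)."""
    h, w = len(t), len(t[0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if keep_dont_care and t[y][x] is None:
                continue
            known = [v for v in window(t, y, x, r) if v is not None]
            if known:
                out[y][x] = sum(known) / len(known)
    return out


def threshold_surface(low, min_contrast):
    """Per-pixel thresholds for a reduced-resolution grayscale image (e.g. 75 dpi)."""
    h, w = len(low), len(low[0])
    t = [[None] * w for _ in range(h)]  # None means "don't care"
    for y in range(h):
        for x in range(w):
            win = window(low, y, x, 1)  # 3x3 minimum and maximum filters
            lo, hi = min(win), max(win)
            if hi - lo >= min_contrast:
                t[y][x] = (lo + hi) / 2
    if all(v is None for row in t for v in row):
        raise ValueError("no location reaches min_contrast; nothing to propagate")
    t = masked_mean(t, 1, keep_dont_care=True)  # 3x3 averaging of known thresholds
    t = masked_mean(t, 2)                       # 5x5 averaging, propagating outward
    while any(v is None for row in t for v in row):
        filled = masked_mean(t, 3)              # 7x7 average of known values
        t = [[old if old is not None else new for old, new in zip(t_row, f_row)]
             for t_row, f_row in zip(t, filled)]
    return t


def bilinear_upsample(t, scale):
    """Bilinear interpolation by an integer factor, using pixel-center alignment."""
    h, w = len(t), len(t[0])
    out = []
    for yy in range(h * scale):
        fy = min(max((yy + 0.5) / scale - 0.5, 0.0), h - 1)
        y0, y1 = int(fy), min(int(fy) + 1, h - 1)
        ay = fy - y0
        row = []
        for xx in range(w * scale):
            fx = min(max((xx + 0.5) / scale - 0.5, 0.0), w - 1)
            x0, x1 = int(fx), min(int(fx) + 1, w - 1)
            ax = fx - x0
            top = t[y0][x0] * (1 - ax) + t[y0][x1] * ax
            bottom = t[y1][x0] * (1 - ax) + t[y1][x1] * ax
            row.append(top * (1 - ay) + bottom * ay)
        out.append(row)
    return out


def binarize(full, thresholds):
    """1 where the full-resolution pixel is darker than its threshold (assumed foreground)."""
    return [[1 if p < th else 0 for p, th in zip(p_row, t_row)]
            for p_row, t_row in zip(full, thresholds)]


low = [[200] * 6 for _ in range(6)]
low[2][2] = 20  # one dark dot at reduced resolution
full = [[200] * 12 for _ in range(12)]
for y in (4, 5):
    for x in (4, 5):
        full[y][x] = 20
binary = binarize(full, bilinear_upsample(threshold_surface(low, 50), 2))
print(sum(map(sum, binary)))  # 4
```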
LuraDocument performs text detection by finding connected components in the binary image whose sizes are within limits based on the document resolution. Each connected component is classified as text if its internal variance is below a threshold and it has strong edges based on Sobel and Laplace filters. Text connected components are set as foreground in the mask, and all other locations are set as background.
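The connected-component test can be sketched as follows, with two stated simplifications: component size is measured as a pixel count with 8-connectivity, and the Sobel/Laplace edge-strength test is omitted, so only the size limits and the internal-variance test are shown. The function and parameter names are hypothetical.

```python
from collections import deque


def connected_components(binary):
    """8-connected components of the 1-pixels, each as a list of (y, x) coordinates."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for sy in range(h):
        for sx in range(w):
            if not binary[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue, component = deque([(sy, sx)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(component)
    return components


def text_mask(binary, gray, min_size, max_size, max_variance):
    """Mark as foreground (1) the components that pass the size and variance tests."""
    h, w = len(binary), len(binary[0])
    mask = [[0] * w for _ in range(h)]
    for component in connected_components(binary):
        if not min_size <= len(component) <= max_size:
            continue
        values = [gray[y][x] for y, x in component]
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        if variance < max_variance:
            for y, x in component:
                mask[y][x] = 1
    return mask
```

For example, with `min_size=2`, `max_size=10`, and `max_variance=100`, a uniform 2×2 dark blob is kept, a single isolated pixel is rejected by the size test, and a 2×2 blob alternating between 0 and 255 is rejected by the variance test.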
LuraDocument generates a foreground image whose resolution is reduced by a factor of 3 in each dimension. The mask is thinned by one pixel to distinguish non-border foreground pixels from “don't care” pixels. For each 3×3 block, the pixels that are not “don't care” are averaged, and 3×3 blocks consisting of 9 “don't care” pixels are labeled “don't care”. A 5×5 averaging filter is used to propagate average values to “don't care” values. Another 5×5 averaging filter is used to damp the foreground color toward gray in non-text regions. The background is computed in a similar fashion.
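The 3× foreground reduction can be sketched as follows. “Thinned by one pixel” is interpreted here as a 3×3 binary erosion that treats pixels outside the image as background, and `None` stands for “don't care”. The 5×5 propagation and gray-damping steps are not shown, and the function names are hypothetical.

```python
def erode3x3(mask):
    """3x3 binary erosion; pixels outside the image count as 0 (an assumption)."""
    h, w = len(mask), len(mask[0])
    return [[1 if all(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)) else 0
             for x in range(w)]
            for y in range(h)]


def reduce_foreground(image, mask, factor=3):
    """Average the eroded-mask pixels in each factor x factor block; None marks don't care."""
    core = erode3x3(mask)  # interior foreground pixels only; border pixels become don't care
    h, w = len(image), len(image[0])
    reduced = []
    for by in range(0, h, factor):
        row = []
        for bx in range(0, w, factor):
            vals = [image[y][x]
                    for y in range(by, min(by + factor, h))
                    for x in range(bx, min(bx + factor, w))
                    if core[y][x]]
            row.append(sum(vals) / len(vals) if vals else None)  # no usable pixel -> None
        reduced.append(row)
    return reduced


image = [[10 * y + x for x in range(6)] for y in range(6)]
mask = [[1 if 1 <= y <= 4 and 1 <= x <= 4 else 0 for x in range(6)] for y in range(6)]
print(reduce_foreground(image, mask))  # [[22.0, 23.0], [32.0, 33.0]]
```

Here the 4×4 foreground square erodes to its central 2×2 pixels, so each of the four 3×3 blocks averages exactly one surviving pixel.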
Fill and segmentation operations have been used in check image compression. For example, a complete system for checks (as used in banking) is described in Huang, et al., “Check image compression using a layered coding method,” Journal of Electronic Imaging, 7(3), pp. 426-442, July 1998, in which grayscale morphological closing is the key operation for segmentation. The system determines the size (scale/resolution) of the structuring element independently for small blocks, removes isolated pixels, and performs fill using averaging.
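Grayscale closing (a maximum filter followed by a minimum filter over the same structuring element) removes dark features narrower than the structuring element, which is why the choice of its size matters. The following sketch uses a flat square structuring element and marks as foreground the pixels that are darker than the closed image by more than a margin. The margin rule, the border clipping, and the function names are assumptions; the per-block choice of structuring-element size, isolated-pixel removal, and fill described by Huang et al. are not shown.

```python
def window_extreme(img, r, pick):
    """Apply min or max over a (2r+1) x (2r+1) flat square window, clipped at the border."""
    h, w = len(img), len(img[0])
    return [[pick(img[j][i]
                  for j in range(max(0, y - r), min(h, y + r + 1))
                  for i in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)]
            for y in range(h)]


def gray_close(img, r):
    """Grayscale closing: dilation (max) followed by erosion (min) with the same square."""
    return window_extreme(window_extreme(img, r, max), r, min)


def segment_by_closing(img, r, margin):
    """Closing removes dark strokes narrower than the structuring element, leaving a
    background estimate; pixels darker than it by more than `margin` become foreground (1)."""
    background = gray_close(img, r)
    mask = [[1 if b - p > margin else 0 for p, b in zip(p_row, b_row)]
            for p_row, b_row in zip(img, background)]
    return background, mask


print(segment_by_closing([[200, 200, 200, 20, 200, 200, 200]], 1, 50)[1])
# [[0, 0, 0, 1, 0, 0, 0]]
print(segment_by_closing([[200, 20, 20, 20, 20, 20, 200]], 1, 50)[1])
# [[0, 0, 0, 0, 0, 0, 0]]
```

With `r=1` (a 3×3 square), the one-pixel dark stroke is absorbed by the closing and marked as foreground, while the five-pixel dark run survives the closing unchanged and is not marked, so a larger structuring element would be needed for it.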