Documents often include not only text but also color graphics and imagery. Such documents are often referred to as compound documents. Magazines, newspapers, brochures and annual reports have had these attributes for a long time. With the popularity of desktop publishing, color scanners, color printers, color copiers and color digital cameras for the consumer and office markets, the ability to make use of color, graphics and imagery in documents is now commonplace.
There are various compressors for specific image types. These include fax compression technologies such as G3, G4, MMR, and JBIG. Other well-known compression technologies include JPEG.
Some compressors can handle portions of these documents efficiently based on their data types. However, many of these compressors cannot handle compound documents well. For example, binary compressors, such as JBIG, provide excellent compression for text that can be characterized as binary. However, such compressors, including the traditional facsimile compression technologies (G3, G4, MMR, JBIG), are insufficient for color images or even grayscale images. Similarly, continuous-tone compressors, such as JPEG (see W. P. Pennebaker, J. L. Mitchell, JPEG: Still Image Compression Standard, Van Nostrand Reinhold, 1993) or JPEG 2000 (see “Information Technology—JPEG 2000 Image Coding Standard,” ITU-T Rec. T.800|IS 15444-1, December 2000 and D. S. Taubman, M. W. Marcellin, JPEG 2000 Image Compression Fundamentals, Standards, and Practice, Kluwer Academic Publishers, Boston, 2002), are better suited to natural images with little high-frequency information yet wide dynamic range. However, JPEG does not provide a lossless representation and is not efficient for the sharp edges created by text.
Furthermore, none of the technologies discussed above allows access to lower resolutions, progression from lossy to lossless, or access to regions-of-interest. Such access is useful for delivering document images from databases or capture devices to different target devices such as computer displays, PDA displays, and printers.
JPEG 2000 is a state-of-the-art continuous-tone image coding system. Based on wavelet transform technology followed by bit-plane coding, JPEG 2000 generally provides better rate-distortion performance than the original discrete cosine transform based JPEG coding system. However, the real advantages of JPEG 2000 are access to different resolutions, progressive bit-rates from very lossy to lossless, access to regions-of-interest, and access to color components. Although JPEG 2000 is capable of reasonable lossless performance on binary images, it does not perform as well as a dedicated binary image compressor such as JBIG or JBIG-2.
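The resolution access and lossy-to-lossless progression described above can be illustrated with a minimal sketch. It is not the JPEG 2000 transform: it assumes a one-dimensional row of pixels and swaps in the simple reversible integer Haar transform (the S-transform) in place of the wavelet filters that JPEG 2000 actually uses. The function names and sample values are hypothetical.

```python
def forward(signal):
    """One level of the reversible integer Haar (S) transform.

    Takes an even-length list of integers and returns (low, high):
    `low` is a half-resolution approximation, `high` holds the details.
    """
    pairs = list(zip(signal[0::2], signal[1::2]))
    low = [(a + b) >> 1 for a, b in pairs]   # floor of the pair average
    high = [a - b for a, b in pairs]         # pair difference
    return low, high


def inverse(low, high):
    """Exactly undo `forward`, recovering the original integers."""
    out = []
    for s, d in zip(low, high):
        a = s + ((d + 1) >> 1)
        out.extend([a, a - d])
    return out


# Hypothetical 8-pixel row, decomposed over two levels.
row = [10, 12, 200, 202, 50, 54, 7, 9]
low1, high1 = forward(row)    # low1: half resolution
low2, high2 = forward(low1)   # low2: quarter resolution

# A decoder may stop at low2 or low1 for a lower-resolution (lossy) view,
# or apply every detail band to recover the row without loss.
restored = inverse(inverse(low2, high2), high1)
```

In this sketch a decoder that reads only `low2` obtains a quarter-resolution row, one that also reads `high2` obtains the half-resolution row, and one that reads all three bands recovers the original exactly.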
Many have worked on the problem of determining how a page image should be segmented for the best rate-distortion performance. For example, see D. Mukherjee, C. Chrysafis, “JPEG 2000-Matched MRC Compression of Compound Documents,” Proc. Int. Conf. on Image Processing, Rochester, N.Y., September 2002; R. L. de Queiroz, Z. Fan, T. D. Tran, “Optimizing Block-Thresholding Segmentation for Multilayer Compression of Compound Images,” IEEE Trans. on Image Processing, Vol. 9, No. 9, pp. 1461-71, September 2000; and L. Bottou, P. Haffner, Y. LeCun, “Efficient Conversion of Digital Documents to Multilayer Raster Formats,” Int. Conf. Doc. Analysis and Recognition, Seattle, Wash., pp. 444-48, September 2001.
JPM is a new standard file format that has been designed to address these problems. The JPM file format (JPEG Mixed Raster Content) is Part 6 of the JPEG 2000 standard. See “Information Technology—JPEG 2000 Image Coding Standard—Part 6: Compound Image File Format,” ISO/IEC FDIS 15444-6. The JPM standard is a file format that specifies multiple page collections and pages; multiple objects, each with an object image, a mask (binary or alpha), location, scale, and order; and a background color. How to segment the image into objects, which of a variety of image compressors to use for each object, and how to construct the objects are left unspecified by the standard; these decisions belong to the algorithms and implementation of the encoder. FIG. 1A shows an example of some of the elements of a JPM file.
JPM enables the segmentation of document images into images that are better compressed by different standard image compressors. For example, text and graphic images with high frequency information but little dynamic range are best compressed with a binary coder such as Group 4 (“Facsimile coding schemes and coding control functions for group 4 facsimile apparatus,” ITU-T Rec. T.6, November 1998), or JBIG (“Information Technology—Coded representation of picture and audio information—Progressive bi-level image compression,” ITU-T Rec. T.82, March 1995), or JBIG-2 (“Information Technology—Lossy/Lossless coding of bi-level images,” ITU-T Rec. T.88, February 2000).
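As a rough illustration of such a segmentation, the following sketch splits a small grayscale block into a binary mask, which would go to a binary coder, and a smoothed continuous-tone layer, which would go to a coder such as JPEG 2000. It assumes a naive fixed threshold, which is far simpler than the rate-distortion-driven methods cited above and is not prescribed by JPM. The function name, the threshold value, and the sample pixels are hypothetical.

```python
def split_layers(gray, threshold=64):
    """Split a grayscale block (list of rows, 0 = black) into two layers.

    Returns (mask, background):
      mask       - 1 where a pixel is darker than `threshold` (assumed text)
      background - the block with masked pixels replaced by the mean of the
                   unmasked pixels, so no sharp text edges remain in it
    """
    mask = [[1 if p < threshold else 0 for p in row] for row in gray]

    # Mean of the pixels that are NOT text; fall back to white if none.
    unmasked = [p for row, mrow in zip(gray, mask)
                for p, m in zip(row, mrow) if not m]
    fill = sum(unmasked) // len(unmasked) if unmasked else 255

    background = [[fill if m else p for p, m in zip(row, mrow)]
                  for row, mrow in zip(gray, mask)]
    return mask, background


# Hypothetical 2x3 block: a dark vertical stroke on a light background.
block = [[200, 10, 204],
         [198, 12, 202]]
mask, background = split_layers(block)
```

The mask keeps the high-frequency stroke as pure binary data, while the background layer varies slowly and has a wide dynamic range, which are the properties each class of compressor handles best.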
JPM has three key advantages. First, it allows the use of the JPEG 2000 coder. Second, it allows multiple pages and collections of pages to be contained, or referenced, in a single file. Third, it enables a compressed masked imaging system built from units called “layout objects.” Each layout object contains a “mask,” an “image,” and attributes such as order (with respect to other objects), scale, position, and cropping (or extent). These layout objects are merged together to form the final “page” image. FIG. 1B shows an example of the merging of JPM objects with image and mask elements.
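The merging of layout objects into a page can be sketched as follows. This is a simplified model under stated assumptions, not the rendering procedure of the JPM standard: masks are binary, scale and cropping are ignored, pixels are single grayscale values, and the `LayoutObject` class and `render_page` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LayoutObject:
    """Hypothetical, simplified stand-in for a layout object."""
    order: int               # position in the merging order
    x: int                   # left offset on the page
    y: int                   # top offset on the page
    mask: List[List[int]]    # binary mask: 1 = take the image pixel
    image: List[List[int]]   # pixel values, same size as the mask


def render_page(width, height, background, objects):
    """Start from the background color and merge objects in order."""
    page = [[background] * width for _ in range(height)]
    for obj in sorted(objects, key=lambda o: o.order):
        for r, mask_row in enumerate(obj.mask):
            for c, m in enumerate(mask_row):
                py, px = obj.y + r, obj.x + c
                # Where the mask is set, the object's image replaces the
                # page; elsewhere the page shows through unchanged.
                if m and 0 <= py < height and 0 <= px < width:
                    page[py][px] = obj.image[r][c]
    return page


# Hypothetical page: a gray "photo" object, then a black "text" object.
photo = LayoutObject(order=1, x=0, y=0,
                     mask=[[1, 1], [1, 1]], image=[[128, 128], [128, 128]])
text = LayoutObject(order=2, x=1, y=1,
                    mask=[[1, 0], [0, 1]], image=[[0, 0], [0, 0]])
page = render_page(4, 3, 255, [text, photo])
```

Because the objects are sorted by order before merging, the text object is applied after the photo object even though it appears first in the list, so its masked pixels overwrite the photo where the two overlap.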
JPM is considered by some to be a descendant of the Mixed Raster Content file format often used for Internet-based facsimile. Mixed Raster Content was standardized as ITU-T Rec. T.44 (“Mixed Raster Content (MRC),” ITU-T Rec. T.44, Study Group-8 Contributions, 1998). This standard was used in the IETF facsimile standard (“File Format for Internet Fax,” IETF RFC 2301, March 1998) and Xerox's Digipaper product (see D. Huttenlocher, P. Felzenszwalb, W. Rucklidge, “Digipaper: A Versatile Color Document Image Representation,” Proc. Int. Conf. on Image Processing, Kobe, Japan, October 1999).
Another related technology that preceded JPM is DjVu (see L. Bottou, et al., “High Quality Document Image Compression with DjVu,” J. Electronic Imaging, pp. 410-25, July 1998). This technology is similar to, but not compliant with, Mixed Raster Content. However, it does take advantage of wavelet technology for continuous-tone coding. Another related technology is Scalable Vector Graphics (SVG), standardized by the W3C. SVG provides multiple resolutions for objects but offers only limited options for raster content.