Documents often include not only text, but color, graphics and imagery. These are often referred to as compound documents. Magazines, newspapers, brochures and annual reports have had these attributes for a long time. With the popularity of desktop publishing, color scanners, color printers, color copiers and color digital cameras for the consumer and office markets, the ability to make use of color, graphics and imagery in documents is now commonplace.
There are various compressors for specific image types. These include fax compression technologies such as G3, G4, MMR, and JBIG. Other well-known compression technologies include JPEG.
Some compressors handle portions of these documents efficiently based on their data types, but many cannot handle compound documents well. For example, binary compressors, such as JBIG, provide excellent compression for text that can be characterized as binary. However, such compressors, including the traditional facsimile compression technologies (G3, G4, MMR, JBIG), are insufficient for color images or even grayscale images. Similarly, continuous-tone compressors, for example, JPEG, are better suited to natural images with little high-frequency information yet wide dynamic range (see W. P. Pennebaker, J. L. Mitchell, JPEG: Still Image Compression Standard, Van Nostrand Reinhold, 1993). However, baseline JPEG does not provide a lossless representation and is inefficient for the sharp edges created by text.
Furthermore, none of the technologies discussed above allows access to lower resolutions, progression from lossy to lossless, or access to regions-of-interest. Such access is useful for delivering document images from databases or capture devices to different target devices, such as computer displays, PDA displays, and printers.
Part 1 of the JPEG 2000 standard (referred to as JPEG 2000) is a state-of-the-art continuous-tone image coding system. See “Information Technology—JPEG 2000 Image Coding Standard,” ITU-T Rec. T.800 | ISO/IEC 15444-1, December 2000 and D. S. Taubman, M. W. Marcellin, JPEG 2000 Image Compression Fundamentals, Standards, and Practice, Kluwer Academic Publishers, Boston, 2002. Based on wavelet transform technology followed by bit-plane coding, JPEG 2000 generally provides better rate-distortion performance than the original JPEG coding system, which is based on the discrete cosine transform. However, the real advantages of JPEG 2000 are access to different resolutions, progressive bit-rates from very lossy to lossless, access to regions-of-interest, and access to color components. Although JPEG 2000 is capable of reasonable lossless performance on binary images, it is not as good as a dedicated binary image compressor like JBIG or JBIG-2.
Many researchers have worked on the problem of determining how a page image should be segmented for the best rate-distortion performance. For example, see D. Mukherjee, C. Chrysafis, “JPEG 2000-Matched MRC Compression of Compound Documents,” Proc. Int. Conf. on Image Processing, Rochester, N.Y., Sept. 2002; R. L. de Queiroz, Z. Fan, T. D. Tran, “Optimizing Block-Thresholding Segmentation for Multilayer Compression of Compound Images,” IEEE Trans. on Image Processing, Vol. 9, No. 9, pp. 1461-71, September 2000; and L. Bottou, P. Haffner, Y. LeCun, “Efficient Conversion of Digital Documents to Multilayer Raster Formats,” Int. Conf. Doc. Analysis and Recognition, Seattle, Wash., pp. 444-48, September 2001.
JPM is a new standard file format that has been designed to address these problems. The JPM file format (JPEG Mixed Raster Content) is Part 6 of the JPEG 2000 standard. See “Information Technology—JPEG 2000 Image Coding Standard—Part 6: Compound Image File Format,” ISO/IEC FDIS 15444-6. The JPM standard specifies a file format that supports multiple page collections and pages; multiple objects, each with an object image, a mask (binary or alpha), a location, a scale, and an order; and a background color. FIG. 1A shows an example of some of the elements of a JPM file, including the merging of JPM objects with image and mask elements.
JPM enables the segmentation of document images into images that are better compressed by different standard image compressors. For example, text and graphic images with high frequency information but little dynamic range are best compressed with a binary coder such as Group 4 (“Facsimile coding schemes and coding control functions for group 4 facsimile apparatus,” ITU-T Rec. T.6, November 1998), or JBIG (“Information Technology—Coded representation of picture and audio information—Progressive bi-level image compression,” ITU-T Rec. T.82, March 1995), or JBIG-2 (“Information Technology—Lossy/Lossless coding of bi-level images,” ITU-T Rec. T.88, February 2000).
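The idea of splitting a page into images suited to different coders can be illustrated with a deliberately simple sketch. It assumes a grayscale page held as a list of rows and a single fixed threshold; the function name `split_layers` and the threshold value are hypothetical, and this is not the segmentation method of any of the works cited above. Dark pixels go into a binary mask (the part a binary coder such as Group 4, JBIG, or JBIG-2 would handle), while the foreground and background are reduced here to one representative gray level each.

```python
from typing import List, Tuple

Gray = List[List[int]]  # rows of 8-bit gray values, 0 = black, 255 = white


def split_layers(page: Gray, threshold: int = 128) -> Tuple[List[List[int]], int, int]:
    """Split a grayscale page into a binary mask plus two flat layers.

    Returns (mask, foreground_level, background_level), where mask[r][c]
    is 1 for pixels darker than the threshold (assumed to be text or line
    art) and 0 otherwise.
    """
    mask = [[1 if value < threshold else 0 for value in row] for row in page]

    dark = [value for row in page for value in row if value < threshold]
    light = [value for row in page for value in row if value >= threshold]

    # Use the mean level of each class as a stand-in for a real
    # continuous-tone foreground or background image.
    foreground = round(sum(dark) / len(dark)) if dark else 0
    background = round(sum(light) / len(light)) if light else 255
    return mask, foreground, background
```

In a real system the foreground and background would be full images passed to a continuous-tone coder, and the segmentation would be chosen for rate-distortion performance rather than by a fixed threshold.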
JPM has three key features. First, it allows the use of the JPEG 2000 coder. Second, it allows multiple pages and collections of pages to be contained, or referenced, in a single file. Third, it enables a compressed masked imaging system. The JPM file contains “layout objects,” each of which contains a “mask,” an “image,” and attributes such as order (with respect to other objects), scale, position, and cropping (or extent). These layout objects are merged together to form the final “page” image. FIG. 1B shows storing multiple pages of JPM objects with image and mask elements.
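The merging of layout objects into a page can be sketched as follows. This is a minimal illustration, not the normative JPM rendering procedure: it assumes already-decoded RGB images and masks, ignores the scale attribute, and uses hypothetical names (`LayoutObject`, `compose_page`). Objects are applied in ascending order over the background color, and each mask value selects how much of the object image replaces what is already on the page (0.0 and 1.0 only for a binary mask, intermediate values for an alpha mask).

```python
from dataclasses import dataclass
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit RGB


@dataclass
class LayoutObject:
    order: int                 # compositing order relative to other objects
    x: int                     # left offset of the object on the page
    y: int                     # top offset of the object on the page
    image: List[List[Pixel]]   # decoded color samples
    mask: List[List[float]]    # 0.0..1.0, same dimensions as image


def compose_page(width: int, height: int, background: Pixel,
                 objects: List[LayoutObject]) -> List[List[Pixel]]:
    """Merge layout objects over a background color to form a page."""
    page = [[background for _ in range(width)] for _ in range(height)]
    for obj in sorted(objects, key=lambda o: o.order):
        for r, mask_row in enumerate(obj.mask):
            for c, alpha in enumerate(mask_row):
                py, px = obj.y + r, obj.x + c
                if not (0 <= py < height and 0 <= px < width):
                    continue  # crop anything that falls outside the page
                src = obj.image[r][c]
                dst = page[py][px]
                # Masked blend: the mask picks the object image over the
                # current page content.
                page[py][px] = tuple(
                    round(alpha * s + (1 - alpha) * d)
                    for s, d in zip(src, dst)
                )
    return page
```

For example, a one-row text object with mask `[[1.0, 0.0]]` placed at `x=1` on a white three-pixel page paints only the middle pixel and leaves the others white.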
JPM is considered by some to be a descendant of the Mixed Raster Content file format often used for Internet-based facsimile. See “Mixed Raster Content (MRC),” ITU-T Rec. T.44, Study Group-8 Contributions, 1998. This standard was used in the IETF facsimile standard (“File Format for Internet Fax,” IETF RFC 2301, http://www.ietf.org/rfc/rfc2301.txt, March 1998) and Xerox's Digipaper product (see D. Huttenlocher, P. Felzenszwalb, W. Rucklidge, “Digipaper: A Versatile Color Document Image Representation,” Proc. Int. Conf. on Image Processing, Kobe, Japan, October 1999).
Another related technology that preceded JPM is DjVu (see L. Bottou, et al., “High Quality Document Image Compression with DjVu,” J. Electronic Imaging, pp. 410-25, July 1998). This technology is similar to, but not compliant with, Mixed Raster Content. However, it does take advantage of wavelet technology for continuous-tone coding. Another related technology is Scalable Vector Graphics (SVG), standardized by the W3C (see http://www.w3.org/TR/SVG). This technology provides multiple resolutions for objects, but only limited options for raster content.