A variety of commonly encountered images are composed of mixed-mode content. In particular, many images combine textual and continuous tone content, such as where text appears over a background picture or photograph. Multimedia applications in which such images are common include computer screen capture (e.g., capturing images of the Microsoft Windows operating system's desktop or similar computer displays, which typically include icons with text labels over a background photograph), educational videos, and color facsimile, among others.
Digital images are typically compressed to reduce storage and transmission costs in computers and other consumer electronics and signal processing devices. Many image compression algorithms apply a block-based linear transform (e.g., the discrete cosine transform (DCT) used in the JPEG, MPEG and H.261 compression standards) with quantization of high-frequency transform coefficients to achieve lossy compression of image data. A drawback of this approach when applied to mixed-mode images is that quantizing the high-frequency transform coefficients distorts or blurs the textual content, because the edges of text characters are discontinuous in color with respect to a background continuous tone picture, and quantization tends to blur such locations of high color variation. The blurred edges of text characters can be readily perceptible to the viewer.
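The effect described above can be illustrated with a minimal sketch, which is not taken from any of the named standards. It applies a one-dimensional 8-point DCT to a single row of pixel values, quantizes the coefficients with step sizes that grow with frequency, and reconstructs the row. The step sizes in `STEPS`, the sample rows, and all function names are assumptions chosen for illustration only.

```python
import math

N = 8  # samples per block row, as in an 8x8 DCT block


def _scale(k):
    # Orthonormal scaling factor for DCT basis function k.
    return math.sqrt((1 if k == 0 else 2) / N)


def dct(x):
    """Orthonormal 1-D DCT-II of an N-sample row."""
    return [
        _scale(k) * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                        for n in range(N))
        for k in range(N)
    ]


def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    return [
        sum(_scale(k) * c[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for k in range(N))
        for n in range(N)
    ]


def quantize(coeffs, steps):
    """Round each coefficient to the nearest multiple of its step size."""
    return [round(c / q) * q for c, q in zip(coeffs, steps)]


# Hypothetical step sizes: coarser for higher-frequency coefficients.
STEPS = [8, 8, 16, 32, 64, 128, 128, 128]

# A sharp light-to-dark transition, such as the edge of a text character.
edge_row = [200, 200, 200, 200, 20, 20, 20, 20]
# A gentle gradient, typical of continuous tone content.
smooth_row = [100, 104, 108, 112, 116, 120, 124, 128]


def max_error(row):
    """Largest per-sample error after quantizing and reconstructing a row."""
    rec = idct(quantize(dct(row), STEPS))
    return max(abs(a - b) for a, b in zip(row, rec))


print("edge row max error:  ", round(max_error(edge_row), 2))
print("smooth row max error:", round(max_error(smooth_row), 2))
```

Under these assumed step sizes, the sharp edge loses most of its high-frequency coefficients and is reconstructed with a much larger error than the gradient, which is the kind of edge distortion the paragraph above attributes to quantization.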
Various authors have proposed approaches to compressing mixed-mode images using text segmentation, including K. O. Perlmutter, N. Chaddha, J. B. Buckheit, R. M. Gray, and R. A. Olshen, “Text segmentation in mixed-mode images using classification trees and transform tree-structured vector quantization,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 4, pp. 2231–2234, 1996; N. Chaddha, “Segmentation-Assisted Compression Of Multimedia Documents,” in Conference Record of the Twenty-Ninth Asilomar Conference on Signals, Systems and Computers, vol. 2, pp. 1452–1456, 1996; N. Chaddha and A. Gupta, “Text Segmentation Using Linear Transforms,” in Conference Record of the Twenty-Ninth Asilomar Conference on Signals, Systems and Computers, vol. 2, pp. 1447–1451, 1996; and N. Chaddha, R. Sharma, A. Agrawal and A. Gupta, “Text Segmentation In Mixed-Mode Images,” in Conference Record of the Twenty-Eighth Asilomar Conference on Signals, Systems and Computers, vol. 2, pp. 1356–1361, 1994. In general, the approaches described by these authors involve classifying each transform block within the image as either text or non-text based on characteristics of the block's transform coefficients (e.g., coefficients of the discrete cosine transform (DCT) or discrete wavelet transform (DWT)), and using different, higher quality compression parameters (quantization matrices and entropy codes) for blocks classified as text than for non-text blocks.
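The general idea of coefficient-based block classification can be sketched as follows. This is not the method of any of the cited papers, which use techniques such as classification trees; it substitutes a simple threshold on the share of AC energy found at higher frequencies. The threshold, frequency cutoff, step sizes, and all function names are hypothetical.

```python
import math

N = 8  # block size (8x8)


def dct_1d(x):
    """Orthonormal 1-D DCT-II."""
    return [
        math.sqrt((1 if k == 0 else 2) / N)
        * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
              for n in range(N))
        for k in range(N)
    ]


def dct_2d(block):
    """Separable 2-D DCT: transform the rows, then the columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d([rows[i][v] for i in range(N)]) for v in range(N)]
    # coeffs[u][v]: vertical frequency u, horizontal frequency v
    return [[cols[v][u] for v in range(N)] for u in range(N)]


def high_freq_ratio(coeffs, cutoff=3, min_ac_energy=1.0):
    """Share of AC energy at positions with u + v >= cutoff (assumed measure)."""
    ac = high = 0.0
    for u in range(N):
        for v in range(N):
            if u == 0 and v == 0:
                continue  # skip the DC coefficient
            e = coeffs[u][v] ** 2
            ac += e
            if u + v >= cutoff:
                high += e
    # Treat nearly flat blocks as having no high-frequency content.
    return high / ac if ac >= min_ac_energy else 0.0


def is_text(block, threshold=0.1):
    """Hypothetical classifier: text if high-frequency share exceeds threshold."""
    return high_freq_ratio(dct_2d(block)) > threshold


FINE_STEP, COARSE_STEP = 4, 32  # hypothetical quantization step sizes


def select_step(block):
    """Use the finer (higher quality) step for blocks classified as text."""
    return FINE_STEP if is_text(block) else COARSE_STEP


# A dark two-pixel vertical stroke on a light background (text-like).
text_block = [[200, 200, 200, 20, 20, 200, 200, 200] for _ in range(N)]
# A gentle diagonal gradient (continuous tone).
smooth_block = [[100 + 2 * i + 2 * j for j in range(N)] for i in range(N)]

print("text block step:  ", select_step(text_block))
print("smooth block step:", select_step(smooth_block))
```

A single scalar step size stands in here for the per-class quantization matrices and entropy codes mentioned above; a real codec would switch complete parameter sets rather than one number.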
A problem with these text segmentation approaches is that a block classified as containing text often is still composed of mixed-mode content: text and a continuous tone background picture. The use of higher quality compression parameters for such blocks sacrifices compression of the continuous tone content of the block. On the other hand, to the extent that the compression of such blocks remains lossy, it can still lead to perceptible degradation in quality of the text content.