The present invention relates to digital images. More specifically, the present invention relates to compression of compound documents.
A hardcopy device such as a digital copier typically includes a scan engine for converting pages of documents into digital images, memory for storing the digital images, and a print engine for printing out the stored images. The memory is usually large enough to store at least one full digital image. By storing a full digital image in memory, the digital copier can print out multiple copies of the stored image.
Digital image compression is performed to reduce the memory and bandwidth requirements of the digital copier. Reducing the memory and bandwidth requirements, in turn, reduces the cost of the digital copier.
A single compression algorithm is usually not suitable for compressing compound documents. Compound documents may contain text, drawings and photo regions (sometimes overlaid), complex backgrounds (e.g., text boxes), watermarks and gradients. For example, magazines, journals and textbooks usually contain two or more of these features. Lossy compression algorithms such as JPEG are suitable for compressing photo regions of compound color documents, but they are not suitable for compressing black and white text regions. These algorithms are based on linear transforms (e.g., discrete cosine transform, discrete wavelet transform) and do not compress edges efficiently. They spend too many bits on sharp transitions, and may still produce objectionable artifacts around text.
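The inefficiency of transform coding on edges can be illustrated with a minimal sketch. It assumes a one-dimensional, 8-sample orthonormal DCT-II and a single uniform quantization step; an actual JPEG coder works on 8x8 blocks with a per-coefficient quantization table. The function names, sample rows and step size are hypothetical and chosen only for illustration.

```python
import math

def dct_ii(samples):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(samples)
    coefficients = []
    for k in range(n):
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(samples))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coefficients.append(scale * total)
    return coefficients

def nonzero_after_quantization(samples, step=16):
    """Count coefficients that survive uniform quantization (assumed step size)."""
    return sum(1 for c in dct_ii(samples) if round(c / step) != 0)

# A gentle gradient, as in a photo region, and a sharp step, as at a text edge.
smooth_row = [100, 102, 104, 106, 108, 110, 112, 114]
edge_row = [0, 0, 0, 255, 255, 255, 255, 255]

smooth_count = nonzero_after_quantization(smooth_row)
edge_count = nonzero_after_quantization(edge_row)
print("smooth:", smooth_count, "edge:", edge_count)
```

The gradient concentrates its energy in the lowest-frequency coefficients, whereas the step spreads energy across many coefficients, each of which must be coded. Discarding those coefficients to save bits is what introduces artifacts around the edge.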
Compression algorithms such as CCITT Group 4 (G4) and JBIG are suitable for compressing black and white text regions of compound color documents. However, they are not suitable for compressing photo regions.
A typical solution is to pre-process the documents, separating the regions according to the type of information they contain. For instance, regions containing edges (e.g., regions containing text, line-art, graphics) and regions containing natural features (e.g., regions containing photos, color backgrounds and gradients) are separated, and each type of region is compressed with an algorithm suited to its content.
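The separation step described above might be sketched as follows. This is only an illustration under stated assumptions: the image is a grayscale list of rows, regions are fixed-size blocks, and a block is treated as an edge region when any two adjacent pixels differ by at least a threshold. The function names, block size and threshold are hypothetical; practical segmenters are considerably more elaborate.

```python
def max_neighbor_difference(block):
    """Largest absolute difference between horizontally or vertically adjacent pixels."""
    rows, cols = len(block), len(block[0])
    largest = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                largest = max(largest, abs(block[r][c] - block[r][c + 1]))
            if r + 1 < rows:
                largest = max(largest, abs(block[r][c] - block[r + 1][c]))
    return largest

def classify_blocks(image, block_size=4, edge_threshold=64):
    """Map each block's top-left (row, col) to 'edge' or 'natural'."""
    labels = {}
    for top in range(0, len(image), block_size):
        for left in range(0, len(image[0]), block_size):
            block = [row[left:left + block_size]
                     for row in image[top:top + block_size]]
            is_edge = max_neighbor_difference(block) >= edge_threshold
            labels[(top, left)] = "edge" if is_edge else "natural"
    return labels

# 4x8 test image: the left block holds a sharp black/white boundary,
# the right block holds a gentle gradient.
image = [[0, 0, 255, 255, 100, 104, 108, 112] for _ in range(4)]
labels = classify_blocks(image)
print(labels)
```

Blocks labeled "edge" would then be routed to a coder suited to text and line-art, and blocks labeled "natural" to a transform coder such as JPEG.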
Knowledge of a predominant color in a digital image can lead to more efficient compression of the digital image.
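One simple way to estimate a predominant color is to count pixel values and take the most frequent one. The sketch below assumes the image is available as a flat list of RGB tuples; the function name and the sample page are hypothetical, and this is not presented as the method of the invention.

```python
from collections import Counter

def predominant_color(pixels):
    """Return the most frequent color and the fraction of pixels that have it."""
    counts = Counter(pixels)
    color, count = counts.most_common(1)[0]
    return color, count / len(pixels)

# Hypothetical page: mostly white background, some black text, a little red.
page = [(255, 255, 255)] * 90 + [(0, 0, 0)] * 8 + [(200, 30, 30)] * 2
color, share = predominant_color(page)
print(color, share)
```

When a single color covers most of the image, as a page background often does, a coder could in principle represent those pixels very cheaply and spend its bits on the remainder.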