The present invention relates to digital images. More specifically, the present invention relates to compression of compound documents.
A hardcopy device such as a digital copier typically includes a scan engine for converting pages of documents into digital images, memory for storing the digital images, and a print engine for printing out the stored images. The memory is usually large enough to store at least one full digital image. By storing a full digital image in memory, the digital copier can print out multiple copies of the stored image.
Digital image compression is performed to reduce the memory and bandwidth requirements of the digital copier. Reducing the memory and bandwidth requirements, in turn, reduces the cost of the digital copier.
A single compression algorithm is usually not suitable for compressing compound documents. Compound documents may contain text, drawings and photo regions (sometimes overlaid), complex backgrounds (e.g., text boxes), watermarks and gradients. For example, magazines, journals and textbooks usually contain two or more of these features. Lossy compression algorithms such as JPEG are suitable for compressing photo regions of compound color documents, but they are not suitable for compressing black and white text regions. These algorithms are based on linear transforms (e.g., discrete cosine transform, discrete wavelet transform) and do not compress edges efficiently: they require too many bits to represent the edges, and may produce very objectionable artifacts around text.
Compression algorithms such as CCITT G4 and JBIG are suitable for compressing black and white text regions of compound color documents. However, they are not suitable for compressing photo regions.
A typical solution is to pre-process the documents, separating the regions according to the type of information they contain. For instance, regions containing edges (e.g., regions containing text, line-art, graphics) and regions containing natural features (e.g., regions containing photos, color backgrounds and gradients) are separated and compressed according to different algorithms.
However, algorithms for separating the regions tend to be very complex, requiring large amounts of memory and high bandwidth. The complexity, high bandwidth and large memory requirements make many algorithms unsuitable for embedded applications such as digital copiers, printers, scanners and other hardcopy devices.
According to one aspect of the present invention, a digital image (e.g., a digital image of a compound document) is processed by accessing a plurality of blocks of the digital image and classifying each block for compression. At least some blocks are classified according to their number of distinct colors. The block size is substantially smaller than the size of the digital image; therefore, bandwidth and memory requirements are reduced.
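The block-wise classification summarized above can be illustrated with a minimal sketch. It is not the claimed method: the block size, the color-count threshold, the class labels ("single_color", "few_colors", "many_colors") and all function names are hypothetical choices made only for illustration. The sketch assumes an image held as rows of RGB tuples and shows that each decision needs only one small block at a time rather than the full image.

```python
from typing import Dict, Iterator, List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), assumed representation
Image = List[List[Pixel]]      # rows of pixels


def iter_blocks(image: Image, block_size: int) -> Iterator[Tuple[int, int, Image]]:
    """Yield (top, left, block) for each block_size x block_size tile.

    Tiles on the right and bottom borders may be smaller than block_size.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            block = [row[left:left + block_size]
                     for row in image[top:top + block_size]]
            yield top, left, block


def count_distinct_colors(block: Image) -> int:
    """Count the distinct pixel values in one block."""
    return len({pixel for row in block for pixel in row})


def classify_block(block: Image, max_few_colors: int = 2) -> str:
    """Assign a hypothetical class label from the distinct-color count.

    max_few_colors is an assumed threshold, not a value from the source.
    """
    n = count_distinct_colors(block)
    if n == 1:
        return "single_color"   # e.g., flat background
    if n <= max_few_colors:
        return "few_colors"     # e.g., text or line-art on a background
    return "many_colors"        # e.g., photo or gradient content


def classify_image(image: Image, block_size: int = 8) -> Dict[Tuple[int, int], str]:
    """Classify every block; only one block is examined at a time."""
    return {(top, left): classify_block(block)
            for top, left, block in iter_blocks(image, block_size)}
```

In a pipeline of this kind, each label could then select a different compressor for the block (for example, a bi-level coder for "few_colors" blocks and a transform coder for "many_colors" blocks); that mapping is likewise an assumption and is not stated in the text above.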
Other aspects and advantages of the present invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the present invention.