Document or image compression is collectively referred to herein as “document compression” or simply “compression”. Document compression addresses the problem of reducing the amount of data required to represent the digital content of a given document or image. The underlying principle of the reduction process in document compression is the removal of redundant data.
Compression techniques generally fall into two broad categories: lossless and lossy. Lossless compression preserves the information, in that it allows the data of the document or image to be compressed and decompressed without any loss of information. While decompressing losslessly compressed data reproduces the original information exactly, in many circumstances lossless compression provides little or no reduction in data size. Lossy compression, on the other hand, often provides comparatively higher levels of data reduction but results in a less than perfect reproduction of the original information.
For example, lossless compression methods such as Lempel-Ziv (LZ) do not perform particularly well on scanned images and often achieve little to no size reduction. Lossy compression methods, such as Joint Photographic Experts Group (JPEG) compression, work fairly well at reducing the size of continuous-tone pixel maps, but they do not work particularly well on the parts of a page containing text, because the data lost during compression degrades the clarity of the text in the reproduction.
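By way of illustration only, the trade-off described above can be sketched with Python's standard `zlib` module, whose DEFLATE format is built on LZ77. The sketch makes several assumptions that are not part of the original text: pseudo-random bytes stand in for the noise-like content of a scanned image, and discarding the low four bits of each byte stands in for a lossy step (it is not JPEG). The helper name `compression_ratio` is hypothetical.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower means smaller)."""
    return len(zlib.compress(data, 9)) / len(data)


# Highly redundant data: one short phrase repeated many times.
redundant = b"document compression " * 2000

# Assumption: pseudo-random bytes approximate noise-like scanned image data.
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(len(redundant)))

# Lossless: decompression reproduces the original bytes exactly.
round_trip = zlib.decompress(zlib.compress(noisy, 9))

# Lossy stand-in (not JPEG): drop the low 4 bits of every byte.
# The discarded bits cannot be recovered from the quantized data.
quantized = bytes(b & 0xF0 for b in noisy)

redundant_ratio = compression_ratio(redundant)
noisy_ratio = compression_ratio(noisy)
quantized_ratio = compression_ratio(quantized)

print(f"redundant text : {redundant_ratio:.3f}")
print(f"noise-like data: {noisy_ratio:.3f}")
print(f"quantized data : {quantized_ratio:.3f}")
```

Under these assumptions, the redundant input shrinks to a small fraction of its size, the noise-like input is left essentially unreduced, and the quantized input compresses noticeably better than the noise-like input at the cost of an inexact reproduction.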