The amount of information available via computers has increased dramatically with the proliferation of computer networks, the Internet and digital storage means. With this increase has come the need to transmit information quickly and to store it efficiently. Data compression is a technology that facilitates the effective transmission and storage of information.
Data compression reduces the amount of space necessary to represent information and can be used for many information types. The demand for compression of digital information, including images, text, audio and video, has been ever increasing. Typically, data compression is used with standard computer systems; however, other technologies also make use of data compression, such as, but not limited to, digital and satellite television and cellular/digital phones.
As the demand for handling, transmitting and processing large amounts of information increases, the demand for compression of such data increases as well. Although storage device capacity has increased significantly, the demand for information has outpaced capacity advancements. For example, an uncompressed image can require 5 megabytes of space, whereas the same image can be compressed to require only 2.5 megabytes of space without loss of information. Data compression likewise facilitates transferring larger amounts of information. Even with the increased transmission rates offered by broadband, DSL, cable modem Internet and the like, transmission limits are easily reached with uncompressed information. For example, transmission of an uncompressed image over a DSL line can take ten minutes, whereas the same image, when compressed, can be transmitted in about one minute, thus providing a ten-fold gain in data throughput.
In general, there are two types of compression: lossless and lossy. Lossless compression allows the exact original data to be recovered after compression, while lossy compression allows the data recovered after compression to differ from the original data. A tradeoff exists between the two compression modes in that lossy compression provides a better compression ratio than lossless compression because some compromise of data integrity is tolerated. Lossless compression may be used, for example, when compressing critical text, because failure to reconstruct the data exactly can dramatically affect the quality and readability of the text. Lossy compression can be used with images or non-critical text where a certain amount of distortion or noise is either acceptable or imperceptible to human senses.
Data compression is especially applicable to digital representations of documents (digital documents). Typically, digital documents include text, images, or both. In addition to using less storage space for current digital data, compact storage without significant degradation of quality would encourage digitization of existing hard copies of documents, making paperless offices more feasible. Striving toward such paperless offices is a goal for many businesses because paperless offices provide benefits such as allowing easy access to information, reducing environmental costs, reducing storage costs and the like. Furthermore, decreasing file sizes of digital documents through compression permits more efficient use of Internet bandwidth, thus allowing for faster transmission of more information and a reduction of network congestion. Reducing required storage for information, movement toward efficient paperless offices, and increasing Internet bandwidth efficiency are just some of the many significant benefits associated with compression technology.
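The distinction between lossless and lossy compression drawn above can be illustrated with a minimal sketch. It assumes Python's standard zlib module (DEFLATE) as the lossless coder and a toy quantization step as the lossy stage; the sample data and the names lossless_roundtrip and lossy_quantize are hypothetical and are not taken from any particular compression system.

```python
import zlib

def lossless_roundtrip(data: bytes) -> bytes:
    """Compress with DEFLATE, then decompress; the result equals the input exactly."""
    return zlib.decompress(zlib.compress(data, 9))

def lossy_quantize(samples: bytes, step: int = 16) -> bytes:
    """Toy lossy stage (an assumption for illustration): round each 8-bit
    sample down to a multiple of `step`, discarding fine detail."""
    return bytes((s // step) * step for s in samples)

# Hypothetical 8-bit "image row": a slow ramp with small deterministic noise.
row = bytes((i // 4 + (i * 7) % 5) % 256 for i in range(4096))

restored = lossless_roundtrip(row)   # bit-exact copy of `row`
quantized = lossy_quantize(row)      # differs from `row` by less than `step`

lossless_size = len(zlib.compress(row, 9))
lossy_size = len(zlib.compress(quantized, 9))
max_error = max(abs(a - b) for a, b in zip(row, quantized))

print("original bytes:      ", len(row))
print("lossless compressed: ", lossless_size)
print("lossy compressed:    ", lossy_size)
print("max per-sample error:", max_error)
```

In this sketch the lossless path recovers every byte, while the lossy path bounds the per-sample error by the quantization step and usually compresses further because fewer distinct values remain.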
Compression of digital documents should satisfy certain goals in order to make the use of digital documents more attractive. First, the compression should enable compressing and decompressing large amounts of information in a small amount of time. Second, the compression should provide for accurately reproducing the digital document.
One commonly used approach to encoding documents and images is to use a compression scheme or system that utilizes probabilities. For example, an arithmetic encoder, which is widely used, obtains good compression by using sophisticated models to estimate probabilities for the data to be encoded. Generally, the more closely the estimated probabilities match the actual data, the better the compression achieved. Arithmetic coding can code close to entropy, which is the average amount of information per symbol given a probability distribution of the possible symbols. It is not possible, on average, to code better than entropy. Typically, coders such as arithmetic encoders rely on a large, fixed context to generate their probabilities. However, computing these probabilities can be expensive and time consuming. Furthermore, the fixed context generally requires training on previously encoded data, as in classical adaptive arithmetic encoders, in order to yield satisfactory probabilities. Consequently, if the image being encoded has a distribution that changes quickly, is noisy, or is too complex for adaptation, poor compression is usually obtained. Conventional coders thus fail to adapt adequately to the image being encoded.
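The relationship between probability estimates and achievable code length can be made concrete with a short sketch. It is not an arithmetic coder: it only computes the empirical entropy of a byte sequence and the ideal code length (the sum of -log2 p over the symbols) that a coder would approach under a simple adaptive order-0 model. The model, with every count starting at one, and the function names are assumptions chosen for illustration.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(data: bytes) -> float:
    """Empirical entropy H = -sum(p * log2(p)) over the symbols in `data`."""
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def adaptive_code_length_bits(data: bytes, alphabet_size: int = 256) -> float:
    """Ideal total code length under an adaptive order-0 model (an assumption
    for illustration): each symbol's probability is estimated from the
    symbols already coded, with every count initialised to 1."""
    counts = [1] * alphabet_size
    total = alphabet_size
    bits = 0.0
    for symbol in data:
        bits += -math.log2(counts[symbol] / total)  # cost of coding this symbol
        counts[symbol] += 1                         # learn from it afterwards
        total += 1
    return bits

# Hypothetical skewed source: seven 'a' symbols for every 'b'.
data = b"aaaaaaab" * 64

h = entropy_bits_per_symbol(data)
adaptive_rate = adaptive_code_length_bits(data) / len(data)

print(f"entropy:        {h:.4f} bits/symbol")
print(f"adaptive model: {adaptive_rate:.4f} bits/symbol")
```

Under these assumptions the adaptive rate stays above the entropy because the model must first learn the statistics from already coded data, and the gap widens when the statistics change faster than the counts can follow.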