The amount of information available via computers has increased dramatically with the widespread proliferation of computer networks, the Internet and digital storage means. With this increase has come the need to transmit information quickly and to store it efficiently. Data compression is a technology that facilitates the effective transmission and storage of information.
Data compression reduces the amount of space necessary to represent information, and can be used for many information types. The demand for compression of digital information, including images, text, audio and video, has been steadily increasing. Typically, data compression is used with standard computer systems; however, other technologies also make use of data compression, including but not limited to digital and satellite television as well as cellular/digital phones.
As the demand for handling, transmitting and processing large amounts of information increases, the demand for compression of such data increases as well. Although storage device capacity has increased significantly, the demand for information has outpaced capacity advancements. For example, an uncompressed image can require 5 megabytes of space, whereas the same image can be compressed to require, for example, only 2.5 megabytes of space with lossless compression or 500 kilobytes of space with lossy compression. Data compression likewise facilitates transferring larger amounts of information. Even with the increase of transmission rates, such as broadband, DSL, cable modem Internet and the like, transmission limits are easily reached with uncompressed information. For example, transmission of an uncompressed image over a DSL line can take ten minutes. However, the same image can be transmitted in about one minute when compressed, thus providing a ten-fold gain in data throughput.
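The figures in the preceding example can be checked with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, and the link rate is an assumption chosen so that the uncompressed image takes ten minutes, since no specific DSL rate is given above.

```python
# Sizes taken from the example above (decimal megabytes/kilobytes assumed).
UNCOMPRESSED_BYTES = 5_000_000   # 5 megabytes
LOSSLESS_BYTES = 2_500_000       # 2.5 megabytes
LOSSY_BYTES = 500_000            # 500 kilobytes


def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Return how many times smaller the compressed form is."""
    return original_bytes / compressed_bytes


def transfer_seconds(size_bytes: int, link_bits_per_second: float) -> float:
    """Return an idealized transfer time that ignores protocol overhead."""
    return size_bytes * 8 / link_bits_per_second


# Assumption: a link rate at which the uncompressed image takes 600 seconds.
LINK_BPS = UNCOMPRESSED_BYTES * 8 / 600

lossless_ratio = compression_ratio(UNCOMPRESSED_BYTES, LOSSLESS_BYTES)
lossy_ratio = compression_ratio(UNCOMPRESSED_BYTES, LOSSY_BYTES)
lossy_minutes = transfer_seconds(LOSSY_BYTES, LINK_BPS) / 60

print(f"lossless ratio: {lossless_ratio:.1f}x")
print(f"lossy ratio: {lossy_ratio:.1f}x")
print(f"lossy transfer time: {lossy_minutes:.1f} min")
```

Under these assumptions, the ten-fold throughput gain corresponds to the lossy figure of 500 kilobytes; the lossless figure of 2.5 megabytes would halve the transfer time instead.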
In general, there are two types of compression, lossless and lossy. Lossless compression allows the exact original data to be recovered after compression, while lossy compression allows the data recovered after compression to differ from the original data. A tradeoff exists between the two compression modes in that lossy compression provides a better compression ratio than lossless compression because some degree of data integrity compromise is tolerated. Lossless compression may be used, for example, when compressing critical text, because failure to reconstruct the data exactly can dramatically affect the quality and readability of the text. Lossy compression can be used with images or non-critical text where a certain amount of distortion or noise is either acceptable or imperceptible to human senses.
Data compression is especially applicable to digital representations of documents (digital documents). Typically, digital documents include text, images, or both. In addition to using less storage space for current digital data, compact storage without significant degradation of quality would encourage digitization of current hardcopies of documents, making paperless offices more feasible. Striving toward such paperless offices is a goal for many businesses because paperless offices provide benefits such as allowing easy access to information, reducing environmental costs, reducing storage costs and the like. Furthermore, decreasing file sizes of digital documents through compression permits more efficient use of Internet bandwidth, thus allowing for faster transmission of more information and a reduction of network congestion. Reducing required storage for information, movement toward efficient paperless offices, and increasing Internet bandwidth efficiency are just some of many significant benefits associated with compression technology.
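The distinction between lossless and lossy compression described above can be made concrete with a small Python sketch. It is a minimal illustration under stated assumptions, not any particular codec: `zlib` from the Python standard library stands in for a lossless scheme, and a simple quantizer with a hypothetical `step` parameter stands in for the information-discarding stage of a lossy scheme. The quantizer by itself does not shrink the data; in practice it would be followed by an entropy coder that exploits the reduced number of distinct values.

```python
import zlib
from typing import List


def lossless_round_trip(data: bytes) -> bytes:
    """Compress and decompress with zlib; the output equals the input."""
    return zlib.decompress(zlib.compress(data))


def lossy_round_trip(samples: List[int], step: int = 16) -> List[int]:
    """Quantize 8-bit samples to the midpoint of each `step`-wide bin.

    Low-order detail is discarded, so the output only approximates the input.
    """
    return [min(255, (s // step) * step + step // 2) for s in samples]


# Lossless: critical text is recovered exactly.
text = b"critical text must survive compression exactly " * 20
recovered_text = lossless_round_trip(text)
compressed_size = len(zlib.compress(text))

# Lossy: pixel-like samples come back close to, but not equal to, the originals.
pixels = [0, 7, 18, 100, 101, 254]
restored = lossy_round_trip(pixels)
max_error = max(abs(a - b) for a, b in zip(pixels, restored))

print("text recovered exactly:", recovered_text == text)
print("compressed size:", compressed_size, "of", len(text), "bytes")
print("restored samples:", restored, "max error:", max_error)
```

With a step of 16, each restored sample differs from its original by at most 8, which illustrates the kind of bounded distortion that may be acceptable for images but not for critical text.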
Compression of digital documents should satisfy certain goals in order to make use of digital documents more attractive. First, the compression should enable compressing and decompressing large amounts of information in a small amount of time. Second, the compression should provide for accurately reproducing the digital document. Additionally, data compression of digital documents should take into account the intended purpose or ultimate use of a document. Some digital documents are employed for filing or providing hard copies. Other documents may be revised and/or edited. Many conventional data compression methodologies fail to handle re-flowing of text and/or images when viewed, and fail to provide efficient and effective means to enable compression technology to recognize characters and re-flow them to word processors, personal digital assistants (PDAs), cellular phones, and the like. Therefore, if hard copy office documents are scanned into digital form, current compression technology can make it difficult if not impossible to update, amend, or in general change the digitized document.
Often, compression schemes are tailored to a particular type of document, such as binary, non-binary, textual or image, in order to increase compression. However, a compression scheme tailored for one type of document does not typically perform well for other types of documents. For example, a compression scheme tailored for text-based documents does not generally perform well with an image document. One solution to this problem is to select a compression scheme tailored to the type of document or image to be encoded. However, this solution can fail for digital documents which have more than one type of information in a single document. For example, a digital document can have a high-color image along with textual information, such as is commonly seen in magazine articles. One approach to overcome this failing is to analyze a document and divide it into various regions. The various regions can be analyzed to determine the type of information contained within the regions. A compression scheme can be selected for each region based on the type of information. However, this approach can be quite difficult to implement and requires regions of a variety of sizes and shapes, which cause difficulties for compression. Another approach is to separate a document into a background and a constant color image. This can be helpful because a different compression scheme can be used for the background and the constant color image. However, the constant color image can cause information to be lost by forcing pixel values to be a constant color.
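The region-based approach described above, in which a compression scheme is selected for each region according to the type of information it contains, can be sketched as follows. This is a hypothetical illustration only: the `Region` type, the two scheme functions and the `SCHEMES` table are assumptions introduced for the example, the regions are assumed to have been identified and labeled by an earlier analysis step, and the sketch does not address the difficulty of handling regions of varied sizes and shapes.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Region:
    kind: str    # "text" or "image"; assumed to come from a prior analysis step
    data: bytes  # raw bytes of the region


def compress_text(data: bytes) -> bytes:
    """Lossless scheme for text regions."""
    return zlib.compress(data, 9)


def compress_image(data: bytes) -> bytes:
    """Lossy scheme for image regions: drop the low 4 bits, then entropy-code."""
    coarse = bytes(b & 0xF0 for b in data)
    return zlib.compress(coarse, 9)


# Hypothetical lookup table mapping region type to compression scheme.
SCHEMES: Dict[str, Callable[[bytes], bytes]] = {
    "text": compress_text,
    "image": compress_image,
}


def compress_document(regions: List[Region]) -> List[Tuple[str, bytes]]:
    """Compress each region with the scheme selected for its type."""
    return [(r.kind, SCHEMES[r.kind](r.data)) for r in regions]


document = [
    Region("text", b"Headline and body copy. " * 10),
    Region("image", bytes(range(256))),
]
encoded = compress_document(document)

for kind, payload in encoded:
    print(kind, len(payload), "bytes")
```

In this sketch the text region is recovered exactly on decompression, while the image region is recovered only to within the discarded low-order bits, mirroring the tradeoff between the two kinds of scheme.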