The amount of information available via computers has dramatically increased with the widespread proliferation of computer networks, the Internet and digital storage means. With the increased amount of information has come the need to transmit information quickly and to store it efficiently. Data compression is one manner in which document(s) can more effectively be transmitted and/or stored.
Conventional data compression systems have utilized various compression approaches, for example, symbol matching. However, typical compression approaches that work effectively for documents having image(s) do not work well, for example, for documents having text and/or handwriting.
Data compression reduces the space necessary to represent information. Compression can be used for any type of information. However, compression of digital information, including images, text, audio, and video, is becoming more important. Typically, data compression is used with standard computer systems. However, other technologies also make use of data compression, such as, but not limited to, digital and satellite television as well as cellular/digital phones.
Data compression is important for several reasons. Data compression allows information to be stored in less space than uncompressed data. As the demand for large amounts of information increases, data compression may be required to supply the large amounts of information. The size of storage devices has increased significantly; however, the demand for information has outstripped these size increases. For example, an uncompressed image can take up 5 megabytes of space, whereas the same image can be compressed to take up only 2.5 megabytes of space. Additionally, data compression permits more information to be transferred in compressed form than in uncompressed form over the same link. Even with the increase of transmission rates, such as broadband, DSL, cable modem Internet and the like, transmission limits are easily reached with uncompressed information. For example, transmission of an uncompressed image over a DSL line can take ten minutes. However, with data compression, the same image can be transmitted in about a minute.
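The figures in the preceding examples can be related by simple arithmetic. The following is a minimal sketch in Python; the function names are hypothetical, and the transfer-time relation assumes a fixed link rate with negligible protocol and decompression overhead, which the text above does not state.

```python
def compression_ratio(original_size: float, compressed_size: float) -> float:
    """Ratio of original size to compressed size (e.g. 2.0 means 2:1)."""
    return original_size / compressed_size


def space_savings(original_size: float, compressed_size: float) -> float:
    """Fraction of the original space that compression frees up."""
    return 1.0 - compressed_size / original_size


def compressed_transfer_minutes(uncompressed_minutes: float, ratio: float) -> float:
    """Transfer time after compression, ASSUMING a fixed link rate and
    no overhead, so that time is proportional to the number of bytes sent."""
    return uncompressed_minutes / ratio


# Storage example from the text: 5 megabytes compressed to 2.5 megabytes.
image_ratio = compression_ratio(5.0, 2.5)
image_savings = space_savings(5.0, 2.5)

# Transmission example from the text: ten minutes reduced to about one minute,
# which under the fixed-rate assumption corresponds to roughly a 10:1 ratio.
implied_ratio = compression_ratio(10.0, 1.0)
minutes_after = compressed_transfer_minutes(10.0, implied_ratio)

print(image_ratio, image_savings, implied_ratio, minutes_after)
```

Under these assumptions the storage example corresponds to a 2:1 ratio (half the space saved), while the transmission example implies a considerably higher ratio than the storage example does.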
In general, there are two types of compression: lossless and lossy. Lossless compression allows the exact original data to be recovered after decompression, while lossy compression allows the decompressed data to differ from the original data. Lossy compression allows for a better compression ratio because it can eliminate data from the original. Lossless compression may be used, for example, when compressing critical text, because failure to exactly reconstruct the data can seriously affect the quality and readability of the text. Lossy compression can be used with images or non-critical text where a certain amount of distortion or noise is either acceptable or imperceptible to human senses.
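The distinction between the two types can be illustrated with a short Python sketch. It uses the standard-library zlib module (DEFLATE) for the lossless case; the lossy case is a deliberately simple stand-in, assumed here for illustration only, that rounds each sample down to a multiple of a step size before compressing. The function names, the step size, and the synthetic sample data are all hypothetical and are not drawn from any particular compression system.

```python
import zlib
from typing import Tuple


def lossless_round_trip(data: bytes) -> Tuple[bytes, int]:
    """Compress and decompress with zlib; returns (restored data, compressed size)."""
    compressed = zlib.compress(data, 9)
    return zlib.decompress(compressed), len(compressed)


def lossy_round_trip(samples: bytes, step: int = 16) -> Tuple[bytes, int]:
    """Illustrative lossy scheme: round each byte down to a multiple of `step`,
    then compress. The discarded low-order detail cannot be recovered."""
    quantized = bytes((b // step) * step for b in samples)
    compressed = zlib.compress(quantized, 9)
    return zlib.decompress(compressed), len(compressed)


# Critical text: the round trip must reproduce every byte.
text = b"Data compression reduces the space necessary to represent information. " * 20
restored_text, text_size = lossless_round_trip(text)

# Synthetic "image" samples (hypothetical data): a ramp plus a small ripple.
pixels = bytes((i // 4 + (i * 7) % 5) % 256 for i in range(4096))
restored_exact, exact_size = lossless_round_trip(pixels)
restored_lossy, lossy_size = lossy_round_trip(pixels, step=16)

# Largest per-sample distortion introduced by the lossy scheme.
max_error = max(abs(a - b) for a, b in zip(pixels, restored_lossy))

print(len(text), text_size)
print(len(pixels), exact_size, lossy_size, max_error)
```

In this sketch the lossless path returns the input byte for byte, whereas the lossy path returns samples that differ from the originals by less than the step size. Quantization is used only because it is easy to follow; practical lossy coders for images typically rely on more elaborate transforms.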
Data compression is especially applicable to digital documents. Digital documents or digital document images are digital representations of documents. Typically, digital documents include text, images, or both. In addition to using less storage space for current digital data, compact storage without significant degradation of quality would encourage the digitization of current hardcopies, making paperless offices more feasible. Striving toward such paperless offices is an important goal for businesses, because paperless offices provide many benefits, such as allowing easy access to information, reducing environmental costs, reducing storage costs and the like. Furthermore, decreasing file sizes of digital documents through compression allows more efficient use of Internet bandwidth, thus allowing for faster transmission of more information and a reduction of network congestion. Reducing required storage for information, movement toward efficient paperless offices, and increasing Internet bandwidth efficiency are just some of the many significant benefits of compression technology.
Data compression of digital documents has a number of goals that make the use of digital documents more attractive. First, data compression should be able to compress and decompress large amounts of information in a small amount of time. Second, data compression should be able to accurately reproduce the digital document.
Additionally, data compression of digital documents should take into account the purpose of a document. Some digital documents are used for filing or providing hard copies. Other documents may be revised and/or edited. Current data compression fails to handle reflowing of text and/or images when viewed, and fails to provide efficient and effective means to enable compression technology to recognize characters and reflow them to word processors, personal digital assistants (PDAs), cellular phones, and the like. Therefore, if hard copy office documents are scanned into digital form, current compression technology can make it difficult, if not impossible, to update, amend, or in general change the digitized document.