1. Field of the Invention
The invention relates to the field of data compression and decompression (recovery) of the compressed data.
2. Description of the Prior Art
Data compression systems are known in the prior art that encode a stream of digital data signals into compressed digital data signals and decode the compressed digital data signals back into the original data signals. Data compression refers to any process that converts data in a given format into an alternative format having fewer bits than the original. The objective of data compression systems is to effect a savings in the amount of storage required to hold or the amount of time required to transmit a given body of digital information. The compression ratio is defined as the ratio of the length of the encoded output data to the length of the original input data. The smaller the compression ratio, the greater will be the savings in storage or time. By decreasing the required memory for data storage or the required time for data transmission, compression results in a monetary savings. If physical devices such as magnetic disks or magnetic tape are utilized to store the data files, then a smaller space is required on the device for storing the compressed data thereby utilizing fewer disks or tapes. If telephone lines or satellite links are utilized for transmitting digital information, then lower costs result when the data is compressed before transmission. Data compression devices are particularly effective if the original data contains redundancy such as having symbols or strings of symbols appear with high frequency. A data compression device transforms an input block of data into a more concise form and thereafter translates or decompresses the concise form back into the original data in its original format.
For example, it may be desired to transmit the contents of a daily newspaper via satellite link to a remote location for printing thereat. Appropriate sensors may convert the contents of the newspaper into a data stream of serially occurring characters for transmission via the communication link. If the millions of symbols comprising the contents of the newspaper were compressed before transmission and reconstituted at the receiver, a significant amount of transmission time would be saved.
As a further example, when an extensive data base such as an airline reservation data base or a banking system data base is stored for archival purposes, a significant amount of storage space would be saved if the totality of characters comprising the data base were compressed prior to storage and reexpanded from the stored compressed files for later use.
To be of practical and general utility, a digital data compression system should satisfy certain criteria. The system should provide high performance with respect to the data rates provided by and accepted by the equipment with which the data compression and decompression systems are interfacing. The rate at which data can be compressed is determined by the input data processing rate into the compression system, typically in millions of bytes per second (megabytes/sec). High performance is necessary to maintain the data rates achieved in present day disk, tape and communication systems, which rates typically exceed one megabyte/sec. Thus, the data compression and decompression systems must have data bandwidths matching the bandwidths achieved in modern devices. The performance of prior art data compression and decompression systems is typically limited by the speed of the random access memories (RAM), and the like, utilized to store statistical data and guide the compression and decompression processes. High performance for a compression device is characterized by the number of RAM cycles (read and write operations) required per input character into the compressor. The fewer the number of memory cycles, the higher the performance. A high performance design can be utilized with economical, slow RAMs for low speed applications such as telephone communications, or with very fast RAMs for magnetic disk transfers.
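The relationship between RAM cycles per input character and achievable data rate can be illustrated with a short calculation. The following is only a sketch under simplifying assumptions that are not taken from the text above: the compressor is entirely memory bound, each character is one byte, memory cycles do not overlap, and the function name and the 200 ns cycle time are hypothetical values chosen for illustration.

```python
def max_throughput_bytes_per_sec(cycles_per_char: float, ram_cycle_ns: float) -> float:
    """Upper bound on input rate for a memory-bound compressor.

    Assumes (hypothetically) one byte per character and that every
    character costs cycles_per_char sequential RAM cycles.
    """
    if cycles_per_char <= 0 or ram_cycle_ns <= 0:
        raise ValueError("both arguments must be positive")
    # Time per character in nanoseconds is cycles * cycle time;
    # 1e9 ns per second converts that to characters per second.
    return 1e9 / (cycles_per_char * ram_cycle_ns)


# Illustrative only: a RAM with an assumed 200 ns cycle time.
for cycles in (1, 2, 4, 8):
    rate = max_throughput_bytes_per_sec(cycles, 200.0)
    print(f"{cycles} cycles/char -> {rate / 1e6:.2f} megabytes/sec")
```

Under these assumptions, halving the number of memory cycles per character doubles the attainable rate, which is why the same design may serve slow RAMs at low data rates and fast RAMs at disk transfer rates.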
Another important criterion in the design of a data compression and decompression system is compression effectiveness. Compression effectiveness is characterized by the compression ratio of the system. The compression ratio is the ratio of data storage size in compressed form divided by the size in uncompressed form. In order for data to be compressible, the data must contain redundancy. Compression effectiveness is determined by how effectively the compression procedure matches the forms of redundancy in the input data. In typical computer stored data, e.g. arrays of integers, text or programs and the like, redundancy occurs both in the nonuniform usage of individual symbols, e.g. digits, bytes, or characters, and in frequent recurrence of symbol sequences, such as common words, blank record fields, and the like. An effective data compression system should respond to both types of redundancy.
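The compression ratio as defined above, and its dependence on redundancy in the input, can be illustrated as follows. This is a minimal sketch: Python's standard zlib module (a DEFLATE implementation) is used purely as a stand-in compressor and is not any of the procedures discussed in this description, and the function name and sample data are hypothetical.

```python
import os
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Compressed size divided by uncompressed size (smaller is better)."""
    if not original:
        raise ValueError("original data must be non-empty")
    return len(compressed) / len(original)


# Highly redundant input: one symbol sequence recurring many times.
redundant = b"blank record field " * 500
# Input with essentially no redundancy: random bytes of the same length.
random_like = os.urandom(len(redundant))

ratio_redundant = compression_ratio(redundant, zlib.compress(redundant))
ratio_random = compression_ratio(random_like, zlib.compress(random_like))

print(f"redundant input ratio: {ratio_redundant:.3f}")
print(f"random input ratio:    {ratio_random:.3f}")
```

The redundant input should yield a ratio far below one, while the random input should yield a ratio close to (or slightly above) one, consistent with the statement that data must contain redundancy in order to be compressible.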
A further criterion important in the design of data compression and decompression systems is that of adaptability. Many prior art data compression procedures require prior knowledge, or the statistics, of the data being compressed. Some prior art procedures adapt to the statistics of the data as it is received. Adaptability in the prior art processes has required an inordinate degree of complexity. An adaptive compression and decompression system may be utilized over a wide range of information types, which is typically the requirement in general purpose computer facilities. It is desirable that the compression system achieve good compression ratios without prior knowledge of the data statistics. Data compression and decompression procedures currently available are generally not adaptable and so cannot be utilized for general purpose usage.
Another important criterion in the design of data compression and decompression systems is that of reversibility. In order for a data compression system to possess the property of reversibility, it must be possible to reexpand or decompress the compressed data back into its original form without any alteration or loss of information. The decompressed and the original data must be identical and indistinguishable with respect to each other.
General purpose data compression procedures are known in the prior art that either are or may be rendered adaptive, two relevant procedures being the Huffman method and the Tunstall method. The Huffman method is widely known and used, reference thereto being had in an article by D. A. Huffman entitled "A Method for the Construction of Minimum Redundancy Codes", Proceedings IRE, 40, 10, pages 1098-1100 (September, 1952). Further reference to the Huffman procedure may be had in an article by R. Gallager entitled "Variations on a Theme by Huffman", IEEE Information Theory Transactions, IT-24, No. 6 (November, 1978). Adaptive Huffman coding maps fixed length sequences of symbols into variable length binary words. Adaptive Huffman coding suffers from the limitation that it is not efficacious when redundancy exists in input symbol sequences which are longer than the fixed sequence length the procedure can interpret. In practical implementations of the Huffman procedure, the input sequence lengths rarely exceed 12 bits due to RAM costs and, therefore, the procedure generally does not achieve good compression ratios. Additionally, the adaptive Huffman procedure is complex and often requires an inordinately large number of memory cycles for each input symbol. Thus, the adaptive Huffman procedure tends to be undesirably cumbersome, costly, and slow thereby rendering the process unsuitable for most practical present day installations.
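The mapping of fixed length input symbols into variable length binary words may be illustrated by the following sketch. It constructs a static (non-adaptive) Huffman code over single bytes, which is simpler than the adaptive procedure discussed above; the function name and the sample input are hypothetical and serve only as an illustration.

```python
import heapq
from collections import Counter


def huffman_code(data: bytes) -> dict:
    """Build a static Huffman code mapping each byte value to a bit string."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: partial_code}).
    # The unique tie_breaker prevents the dictionaries from being compared.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


sample = b"aaaabbc"
code = huffman_code(sample)
encoded_bits = sum(len(code[b]) for b in sample)
print(code, encoded_bits, "bits versus", 8 * len(sample))
```

In this sketch the most frequent byte receives the shortest word. Because each word encodes exactly one byte, recurrence of longer symbol sequences is not exploited, which corresponds to the limitation noted above.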
Reference to the Tunstall procedure may be had in the doctoral thesis of B. T. Tunstall, entitled "Synthesis of Noiseless Compression Codes", Georgia Institute of Technology, (September, 1967). The Tunstall procedure maps variable length input symbol sequences into fixed length binary output words. Although no adaptive version of the Tunstall procedure is described in the prior art, an adaptive version could be derived which, however, would be complex and unsuitable for high performance implementations. Neither the Huffman nor the Tunstall procedure has the ability to encode increasingly longer combinations of source symbols.
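The mapping of variable length input sequences into fixed length output words may be illustrated by the following sketch of a static Tunstall dictionary construction. It assumes a memoryless source whose symbol probabilities are known in advance and whose symbols are single characters; the function name and the example probabilities are hypothetical.

```python
import heapq


def tunstall_dictionary(probs: dict, codeword_bits: int) -> dict:
    """Map variable length source strings to fixed length binary codewords.

    probs maps each single-character symbol to its (assumed known) probability.
    """
    max_entries = 2 ** codeword_bits
    if len(probs) > max_entries:
        raise ValueError("codeword too short for the alphabet")
    # Max-heap via negated probability; entries are (-probability, string).
    leaves = [(-p, s) for s, p in probs.items()]
    heapq.heapify(leaves)
    # Expanding a leaf removes it and adds one child per symbol,
    # for a net growth of len(probs) - 1 dictionary entries.
    while len(leaves) + len(probs) - 1 <= max_entries:
        neg_p, prefix = heapq.heappop(leaves)  # most probable string so far
        for s, p in probs.items():
            heapq.heappush(leaves, (neg_p * p, prefix + s))
    words = sorted(s for _, s in leaves)
    return {w: format(i, f"0{codeword_bits}b") for i, w in enumerate(words)}


table = tunstall_dictionary({"a": 0.7, "b": 0.3}, codeword_bits=2)
print(table)
```

With these example probabilities the dictionary contains the strings aaa, aab, ab and b, each assigned a two bit word, so that the more probable runs of the symbol a are covered by longer dictionary entries. The dictionary is fixed once constructed, which is consistent with the observation above that the procedure does not encode increasingly longer combinations of source symbols.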
A further adaptive data compression and decompression system that overcomes many of the disadvantages of the prior art is that disclosed in co-pending U.S. patent application Ser. No. 291,870, filed Aug. 10, 1981, now U.S. Pat. No. 4,464,650, entitled "Apparatus and Method for Compressing Data and Restoring the Compressed Data" by M. Cohen, W. Eastman, A. Lempel and J. Ziv. The procedure of said Ser. No. 291,870 parses the stream of input data symbols into adaptively growing sequences of symbols. The procedure of said Ser. No. 291,870 suffers from the disadvantages of requiring numerous RAM cycles per input character and utilizing time consuming and complex mathematical procedures such as multiplication and division to effect compression and decompression. These disadvantages tend to render the procedure of said Ser. No. 291,870 unsuitable for numerous economical high performance implementations.
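The general notion of parsing an input stream into adaptively growing sequences of symbols may be illustrated by the following sketch. It follows the well known incremental parsing scheme published by Ziv and Lempel in 1978 and is offered only as an illustration of the concept; it is not represented to be the procedure of said Ser. No. 291,870, and the function names are hypothetical.

```python
def lz78_parse(data: str) -> list:
    """Parse data into (phrase_index, next_symbol) pairs.

    Index 0 denotes the empty phrase; each emitted pair defines a new
    phrase one symbol longer than a previously seen phrase.
    """
    dictionary = {"": 0}
    output = []
    phrase = ""
    for symbol in data:
        if phrase + symbol in dictionary:
            phrase += symbol  # keep extending the current match
        else:
            output.append((dictionary[phrase], symbol))
            dictionary[phrase + symbol] = len(dictionary)
            phrase = ""
    if phrase:
        # Flush a trailing phrase that is already in the dictionary.
        output.append((dictionary[phrase], ""))
    return output


def lz78_unparse(pairs: list) -> str:
    """Rebuild the original data from the (phrase_index, next_symbol) pairs."""
    phrases = [""]
    out = []
    for index, symbol in pairs:
        phrase = phrases[index] + symbol
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)


pairs = lz78_parse("abababab")
print(pairs)
assert lz78_unparse(pairs) == "abababab"
```

For the input abababab the parsed phrases are a, b, ab, aba and b, so the phrases lengthen as the repetition is learned, and the decoder rebuilds the same phrase list from the pairs alone, without prior knowledge of the data statistics.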
It is appreciated from the foregoing that neither the prior art nor the procedure of said Ser. No. 291,870 provides an adaptive, efficient, compression and decompression system suitable for high performance applications. No known prior design approach is directly suitable for such a device.