The present invention provides a system and method for increasing the speed of data transmission by compressing the data before transmission and decompressing it after it is received. The present invention "preserves information"--i.e., the receiver gets exactly what was transmitted, and thus the compression does not cause information to be lost.
Standard data representations such as ASCII and EBCDIC are designed for flexibility rather than efficiency. It is well known that for any specific application, a more efficient encoding of data is possible.
The basis of data compression is the fact that for any specific purpose, general-purpose codes provide a coding which is longer than needed. Simply viewed on a character basis, standard English prose uses 26 lower case symbols, 26 upper case symbols, 10 numeric digits, and perhaps 18 punctuation symbols--a total of 80 symbols. ASCII and EBCDIC both use eight bits per character to provide 256 symbol codes. Furthermore, from a frequency of use viewpoint, 20 lower case, 8 upper case, and 4 punctuation symbols comprise over 90 percent of typical usage.
The variable-bit-length Huffman code technique, described in some detail below, provides a specific mechanism to exploit this variability in frequency of use. For standard English prose, this provides an average code size of slightly more than five bits per character (this estimate is based on the frequency tables in "Cypher Systems", by Beker and Piper, 1982).
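By way of illustration only, the following Python sketch builds a Huffman code from a table of symbol frequencies and computes the resulting average code length in bits per character. The function names (`huffman_code_lengths`, `average_bits`) and the sample text are assumptions made for this example; they are not taken from the invention or from the cited frequency tables.

```python
import heapq
from collections import Counter


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code built from freqs."""
    if len(freqs) == 1:
        # A lone symbol still needs one bit to be transmitted.
        return {sym: 1 for sym in freqs}
    # Heap entries are (weight, tie_breaker, {symbol: depth in subtree}).
    # The unique integer tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol in them
        # moves one level deeper, i.e. gains one bit of code length.
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in d1.items()}
        merged.update({s: depth + 1 for s, depth in d2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def average_bits(freqs):
    """Frequency-weighted mean code length, in bits per symbol."""
    lengths = huffman_code_lengths(freqs)
    total = sum(freqs.values())
    return sum(freqs[s] * lengths[s] for s in freqs) / total


# Hypothetical sample text; a real estimate would use published frequency tables.
sample = "the quick brown fox jumps over the lazy dog and the cat"
print(average_bits(Counter(sample)))
```

In this sketch, frequently occurring symbols end up near the root of the code tree and so receive short codes, while rare symbols receive long ones; the average is therefore below the eight bits per character of a fixed-length code whenever the frequencies are uneven.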
The prior art includes a variety of data compression techniques. For instance, see:
Huffman, A Method for the Construction of Minimum-Redundancy Codes, Proceedings of the I.R.E., p. 1098 (September 1952);
U.S. Pat. No. 3,237,170 (Blasbalg et al.);
U.S. Pat. No. 3,694,813 (Loh et al.); and
U.S. Pat. No. 4,494,108 (Langdon, Jr. et al.).
The basic concept behind some data compression schemes, including the present invention, is that data should be encoded using codes whose bit length is inversely related to the frequency of the characters or character combinations in the data stream, so that the most frequent items receive the shortest codes.
The present invention overcomes significant shortcomings in the prior art by providing the following features. First, the prior art does not provide an efficient method of adapting the data compression technique used when the data patterns being encoded change. Most existing schemes for compressing data require special handling by the user and thus require the user to sacrifice both flexibility and ease of use to achieve efficiency. In contrast, the present invention automatically adapts to different data patterns by providing not only a plurality of encoding tables and means for switching from one to another, but also a technique for building new and identical tables in both the encoding and decoding sides of a communication channel without having to transmit the table from the encoder to the decoder.
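The idea of building identical tables on both sides of a channel without transmitting them can be illustrated with a minimal sketch. This passage does not describe the mechanism actually used, so the sketch assumes one common approach: the encoder and the decoder each count the characters they have already processed and rebuild their table deterministically at agreed block boundaries. The class `SharedTable`, the block size of 16, and the use of a frequency rank in place of a real variable-length code are all assumptions made for illustration.

```python
from collections import Counter

ALPHABET = [chr(c) for c in range(256)]  # assumed 8-bit symbol set


class SharedTable:
    """Frequency-ranked table rebuilt from already-processed data only."""

    def __init__(self, block_size=16):
        self.block_size = block_size      # hypothetical rebuild interval
        self.counts = Counter()
        self.pending = 0
        self.ranking = list(ALPHABET)     # same starting table on both sides

    def observe(self, symbol):
        """Record one processed symbol; rebuild the table at block boundaries."""
        self.counts[symbol] += 1
        self.pending += 1
        if self.pending == self.block_size:
            # Deterministic ordering: most frequent first, ties by symbol value.
            self.ranking = sorted(ALPHABET, key=lambda s: (-self.counts[s], s))
            self.pending = 0


def encode(text, table):
    """Replace each character by its current rank, then update the table."""
    ranks = []
    for ch in text:
        ranks.append(table.ranking.index(ch))
        table.observe(ch)
    return ranks


def decode(ranks, table):
    """Invert encode() using a table that evolves in exactly the same way."""
    chars = []
    for r in ranks:
        ch = table.ranking[r]
        chars.append(ch)
        table.observe(ch)
    return "".join(chars)


message = "a" * 20
sent = encode(message, SharedTable())
assert decode(sent, SharedTable()) == message
```

Because each side updates its counts only from characters that both sides have already seen, the two tables stay identical and no table ever crosses the channel. In a practical system the small ranks produced after adaptation would be mapped to short variable-length codes.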
Second, the prior art does not provide an efficient method of packaging encoded data so that the receiver gets the transmitted data as soon as possible. The present invention provides means for varying the size of the data packets transmitted so that the transmitted data gets to the receiving computer without being significantly delayed by the data compression system.
Third, the present invention, unlike the prior art, uses a string substitution technique in combination with adaptive data compression--a combination which can substantially improve data transmission rates.
Fourth, the present invention, unlike prior art data compression systems, provides protocol emulation features which are essential to taking full advantage of the increased data transmission speeds allowed by the data compression.
It is therefore a primary object of the present invention to provide an improved adaptive data compression system and method. The data compression technique of the present invention significantly improves data transmission speed through the use of the features noted above and other features described below.