A variety of data compression algorithms derive from work published in Ziv, Jacob and Lempel, Abraham, "A Universal Algorithm for Sequential Data Compression," IEEE Transactions on Information Theory 23(3):337-343, May 1977. These algorithms are commonly referred to as LZ77 compression schemes. LZ77 compression schemes are based on the principle that a repeated string of characters can be replaced by a pointer to the earlier occurrence of the string. A pointer is typically represented by an indication of the position of the earlier occurrence (typically an offset from the start of the repeated string) and the number of characters that match (the length). The pointers are typically represented as <offset, length> pairs. For example, the following string
"abcdabcdacdacdacdaeaaaaaa"
may be represented in compressed form by the following
"abcd<4,5><3,9>ea<1,5>"
Since the first characters "abcd" do not match any previous characters, they are output in uncompressed form as literals. The pair <4,5> indicates that the string starting at an offset of 4 and extending for 5 characters ("abcda") is repeated. The pair <3,9> indicates that the string starting at an offset of 3 and extending for 9 characters ("cdacdacda") is repeated. The characters "ea" are output as literals, and the pair <1,5> indicates a string starting at an offset of 1 and extending for 5 characters ("aaaaa").
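The expansion of the example can be checked with a short decoder. The following is a minimal sketch in Python, not taken from the cited work; it assumes that each offset is counted backward from the current end of the output, and the function name `lz77_decode` and the token representation (strings for literals, tuples for pointers) are hypothetical choices made for illustration.

```python
def lz77_decode(tokens):
    """Expand a list of literal strings and (offset, length) pairs."""
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            offset, length = token
            # Copy one character at a time so that a match may overlap
            # the output it is producing (length greater than offset).
            for _ in range(length):
                out.append(out[-offset])
        else:
            # Literal characters are copied to the output unchanged.
            out.extend(token)
    return "".join(out)


# The compressed form from the example: "abcd<4,5><3,9>ea<1,5>"
tokens = ["abcd", (4, 5), (3, 9), "ea", (1, 5)]
decoded = lz77_decode(tokens)
print(decoded)
```

In the pairs <4,5>, <3,9>, and <1,5> the length exceeds the offset, so the copy reads characters that the same pointer has just written; copying one character at a time handles this case.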
Compression is achieved by representing each repeated string as a pointer that takes fewer bits than repeating the string would take. Typically, an unmatched single byte, known as a literal, is not represented as a pointer. Rather, literals are output with a flag indicating literal encoding followed by the byte itself. A pointer is typically differentiated from a literal encoding by a different flag that is followed by the offset and length. The offset and length can be encoded in a variety of ways. Because a flag is added to each literal and each pointer, the prior art LZ-based systems provide less than optimal compression and may even cause expansion when the data contains few repeated strings.
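The cost of the per-token flag can be made concrete with a small calculation. The sketch below is illustrative only: it assumes a one-bit flag, and the fixed field widths (12-bit offset, 4-bit length) are hypothetical values chosen for the example, since the offset and length can be encoded in a variety of ways.

```python
FLAG_BITS = 1      # assumed: one flag bit precedes every token
LITERAL_BITS = 8   # a literal is one raw byte
OFFSET_BITS = 12   # hypothetical fixed-width offset field
LENGTH_BITS = 4    # hypothetical fixed-width length field


def encoded_bits(tokens):
    """Count the bits needed for literal strings and (offset, length) pairs."""
    total = 0
    for token in tokens:
        if isinstance(token, tuple):
            total += FLAG_BITS + OFFSET_BITS + LENGTH_BITS
        else:
            # Every literal byte carries its own flag bit.
            total += len(token) * (FLAG_BITS + LITERAL_BITS)
    return total


# The 25-character example: 6 literals and 3 pointers.
example_bits = encoded_bits(["abcd", (4, 5), (3, 9), "ea", (1, 5)])
raw_bits = 25 * 8

# Eight bytes with no repeated strings: every byte is a flagged literal.
no_match_bits = encoded_bits(["abcdefgh"])
no_match_raw_bits = 8 * 8

print(example_bits, raw_bits)            # 105 200
print(no_match_bits, no_match_raw_bits)  # 72 64
```

Under these assumptions the example shrinks from 200 bits to 105, while input with no matches grows from 64 bits to 72, since each literal costs 9 bits instead of 8.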
The great majority of prior art compression systems are directed to compression of data stored in a single medium, such as a floppy disk or computer hard drive. However, very little work has been done in the area of lossless generic data compression over an unreliable link, such as a network or modem connection. In such instances, the compressor and decompressor are physically separated, and any information the compressor sends to the decompressor might be lost along the way.
Modern networks operate in layers. At one of the higher layers of a network, a redirector or requestor may transmit blocks of homogeneous data, such as a database or a file. The data blocks typically are sent to a transport layer that breaks the data blocks into packets of limited size, such as 1.5 kilobytes (K) or less. The transport then adds transport headers to the packets and sends the packets to a Media Access Controller (MAC) that adds frame headers to the packets to form frames that are transmitted across the network. Although the packets sent from the transport often are consecutive portions of a single data block, the transport might send packets from two or more blocks stored in different buffers if two different sends are outstanding simultaneously.
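The layering described above can be sketched as follows. This is a toy model only: the 1500-byte payload limit is one reading of "1.5 kilobytes", and the header layouts, the names `packetize` and `frame`, and the strict alternation between two outstanding sends are all hypothetical and do not correspond to any real transport or MAC.

```python
from itertools import zip_longest

MAX_PAYLOAD = 1500  # assumed packet payload limit in bytes


def packetize(block, block_id):
    """Split one data block into packets, each with a toy transport header."""
    packets = []
    for seq, start in enumerate(range(0, len(block), MAX_PAYLOAD)):
        payload = block[start:start + MAX_PAYLOAD]
        header = bytes([block_id, seq])  # hypothetical 2-byte transport header
        packets.append(header + payload)
    return packets


def frame(packet):
    """Prepend a toy 5-byte MAC frame header: a tag plus the packet length."""
    return b"MAC" + len(packet).to_bytes(2, "big") + packet


# Two sends outstanding at once, each from its own buffer.
packets_a = packetize(bytes(4000), block_id=1)  # 3 packets
packets_b = packetize(bytes(2000), block_id=2)  # 2 packets

# Interleave the two sends, as a transport might when both are pending.
wire = [
    frame(p)
    for pair in zip_longest(packets_a, packets_b)
    for p in pair
    if p is not None
]

# Byte 5 of each frame is the block id from the transport header.
block_order = [f[5] for f in wire]
print(block_order)  # [1, 2, 1, 2, 1]
```

The resulting frame sequence mixes the two blocks, so consecutive frames on the wire are not necessarily consecutive portions of the same data block.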
Given the homogeneity of the data at the transport layer, optimum compression may be obtained by enabling the transport to compress the data. However, transports are very complicated and need to interface with existing operating systems, such as DOS, Windows, OS/2, and UNIX, so the task of creating a new transport with compression is very large. Further, existing transports are already widely used, so creating a new transport would not automatically provide compression to the many users of existing transports.
Some developers have attempted to employ compression at the MAC level. Some efforts are limited to compressing the transport header for each packet. Others compress the data in each packet, but compress each packet separately without using the data in previous frames. No known prior art system compresses frames based on the data in previous frames over an unreliable link. Further, no known prior art system employs LZ77-based compression to compress frames being sent across an unreliable communication link.
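The difference between compressing each packet separately and compressing with reference to earlier packets can be illustrated with Python's standard `zlib` module, whose DEFLATE format is LZ77-derived. The sketch below is not any of the prior art systems discussed; the packet contents are synthetic, and the shared-history variant assumes that every packet arrives, in order, which an unreliable link does not guarantee.

```python
import random
import zlib

rng = random.Random(0)
# One payload with little internal redundancy, sent as near-identical packets.
base = bytes(rng.randrange(256) for _ in range(1000))
packets = [base + bytes([i]) for i in range(8)]

# Separate compression: a fresh compressor per packet, so nothing from
# earlier packets is available as match history.
separate = [zlib.compress(p) for p in packets]

# Shared history: one compressor spans all packets. Z_SYNC_FLUSH forces out
# the bytes for each packet so that it could be sent immediately.
compressor = zlib.compressobj()
shared = [
    compressor.compress(p) + compressor.flush(zlib.Z_SYNC_FLUSH)
    for p in packets
]

separate_total = sum(len(c) for c in separate)
shared_total = sum(len(c) for c in shared)
print(separate_total, shared_total)

# The receiver must see every chunk, in order, to rebuild the same history.
decompressor = zlib.decompressobj()
restored = [decompressor.decompress(c) for c in shared]
```

With a shared history, later packets are encoded mostly as pointers into earlier ones, so the total is smaller. The cost is that the decompressor depends on having received every earlier chunk; if one is lost, the chunks that follow cannot be decoded correctly.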