The present invention relates generally to data compression and, more particularly, to improving the compression of data in packet networks.
Conventional data compression techniques and systems encode a stream of digital data into a compressed code stream and decode the compressed code stream back into a corresponding original data stream. The code stream is referred to as "compressed" because the stream typically consists of a smaller number of codes than symbols contained in the original data stream. Such smaller codes can be advantageously stored in a corresponding smaller amount of memory than the original data. Further, the compressed code stream can be transmitted in a communications system, e.g., a wired, wireless, or optical fiber communications system, in a corresponding shorter period of time than the uncompressed original data. The demand for data transmission and storage capacity in today's communications networks is ever-increasing. Thus, data compression plays an integral role in most modern transmission protocols and communications networks.
As is well-known, two classes of compression techniques useful in the compression of data are so-called special purpose compression and general purpose compression. Special purpose compression techniques are designed for compressing special types of data and are often relatively inexpensive to implement. For example, well-known special purpose compression techniques include run-length encoding, zero-suppression encoding, null-compression encoding, and pattern substitution. These techniques generally have relatively small compression ratios due to the fact that they compress data which typically possesses common characteristics and redundancies. As will be appreciated, a compression ratio is the measure of the length of the compressed codes relative to the length of the original data. However, special purpose compression techniques tend to be ineffective at compressing data of a more general nature, i.e., data that does not possess a high degree of common characteristics and the like.
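By way of illustration only, run-length encoding and the compression ratio defined above can be sketched as follows. This is a minimal sketch and not a definitive implementation: the function names, the (count, byte) pair representation, and the assumed cost of two bytes per pair are illustrative assumptions that do not appear in the techniques cited above.

```python
def rle_encode(data: bytes) -> list:
    """Collapse each run of identical bytes into a (count, byte_value) pair."""
    runs = []
    for b in data:
        if runs and runs[-1][1] == b:
            # Same byte as the current run: extend the run by one.
            runs[-1] = (runs[-1][0] + 1, b)
        else:
            # A different byte starts a new run.
            runs.append((1, b))
    return runs


def rle_decode(runs: list) -> bytes:
    """Expand (count, byte_value) pairs back into the original bytes."""
    return b"".join(bytes([b]) * n for n, b in runs)


def compression_ratio(compressed_len: int, original_len: int) -> float:
    """Length of the compressed codes relative to the original data
    (smaller is better, following the definition in the text)."""
    return compressed_len / original_len


# Assumption for illustration: each pair is stored in two bytes.
zeros = b"\x00" * 100          # highly redundant data
mixed = bytes(range(100))      # no runs at all
ratio_zeros = compression_ratio(2 * len(rle_encode(zeros)), len(zeros))
ratio_mixed = compression_ratio(2 * len(rle_encode(mixed)), len(mixed))
```

Under these assumptions the redundant input yields a ratio well below one, while the input without runs yields a ratio above one, which mirrors the observation that special purpose techniques are ineffective on data of a more general nature.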
In contrast, general purpose compression techniques are not designed for specifically compressing one type of data and are often adapted to different types of data during the actual compression process. Some of the most well-known and useful general purpose compression techniques emanate from a family of algorithms developed by J. Ziv and A. Lempel, and commonly referred to in the art as "Lempel-Ziv coding". In particular, see Ziv et al., "A Universal Algorithm for Sequential Data Compression", IEEE Transactions on Information Theory, IT-23(3):337-343, May 1977 (describing the commonly denominated "LZ1" algorithm), and Ziv et al., "Compression of Individual Sequences Via Variable-Rate Coding", IEEE Transactions on Information Theory, IT-24(5):530-536, September 1978 (describing the commonly denominated "LZ2" algorithm), which are each hereby incorporated by reference for all purposes. The LZ1 and LZ2 data compression schemes are well-known in the art and need not be discussed in great detail herein.
In brief, the LZ1 (also referred to and known in the art as "LZ77") data compression process is based on the principle that a repeated sequence of characters can be replaced by a reference to an earlier occurrence of the sequence, i.e., matching sequences. The reference, e.g., a pointer, typically includes an indication of the position of the earlier occurrence, e.g., expressed as a byte offset from the start of the repeated sequence, and the number of characters, i.e., the matched length, that are repeated. Typically, the references are represented as "<offset, length>" pairs in accordance with conventional LZ1 coding. In contrast, LZ2 (also referred to and known in the art as "LZ78") compression parses a stream of input data characters into coded values based on an adaptively growing look-up table or dictionary that is produced during the compression. That is, LZ2 does not find matches on any byte boundary and with any length as in LZ1 coding, but instead when a dictionary word is matched by a source string, a new word is added to the dictionary which consists of the matched word plus the following source string byte. In accordance with LZ2 coding, matches are coded as pointers or indexes to the words in the dictionary.
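For purposes of illustration, the <offset, length> referencing of LZ1 described above can be sketched as follows. This is a simplified sketch under stated assumptions and not the algorithm of the cited reference: the token layout (a bare integer for a literal byte, a tuple for a reference), the brute-force search, and the parameters `window` and `min_match` are hypothetical choices made for brevity.

```python
def lz77_encode(data: bytes, window: int = 4096, min_match: int = 3) -> list:
    """Return tokens: an int is a literal byte, a tuple is (offset, length)."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Brute-force search of the previous `window` bytes for the longest match.
        for j in range(max(0, i - window), i):
            k = 0
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j  # offset counted back from position i
        if best_len >= min_match:
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens


def lz77_decode(tokens: list) -> bytes:
    """Replace each (offset, length) reference with already decoded bytes."""
    out = bytearray()
    for t in tokens:
        if isinstance(t, tuple):
            off, length = t
            # Copy byte by byte so a match may overlap the bytes being written.
            for _ in range(length):
                out.append(out[-off])
        else:
            out.append(t)
    return bytes(out)


sample = b"abcabcabcabc"
sample_tokens = lz77_encode(sample)
assert lz77_decode(sample_tokens) == sample
```

In this sketch the first three bytes are emitted as literals and the remaining nine bytes collapse into a single reference, since a match is permitted to run past the position at which it started.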
As mentioned above, the art is replete with compression schemes derived from the basic principles embodied by the LZ1 and LZ2 algorithms. For example, Terry A. Welch (see, T. A. Welch, "A Technique for High Performance Data Compression", IEEE Computer, pp. 8-19, June 1984, and U.S. Pat. No. 4,558,302, issued to Welch on Dec. 10, 1985, each of which is incorporated by reference for all purposes) later refined the LZ2 coding process to the well-known "Lempel-Ziv-Welch" ("LZW") compression process. Both the LZ2 and LZW compression techniques are based on the generation and use of a so-called string table that maps strings of input characters into fixed-length codes. More particularly, these compression techniques compress a stream of data characters into a compressed stream of codes by serially searching the character stream and generating codes based on sequences of encountered symbols that match corresponding longest possible strings previously stored in the table, i.e., dictionary. As each match is made and a code symbol is generated, the process also stores a new string entry in the dictionary that comprises the matched sequence in the data stream plus the next character symbol encountered in the data stream.
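By way of example only, the string table behavior described above can be sketched as follows. The sketch is a textbook-style rendering of LZW under stated assumptions and is not taken from the cited references: the function names are hypothetical, the table is seeded with the 256 single-byte strings, and codes are kept as unbounded integers rather than packed into fixed-length fields.

```python
def lzw_compress(data: bytes) -> list:
    """Emit one code per longest string already present in the table."""
    table = {bytes([i]): i for i in range(256)}  # seed: all single bytes
    w = b""
    codes = []
    for b in data:
        wc = w + bytes([b])
        if wc in table:
            w = wc                      # keep extending the current match
        else:
            codes.append(table[w])      # emit code for the longest match
            table[wc] = len(table)      # new entry: match plus next symbol
            w = bytes([b])
    if w:
        codes.append(table[w])
    return codes


def lzw_decompress(codes: list) -> bytes:
    """Rebuild the same table on the fly while decoding."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = bytearray(w)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The code refers to the entry that is about to be created.
            entry = w + w[:1]
        else:
            raise ValueError("invalid LZW code")
        out += entry
        table[len(table)] = w + entry[:1]
        w = entry
    return bytes(out)


lzw_codes = lzw_compress(b"ABABABA")
assert lzw_decompress(lzw_codes) == b"ABABABA"
```

The decoder needs no copy of the table from the encoder, since each new entry is derivable from the codes already received; this is also why the loss of any earlier code leaves the two tables out of step.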
As will be appreciated and as detailed above, the essence of Lempel-Ziv coding is finding strings and substrings which are repeated in the original data stream, e.g., in a document to be transmitted. The repeated phrases in the document under compression are replaced with a pointer to a place where they have occurred earlier in the original data stream, e.g., document. As such, decoding data, e.g., text, which is compressed in this manner simply requires replacing each pointer with the already decoded text to which it points. As is well-known, one primary design consideration in employing Lempel-Ziv coding is determining whether to set a limit on how far back a pointer can reach, and what that limit should be. A further design consideration of Lempel-Ziv coding involves which substrings within the desired limit may be a target of a pointer. That is, the reach of a pointer into earlier text may be unrestricted, i.e., a so-called growing window, or may be restricted to a fixed size window of the previous "N" characters, where N is typically in the range of several thousand characters, e.g., 3 kilobytes. In accordance with this coding, repetitions of strings are discovered and compressed only if they both appear in the window. As will be appreciated, the considerations made regarding such Lempel-Ziv coding design choices represent a compromise between speed, memory requirements, and compression ratio.
Compression is a significant consideration in improving network efficiencies. For example, when the available computational resources, i.e., the data processing requirements, are large compared to the available network bandwidth, it is most advantageous to compress data packets before transmission across the network. Of course, the actual compression scheme must be carefully selected in terms of speed and overall compression. That is, a compression scheme which is too slow will reduce network performance and an inefficient compression scheme will limit any potential transmission gains.
Further complicating the network efficiency issue is the fact that many packet networks are inherently unreliable. That is, current well-known packet networks, e.g., the Internet, routinely drop packets or reorder packets transmitted through the network thereby causing data transmission errors. For example, if the compression scheme introduces certain dependencies between packets, and the network thereafter drops or reorders such packets, the receiver may not be able to decompress a particular packet if a prior packet is lost due to the interdependencies amongst packets. As such, certain well-known approaches are employed to mitigate such problems: (1) network reliability can be improved whereby, in terms of the Internet, a more reliable end-to-end transport layer service is applied, e.g., the well-known Transmission Control Protocol ("TCP"), to compress packets at the transport level; (2) stateless compression can be used wherein each packet is compressed independently thereby ensuring that each packet can be decompressed at the receiver; and (3) streaming compression can be used, which assumes reliable delivery and employs a reset mechanism when this assumption is violated. More particularly, in streaming compression, when a packet is lost, the receiver discards each subsequent packet until compression is reset. After the reset, future packets are not dependent on prior packets and decompression can resume normally. Two well-known streaming-type compression techniques include the Point-to-Point Protocol's ("PPP") Compression Control Protocol, and the IP Header Compression protocol employed for User Datagram Protocol ("UDP") packets.
The above-described packet compression schemes are useful in mitigating the problems arising from packet interdependencies; however, such schemes present certain other complications. For example, compressing packets at the transport level requires end-to-end utilization, and typically requires a certain level of cooperation by the application during transmission. Similarly, while stateless compression provides a degree of robustness, the packet independence attribute of stateless compression reduces the realized compression ratio due to the fact that such compression examines only the data in a single packet. Thus, for example, this compression approach cannot remove the large amount of redundancy typically found in network headers of adjacent packets. Further, while streaming compression provides greater compression ratios, these compression schemes multiply the effect of packet loss in that the loss of one packet in the network causes the receiver to lose several other packets. For low reliability networks, e.g., the Internet, this packet loss multiplying effect reduces the utility of employing streaming compression.
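The trade-off between stateless and streaming compression described above can be observed with a minimal sketch such as the following. The sketch uses the `zlib` module of the Python standard library purely as a stand-in compressor; the helper names and the synthetic packets with a shared header are hypothetical and are not drawn from any of the protocols named above.

```python
import zlib


def stateless_compress(packets: list) -> list:
    """Compress every packet independently; no state crosses packet boundaries."""
    return [zlib.compress(p) for p in packets]


def streaming_compress(packets: list) -> list:
    """Compress all packets with one shared compressor, so that later packets
    may reference earlier ones. A sync flush ends each packet on a byte
    boundary so that it can be sent on its own."""
    comp = zlib.compressobj()
    return [comp.compress(p) + comp.flush(zlib.Z_SYNC_FLUSH) for p in packets]


def streaming_decompress(chunks: list) -> list:
    """Decode chunks in order with one shared decompressor; every chunk
    depends on all chunks that came before it."""
    decomp = zlib.decompressobj()
    return [decomp.decompress(c) for c in chunks]


# Synthetic packets: an identical header followed by a small distinct payload.
header = b"HDR src=10.0.0.1 dst=10.0.0.2 port=5004 "
packets = [header + bytes([65 + i]) * 8 for i in range(20)]

stateless_total = sum(len(c) for c in stateless_compress(packets))
streaming_total = sum(len(c) for c in streaming_compress(packets))
print("stateless bytes:", stateless_total, "streaming bytes:", streaming_total)
```

Because the header repeats in every packet, the streaming variant in this sketch produces fewer bytes in total, whereas each statelessly compressed packet remains decodable regardless of which other packets arrive.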
Therefore, a need exists for a compression technique which provides greater robustness and increased compression ratios without the deleterious effects of prior compression schemes.
An aspect of the invention is directed to a communications method and apparatus that enables inter-packet compression thereby achieving greater robustness and increased compression ratios without the deleterious effects, e.g., the effect of packet loss multiplying, of prior compression schemes. In accordance with an aspect of the invention, a select history state is employed which is determined as a function of a so-called acknowledgement vector. In accordance with an aspect of the invention, the acknowledgement vector contains information with respect to the identification of packets which have been successfully received in a prior transmission over a communications channel. That is, in accordance with an aspect of the invention, the packet history state is a select history state associated with a respective packet. As such, a first side of the communications channel, e.g., the transmitter or sender side, is furnished with, and cognizant of, certain information about which packets have been successfully received by the second side of the communications channel, e.g., the receiver or recipient side. In turn, the decompressor is also furnished with, and cognizant of, the select history to allow for efficient decompression of the transmitted compressed packets from the sender. That is, decompression occurs as a function of which packets were used as history, i.e., the select history state, during compression of such packets. As such, through select history state and acknowledgement aspects of the invention, the compressor and decompressor (at either side of the communications channel) work cooperatively to achieve improved compression across a communications channel.
In accordance with the preferred embodiment of the invention, the packets are encoded and prefixed with a header that includes at least a history vector, such history vector identifying the respective history state associated with a packet. Further, in accordance with an aspect of the invention, the acknowledgement vector is constructed and communicated to the transmitter, whereby the specific compression algorithm at the transmitter, i.e., sender, can limit the history used by the compression algorithm to those packets that are successfully received. Thus, in accordance with the preferred embodiment, the vector identifying the packets used as the history is included in the compressed packet thereby enabling the receiver to reconstruct the packet history state necessary to decompress the packet.
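Purely for purposes of illustration, the cooperation between the acknowledgement vector and the history vector described above can be sketched as follows. This sketch is one possible rendering under stated assumptions and is not the embodiment itself: the `Sender` and `Receiver` classes, the representation of both vectors as lists of sequence numbers, the tuple used as a packet, and the use of `zlib` preset dictionaries to carry the select history state are all hypothetical choices, and a practical arrangement would also bound the amount of retained history.

```python
import zlib


class Sender:
    def __init__(self):
        self.sent = {}       # sequence number -> uncompressed payload
        self.acked = set()   # sequence numbers known to have been received
        self.next_seq = 0

    def on_ack_vector(self, ack_vector):
        """Record which previously sent packets the receiver reports holding."""
        self.acked.update(s for s in ack_vector if s in self.sent)

    def send(self, payload: bytes):
        """Compress using only acknowledged packets as history."""
        seq = self.next_seq
        self.next_seq += 1
        history_vector = sorted(self.acked)
        history = b"".join(self.sent[s] for s in history_vector)
        comp = zlib.compressobj(zdict=history) if history else zlib.compressobj()
        body = comp.compress(payload) + comp.flush()
        self.sent[seq] = payload
        # The history vector travels in the header of the compressed packet.
        return (seq, history_vector, body)


class Receiver:
    def __init__(self):
        self.received = {}   # sequence number -> decompressed payload

    def receive(self, packet) -> bytes:
        """Rebuild the history named in the header, then decompress."""
        seq, history_vector, body = packet
        history = b"".join(self.received[s] for s in history_vector)
        dec = zlib.decompressobj(zdict=history) if history else zlib.decompressobj()
        payload = dec.decompress(body) + dec.flush()
        self.received[seq] = payload
        return payload

    def ack_vector(self):
        """Identify the packets successfully received so far."""
        return sorted(self.received)


def make_payload(i: int) -> bytes:
    return b"HDR src=10.0.0.1 dst=10.0.0.2 port=5004 " + b"payload-%d" % i


sender, receiver = Sender(), Receiver()
p0, p1, p2 = (sender.send(make_payload(i)) for i in range(3))
receiver.receive(p0)                          # packet 1 is lost in transit
receiver.receive(p2)
sender.on_ack_vector(receiver.ack_vector())   # sender learns of packets 0 and 2
p3 = sender.send(make_payload(3))
assert receiver.receive(p3) == make_payload(3)
```

In this sketch the lost packet is simply never used as history, so its loss does not prevent decompression of any later packet, while packets sent after the acknowledgement still benefit from redundancy shared with packets 0 and 2.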
Advantageously, in accordance with an aspect of the invention, increased robustness and greater compression ratios are achieved with a wide variety of compression methods or communications channel arrangements. That is, the principles of the invention are independent of any one particular compression scheme and, therefore, the advantages of employing the various aspects of the invention are realized with a wide variety of compression methodologies and communications channel configurations.