The present invention relates generally to data communication systems and methods, and particularly to data compression systems and methods which improve the efficiency of data transmission.
The vast and constantly increasing amount of data that is transmitted over communication lines and networks is a major incentive for the development of improved systems and methods for data compression. The overall performance of communication systems can be significantly increased with data compression techniques, since such techniques enable more information to be transmitted with fewer data bits over communication channels.
The performance of data compression techniques is mainly determined by three major factors. The first factor is the amount of compression achieved, or the ratio of the number of starting data bits to the number of bits produced. The second factor is the speed of compression, or the time needed to produce these bits. The third factor is the amount of computational overhead, in particular the requirement for computer resources such as memory. Generally, the following relation holds among these factors: the more compression achieved, the slower the process and the more overhead required; conversely, the faster the process, the less compression achieved.
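By way of non-limiting illustration only, the first two factors can be measured as in the following minimal sketch. It assumes Python's standard `zlib` module (a DEFLATE implementation) purely as a stand-in compressor; the helper names `compression_ratio` and `measure` are hypothetical and do not form part of the invention.

```python
import time
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Ratio of starting size to produced size (bytes and bits give the same ratio)."""
    return len(original) / len(compressed)


def measure(data: bytes, level: int):
    """Compress `data` at a zlib effort level (1 = fastest, 9 = most compression).

    Returns (compression ratio, elapsed seconds). zlib is only a stand-in here.
    """
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return compression_ratio(data, compressed), elapsed


if __name__ == "__main__":
    sample = b"the quick brown fox jumps over the lazy dog " * 2000
    for level in (1, 9):
        ratio, seconds = measure(sample, level)
        print(f"level={level} ratio={ratio:.1f} time={seconds * 1000:.2f} ms")
```

Running the sketch at a low and a high effort level on the same input gives a rough, machine-dependent indication of the trade-off between the amount of compression and the time needed to achieve it.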
Normally, a particular compression technique is chosen according to the characteristics of the application. For example, "off-line" applications, which are not performed in real time, typically sacrifice speed and accept greater overhead to achieve better compression. On the other hand, "on-line" applications, and in particular communication applications, typically settle for less compression to gain more speed.
Communication applications, or programs which facilitate the transmission of data on a communication channel, have certain characteristics which should be considered when choosing a technique for compression. For example, communication systems typically divide the transmitted data into blocks or "packets". If compression is desired, each packet should be compressed before transmission by the selected compression technique. Since communication channels between computers, particularly networks employing telephone system connections, have limited capacity, greater compression of the data increases the total amount of information which can be transmitted on the available bandwidth. On the other hand, since data compression for communication systems is typically needed on-line, the need for greater compression must be balanced against the increased amount of time and resources required for the compression process as the amount of compression increases. These competing requirements can be balanced by the choice of the proper data compression technique.
Data compression techniques encode the original data into a representation of fewer data bits, according to a translation data dictionary referred to herein as the "encoding table". The encoding table is typically derived from the data according to a selected scheme relating to various statistical information gathered therefrom, such as the frequencies of certain patterns in the data. Normally, the length of the bit representation in the encoding table for encoded data patterns is inversely related to the frequency of occurrence of these patterns.
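By way of non-limiting illustration only, the inverse relation between pattern frequency and code length can be seen in the following minimal sketch, which derives code lengths by the classical Huffman construction. The function name `huffman_code_lengths` is hypothetical, and the sketch treats single characters as the "patterns"; it is not the encoding scheme of the present invention.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a classical Huffman code."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {sym: 1 for sym in freq}
    # Heap entries are (weight, tie_breaker, {symbol: depth}); the unique
    # tie_breaker ensures the dictionaries themselves are never compared.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


if __name__ == "__main__":
    # Frequencies: a=8, b=4, c=2, d=1
    print(huffman_code_lengths("aaaaaaaabbbbccd"))
```

For the sample input, the most frequent symbol receives a 1-bit code, the next a 2-bit code, and the two rarest symbols receive 3-bit codes.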
Hereinafter, the term "text" refers to a stream of data bits which is provided as a unit to the compression algorithm, and includes, but is not limited to, word data from a document, image data and other types of data. As noted above, the text can have features or characteristics such as internal patterns of data. The text can be compressed according to a number of different types of compression algorithms.
Hereinafter, the term "static compression algorithm" refers to algorithms which do not affect, update or otherwise change the encoding table for a given unit of text. Hereinafter, the term "dynamic compression algorithm" refers to algorithms for which the encoding table is constantly updated or changed according to features or characteristics of the text by a selected scheme. Hereinafter, the term "semi-static compression algorithm" refers to algorithms for which the encoding table is occasionally updated or changed according to the text by a selected scheme. Hereinafter, the term "adaptive compression algorithm" refers to a dynamic or semi-static algorithm in which the encoding table is either constantly or occasionally updated or changed according to data pattern variations encountered in the text.
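By way of non-limiting illustration only, the difference between these classes can be sketched as a difference in how often the encoding table is rebuilt. The following minimal sketch is an assumption-laden toy: the class name `TableUpdater`, the `interval` parameter, the choice of rebuilding once every few packets for the semi-static case, and the use of a frequency-ranked symbol list as a stand-in for the encoding table are all hypothetical and are not taken from the invention.

```python
from collections import Counter


class TableUpdater:
    """Toy model of when an encoding table is rebuilt under each policy."""

    def __init__(self, policy: str, interval: int = 4):
        self.policy = policy        # "static", "dynamic" or "semi-static"
        self.interval = interval    # assumed: packets between semi-static rebuilds
        self.counts = Counter()     # running symbol statistics
        self.table = []             # stand-in table: symbols, most frequent first
        self.rebuilds = 0
        self.packets_seen = 0

    def _rebuild(self):
        self.table = [sym for sym, _ in self.counts.most_common()]
        self.rebuilds += 1

    def process_packet(self, packet: bytes):
        for sym in packet:
            self.counts[sym] += 1
            if self.policy == "dynamic":
                self._rebuild()     # constantly updated: after every symbol
        self.packets_seen += 1
        if self.policy == "semi-static" and self.packets_seen % self.interval == 0:
            self._rebuild()         # occasionally updated: every `interval` packets
        # "static": the table is never changed while the text is processed


if __name__ == "__main__":
    for policy in ("static", "dynamic", "semi-static"):
        updater = TableUpdater(policy)
        for _ in range(8):
            updater.process_packet(b"abab")
        print(policy, updater.rebuilds)
```

In this toy model the number of rebuilds is a rough proxy for the extra work each class performs, which is why an occasional update is cheaper than a constant one.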
The last class of algorithms, adaptive algorithms, has a number of advantages. For example, these algorithms permit the encoding table to be adjusted to best reflect the data patterns in the text, which is a "learning" capability. Furthermore, the encoding table need not necessarily be transmitted along with the encoded data, but rather can be fully rebuilt at the receiving end from the encoded data during decompression. Thus, this class of techniques is particularly well suited for data compression in a communication system.
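By way of non-limiting illustration only, the rebuilding of the table at the receiving end can be sketched with a simplified form of the LZ78 scheme of Ziv and Lempel. In the following minimal sketch the function names `lz78_encode` and `lz78_decode` are hypothetical, the output is kept as (index, character) pairs rather than packed bits, and the table is allowed to grow without limit; only the pairs are passed to the decoder, never the table.

```python
def lz78_encode(text: str):
    """Encode text as (table index, next character) pairs, building the table on the fly."""
    table = {"": 0}
    pairs = []
    phrase = ""
    for ch in text:
        if phrase + ch in table:
            phrase += ch                      # keep extending a known phrase
        else:
            pairs.append((table[phrase], ch))
            table[phrase + ch] = len(table)   # learn the new phrase
            phrase = ""
    if phrase:
        # Flush a trailing phrase that is already known: every prefix of a
        # table entry is itself an entry, so phrase[:-1] can be looked up.
        pairs.append((table[phrase[:-1]], phrase[-1]))
    return pairs


def lz78_decode(pairs):
    """Rebuild the same table from the pairs alone and recover the text."""
    table = [""]
    out = []
    for index, ch in pairs:
        phrase = table[index] + ch
        out.append(phrase)
        table.append(phrase)                  # mirrors the encoder's table growth
    return "".join(out)


if __name__ == "__main__":
    message = "abababababa"
    encoded = lz78_encode(message)
    print(encoded)
    print(lz78_decode(encoded) == message)
```

Because the decoder adds an entry for every pair it reads, in the same order as the encoder, both sides hold identical tables at each step without the table ever being transmitted.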
Examples of such adaptive data compression techniques include the well-known Lempel-Ziv algorithms known, respectively, as LZ77 and LZ78, for constructing the encoding table (Ziv J., Lempel A.: A universal algorithm for sequential data compression, IEEE Transactions on Information Theory, Vol IT-23, (1977) pp. 337-343; Ziv J., Lempel A.: Compression of individual sequences via variable rate coding, IEEE Transactions on Information Theory, Vol IT-24, (1978) pp. 530-536). Waterworth (Waterworth J. R.: Data compression system, U.S. Pat. No. 4,701,745, Oct. 20, 1987) and Whiting et al. (Whiting D. L., George G. A., Ivey G. E.: Data compression apparatus and method, U.S. Pat. No. 5,016,009, May 14, 1991; Whiting D. L., George G. A., Ivey G. E.: Data compression apparatus and method, U.S. Pat. No. 5,126,739, Jun. 30, 1992) provide efficient implementations of the Lempel-Ziv LZ77 technique for identifying data patterns in the text. A similar fast implementation is given by Williams (Williams R. N.: An extremely fast Ziv-Lempel data compression algorithm, Proceedings Data Compression Conference DCC'91, Snowbird, Utah, Apr. 8-11, 1991, IEEE Computer Society Press, Los Alamitos, Calif., pp. 362-371). In addition, Huffman (Huffman D.: A method for the construction of minimum redundancy codes, Proceedings IRE, Vol 40, (1952) pp. 1098-1101) provides an encoding scheme which is optimal among prefix codes for a given set of symbol frequencies. Finally, Brent (Brent R. P.: A linear algorithm for data compression, The Australian Computer Journal, Vol 19, (1987) pp. 64-68) provides a static technique that takes advantage of both LZ77 and the Huffman encoding scheme.
Although these well-known data compression techniques have been successfully employed, they have a number of disadvantages for communication systems. For example, the implementations of Whiting do not use statistical information from previous data packets to more efficiently compress current packets. Furthermore, the static technique of Brent requires the encoding table to be transmitted with the encoded data, thereby consuming valuable bandwidth. Some other methods of compression do not take advantage of the basic structure of data transmissions in communication systems, in which data are transmitted in packets rather than as a continuous stream. Thus, many of the currently available data compression techniques have significant disadvantages, particularly with regard to communication systems.
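By way of non-limiting illustration only, the benefit of carrying information from a previous packet into the compression of the current packet can be sketched as follows. The sketch assumes Python's standard `zlib` module and its preset-dictionary feature (`zdict`) as a stand-in mechanism; this is a different technique from a statistically derived encoding table and is not the method of the present invention. The function names and the sample packets are hypothetical.

```python
import zlib


def compress_packet(packet: bytes, previous: bytes = b"") -> bytes:
    """Compress one packet, optionally primed with the previous packet's contents."""
    if previous:
        comp = zlib.compressobj(zdict=previous)
    else:
        comp = zlib.compressobj()
    return comp.compress(packet) + comp.flush()


def decompress_packet(blob: bytes, previous: bytes = b"") -> bytes:
    """Inverse of compress_packet; the receiver must supply the same previous packet."""
    if previous:
        decomp = zlib.decompressobj(zdict=previous)
    else:
        decomp = zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    first = b"GET /api/v1/users/1001 HTTP/1.1\r\nHost: example.com\r\nAccept: application/json\r\n\r\n"
    second = b"GET /api/v1/users/1002 HTTP/1.1\r\nHost: example.com\r\nAccept: application/json\r\n\r\n"
    alone = compress_packet(second)
    primed = compress_packet(second, previous=first)
    print(len(second), len(alone), len(primed))
```

Since the receiver has already decoded the previous packet, it can supply the same priming data during decompression, so nothing extra needs to be transmitted.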
By contrast, the present invention provides significant improvements over the prior art in general, and for implementations relating to data communication in particular. First, in contrast to dynamic techniques which constantly update the encoding table, the semi-static technique provided by the present invention only occasionally updates the encoding table, thereby significantly improving the encoding speed. Second, the present invention features an improved implementation of Huffman encoding, thereby gaining a significant increase in speed in exchange for slight or negligible degradation of the compression capacity. Third, the present invention features an improved encoding scheme which achieves better compression.
The present invention therefore satisfies an unmet need for a method for data compression which is particularly suited for communication systems, which is adaptive to the characteristics of the transmitted data, and which is rapid yet able to significantly compress the data in order to maximize the amount of information which can be transmitted on the available bandwidth of the communication system.