Data compression is the encoding of data into a more compact expression which permits more efficient storage and transmission of the data. Some data compression procedures result in the loss of some of the data upon decompression; when the data is compressed and then decompressed, it no longer matches the original data exactly. These so-called "lossy" compression schemes are commonly used for video image data or audio data, where some loss of information can be tolerated or perhaps not even noticed. For example, lossy compression is used extensively in video image compression for multimedia computing and also for high-definition television. Although lossy compression loses some of the data, it is highly effective in achieving compression, with compression ratios ranging from 20-to-1 up to 200-to-1.
Other compression algorithms are "lossless" in that the data can be compressed and then decompressed, and the resulting output matches the original input exactly. Lossless compression is the only acceptable procedure for compressing information or program data in which losses are not tolerable. Such data constitute the great majority of information and program data outside of the field of video and audio applications. The demands of lossless compression usually result in much lower compression ratios as compared to lossy compression, with ratios ranging from 1.5-to-1 up to 10-to-1 being typical. The present invention is concerned with lossless data compression.
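By way of illustration only, the lossless property and the notion of a compression ratio can be checked with a short sketch. It assumes Python's standard zlib module (a DEFLATE implementation, which is itself a relative of the LZ coding discussed below) and is not the method of the present invention; the helper name round_trip and the sample text are hypothetical.

```python
import zlib
from typing import Tuple


def round_trip(data: bytes) -> Tuple[bytes, float]:
    """Compress, then decompress; return the restored bytes and the ratio."""
    packed = zlib.compress(data)
    restored = zlib.decompress(packed)
    ratio = len(data) / len(packed)  # e.g. 4.0 means "4-to-1"
    return restored, ratio


# Hypothetical, highly repetitive sample; ordinary text compresses far less.
sample = b"the quick brown fox jumps over the lazy dog. " * 200
restored, ratio = round_trip(sample)
print(restored == sample, f"{ratio:.1f}-to-1")
```

The ratio reported for such an artificially repetitive sample will generally be much higher than the typical range stated above.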
Most data compression systems used today rely on the fact that sets of data are typically repetitious in limited ways. For example, data such as ASCII files contain byte streams representing alphanumeric symbols, words and phrases that appear repeatedly in the text. By replacing the byte streams with abbreviated byte streams, the data can be compressed considerably. This is generally accomplished using a "dictionary" containing the abbreviations for the byte streams that are to be compressed. The value of using a fixed dictionary is diminished, however, as the size of the dictionary increases, because the savings achieved in compressing the data become offset by the expense of comparing the data to a large dictionary. Therefore, the ideal dictionary is limited in scope to contain only those byte streams that are seen often enough in the data to justify the comparison with the dictionary.
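A minimal sketch of fixed-dictionary substitution follows, for illustration only. It assumes 7-bit ASCII input, so that byte values 0x80 and above are free to serve as abbreviations; the phrase list and the function names are hypothetical and are not taken from any cited work.

```python
# Hypothetical fixed dictionary: each phrase is abbreviated to one byte >= 0x80.
PHRASES = [b" the ", b"compression", b"dictionary", b"data"]
ENCODE = {p: bytes([0x80 + i]) for i, p in enumerate(PHRASES)}
DECODE = {0x80 + i: p for i, p in enumerate(PHRASES)}


def encode(text: bytes) -> bytes:
    if any(b >= 0x80 for b in text):
        raise ValueError("this sketch assumes 7-bit ASCII input")
    by_length = sorted(PHRASES, key=len, reverse=True)  # prefer longer phrases
    out = bytearray()
    i = 0
    while i < len(text):
        # Every position is compared against every phrase: this is the
        # comparison expense that grows with the size of the dictionary.
        for phrase in by_length:
            if text.startswith(phrase, i):
                out += ENCODE[phrase]
                i += len(phrase)
                break
        else:
            out.append(text[i])  # no phrase matched: copy the byte as-is
            i += 1
    return bytes(out)


def decode(packed: bytes) -> bytes:
    out = bytearray()
    for b in packed:
        out += DECODE.get(b, bytes([b]))  # expand abbreviations, pass literals
    return bytes(out)


text = b"data compression with a dictionary shrinks the data"
packed = encode(text)
print(len(text), len(packed), decode(packed) == text)
```

The inner loop over the phrase list makes the trade-off visible: adding rarely seen phrases lengthens every comparison while saving few bytes.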
Early theoretical work on the use of dictionary compression encoding was done by Jacob Ziv and Abraham Lempel and is described in their article "On the Complexity of Finite Sequences", IEEE Transactions on Information Theory, 22:1 (1976) 75-81. The Ziv and Lempel work developed the idea of "adaptive dictionaries" in which the content of the dictionary, and hence the compression scheme for future encoding, is based on the recent data output. The dictionary is therefore adaptive to the data itself. The many compression techniques based on the theoretical work of Ziv and Lempel are commonly referred to as Lempel-Ziv coding or LZ coding, and the essence of those techniques is that byte streams are replaced with pointers to where the byte streams occurred earlier.
An example of LZ coding is presented in an article by J. A. Storer entitled "Textual Substitution Techniques for Data Compression", Combinatorial Algorithms on Words (edited by A. Apostolico and Z. Galil) Springer-Verlag, Berlin, pp. 120-121, 1985. In the Storer technique, an encoder and decoder are each provided with a fixed amount of memory for use as a dictionary containing a number of dictionary entries. Each entry has a unique pointer associated with it. As data is received, the encoder finds the longest string of characters in the data that matches an entry in the dictionary, transmits the pointer associated with the matched entry in place of the string of characters, and updates the dictionary, deleting one of the existing entries if the dictionary is full.
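The general shape of such a scheme can be sketched as follows. This is an illustration in the spirit of the technique described above, not a reproduction of Storer's published algorithm: the update rule (previous match plus the first character of the current match), the least-recently-used deletion rule, the permanent single-byte entries, and all names are assumptions of this sketch. Pointers are left as integers rather than packed into bits.

```python
from collections import OrderedDict
from typing import List


class BoundedDictionary:
    """Fixed-capacity dictionary; encoder and decoder each keep an identical copy."""

    def __init__(self, capacity: int = 512):
        if capacity <= 256:
            raise ValueError("capacity must leave room beyond the 256 single bytes")
        # Assumption: pointers 0..255 are permanent single-byte entries.
        self.by_string = {bytes([b]): b for b in range(256)}
        self.by_pointer = {b: bytes([b]) for b in range(256)}
        self.free = list(range(capacity - 1, 255, -1))  # unused pointers
        self.lru = OrderedDict()  # multi-byte entries, least recently used first
        self.max_len = 1

    def longest_match(self, data: bytes, i: int) -> bytes:
        for length in range(min(self.max_len, len(data) - i), 0, -1):
            candidate = data[i:i + length]
            if candidate in self.by_string:
                return candidate
        raise AssertionError("unreachable: every single byte is an entry")

    def update(self, prev: bytes, match: bytes) -> None:
        if match in self.lru:
            self.lru.move_to_end(match)  # mark as most recently used
        new = prev + match[:1]
        if not prev or new in self.by_string:
            return
        if not self.free:  # dictionary full: delete the least recently used entry
            victim, _ = self.lru.popitem(last=False)
            pointer = self.by_string.pop(victim)
            del self.by_pointer[pointer]
            self.free.append(pointer)
        pointer = self.free.pop()
        self.by_string[new] = pointer
        self.by_pointer[pointer] = new
        self.lru[new] = None
        self.max_len = max(self.max_len, len(new))


def encode(data: bytes, capacity: int = 512) -> List[int]:
    d, out, prev, i = BoundedDictionary(capacity), [], b"", 0
    while i < len(data):
        match = d.longest_match(data, i)
        out.append(d.by_string[match])  # transmit the pointer, not the string
        d.update(prev, match)
        prev = match
        i += len(match)
    return out


def decode(pointers: List[int], capacity: int = 512) -> bytes:
    d, out, prev = BoundedDictionary(capacity), bytearray(), b""
    for pointer in pointers:
        match = d.by_pointer[pointer]
        out += match
        d.update(prev, match)  # same update as the encoder keeps both in step
        prev = match
    return bytes(out)


data = b"abracadabra " * 50
pointers = encode(data, capacity=300)
print(len(data), len(pointers), decode(pointers, capacity=300) == data)
```

Because each new entry is built only from strings the decoder has already reconstructed, the decoder can repeat every insertion and deletion without any dictionary being transmitted. Both sides must be given the same capacity.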
Other examples of LZ coding are found in U.S. Pat. Nos. 4,876,541 by Storer (and the references discussed and cited therein), 5,003,307 by Whiting et al., 4,847,619 by Kato et al., and 4,701,745 by Waterworth.
One of the principal variables in LZ coding, and one which tends to distinguish the many variations of that technique from one another, is the way in which the dictionary is established and updated. Some techniques utilize a sliding window dictionary in which the incoming data are compared against the data in a window sliding over a fixed number N of previous bytes of data. Thus the dictionary in this approach is actually a sequential portion of the text itself. This is sometimes called LZ 77 or LZ 1 coding.
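The sliding window approach can be illustrated with a deliberately naive sketch. The token format (a literal byte, or an (offset, length) pair), the minimum match length of 3, the brute-force search, and the function names are all assumptions made for clarity; practical encoders use faster search structures and pack the tokens into bits.

```python
from typing import List, Tuple, Union

Token = Union[int, Tuple[int, int]]  # literal byte, or (offset, length)


def lz77_encode(data: bytes, window: int = 8192, max_len: int = 255) -> List[Token]:
    tokens: List[Token] = []
    i = 0
    while i < len(data):
        best_off, best_len = 0, 0
        # Compare the incoming data against every start position in the
        # window of the previous `window` bytes (brute force).
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_len and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        if best_len >= 3:  # assumed threshold below which a pointer does not pay
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens


def lz77_decode(tokens: List[Token]) -> bytes:
    out = bytearray()
    for token in tokens:
        if isinstance(token, tuple):
            offset, length = token
            for _ in range(length):  # byte by byte, so a match may overlap itself
                out.append(out[-offset])
        else:
            out.append(token)
    return bytes(out)


data = b"abcabcabcabc" + b"xyz" * 10
tokens = lz77_encode(data, window=16)
print(len(data), len(tokens), lz77_decode(tokens) == data)
```

The decoder needs no search at all, only copying, whereas the encoder's inner loops scale with the window length.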
LZ 77 coding can give good data compression provided that the dictionary window is long enough, such as 8K bytes. Common words and fragments will often be matched wholly or partially by a sliding window dictionary of that length. With respect to less common words and fragments, the sliding window dictionary takes advantage of the fact that these less common words and fragments are often concentrated in a single portion of the text. Unusual words are often used repeatedly in a brief portion of text to discuss a single concept, rather than scattered through a long text.
The principal disadvantage of sliding window dictionaries as used in LZ 77 coding is not that they are ineffective, for they do achieve good compression. Rather, it is that they require a large and constant amount of time for performing the encoding because of the relatively large window length. The encoding speed can be increased by using various known data structures, but that also increases the required memory. LZ 77 coding is relatively fast at the decoding step, however, and so it is suitable for applications which require only one encoding, performed on a fast computer with large amounts of memory, followed by multiple decodings, as is commonly the case with on-line help files and electronic books.
Other techniques in LZ coding replace the sliding window dictionary with a true adaptive dictionary which accumulates phrases found in the incoming data under the broad assumption that phrases that occur once are likely to occur again. By using known data structures and hashing techniques, the comparison of the incoming data to the entries in the adaptive dictionary can be done relatively quickly. Adaptive dictionaries also have an advantage over sliding window dictionaries in that an adaptive dictionary may include entries generated much earlier than the current incoming data, while the entries in a sliding window dictionary are limited to the span of the window. The main drawback to adaptive dictionaries is that once the limited adaptive dictionary is full, no further entries can be made without deleting some of the existing entries. While the dictionary can be made very large so that it accommodates a large number of entries before being filled, this increases memory requirements and comparison times.
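For illustration, a phrase-accumulating dictionary of this kind can be sketched as below. The (index, next byte) output format follows the general LZ78 style; the choice simply to stop adding entries once the dictionary is full, the max_entries parameter, and the function names are assumptions of this sketch. Python's built-in dict, a hash table, stands in for the hashing techniques mentioned above.

```python
from typing import List, Optional, Tuple

Pair = Tuple[int, Optional[int]]  # (index of known phrase, next byte or None)


def lz78_encode(data: bytes, max_entries: int = 4096) -> List[Pair]:
    table = {b"": 0}  # phrase -> index; looked up by hashing
    out: List[Pair] = []
    phrase = b""
    for byte in data:
        candidate = phrase + bytes([byte])
        if candidate in table:
            phrase = candidate  # keep extending the current match
        else:
            out.append((table[phrase], byte))
            if len(table) < max_entries:  # once full, no further entries are made
                table[candidate] = len(table)
            phrase = b""
    if phrase:
        out.append((table[phrase], None))  # flush a trailing match
    return out


def lz78_decode(pairs: List[Pair], max_entries: int = 4096) -> bytes:
    phrases = [b""]  # index -> phrase, rebuilt in the same order as the encoder
    out = bytearray()
    for index, byte in pairs:
        phrase = phrases[index] + (bytes([byte]) if byte is not None else b"")
        out += phrase
        if byte is not None and len(phrases) < max_entries:
            phrases.append(phrase)
    return bytes(out)


data = b"to be or not to be, that is the question. " * 40
pairs = lz78_encode(data, max_entries=64)
print(len(data), len(pairs), lz78_decode(pairs, max_entries=64) == data)
```

Running the sketch with a small max_entries and then with a large one shows the drawback described above: a full dictionary stops adapting, while a larger one costs memory.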