With the continued growth in demand for data transmission and data storage capacities, improved lossless data compression techniques are continually sought. As described in coassigned U.S. Pat. No. 5,652,878, of the many classes of lossless data compression, one of the most useful is the class of dictionary-based compression techniques. Among these, the most useful today are the so-called Ziv-Lempel variable-length encoding procedures, ascribed to J. Ziv and A. Lempel, who suggested the "LZ1" length-offset encoding scheme. The LZ1 process uses a fixed-size sliding "history" window into the past source data string as the dictionary. Matches are encoded as a "match length" and an "offset" from an agreed position.
Because LZ1 scrolls the source string over a fixed-size sliding history window to create an adaptive dictionary, identification of duplicate "matching" strings in the source data is difficult at first, but becomes very efficient once the window has filled. Once a matching string is encoded as a "length" and "offset", the necessary decoding process is rapid and efficient, requiring no dictionary preload. The '878 patent illustrates an LZ1 compression technique which has been denominated the "adaptive lossless data compression" technique, or "ALDC".
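The length and offset encoding described above can be illustrated with a minimal sketch. It is not the ALDC implementation of the '878 patent; the function names, the 512-byte window, the 255-byte maximum match length, and the convention that the offset counts backward from the current position are all assumptions made for illustration.

```python
def find_longest_match(data: bytes, pos: int, window: int = 512, max_len: int = 255):
    """Return (length, offset) of the longest match for data[pos:] found in
    the preceding `window` bytes. The offset counts back from `pos`
    (an assumed convention). Returns (0, 0) when the history holds no match."""
    best_len, best_off = 0, 0
    start = max(0, pos - window)
    for cand in range(start, pos):
        length = 0
        # A match may run past `pos` (overlap), because the decoder below
        # copies one byte at a time from its own output.
        while (length < max_len and pos + length < len(data)
               and data[cand + length] == data[pos + length]):
            length += 1
        if length > best_len:
            best_len, best_off = length, pos - cand
    return best_len, best_off


def copy_match(out: bytearray, length: int, offset: int) -> None:
    """Decoder side: append `length` bytes starting `offset` bytes back in
    `out`. No dictionary preload is needed; the history is the output itself."""
    for _ in range(length):
        out.append(out[-offset])
```

At position 0 the history is empty, so the search returns no match; this is the situation in which the first symbol has to be passed through uncompressed.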
All sliding window data compression processes suffer from what may be called "start-up losses" and "non-redundancy losses" in compression efficiency and corresponding increases in entropy. Because each source string or block begins with an empty "dictionary", the first source symbol must be passed through as a raw word without compression, and it begins building the dictionary. Similarly, a string of input data which has already been encrypted or compressed, and which therefore lacks substantial redundancy, will likely lack the matches required to achieve compression, and these source symbols must also be passed through as raw words without compression. The raw words must be identified as such, however, thereby leading to an expansion of the data.
Only after accumulating a substantial dictionary, by having the sliding window fill up with input data having substantial redundancy, are matches found for increasing numbers of substrings which allow encoding efficiency to build up.
In the original LZ1 arrangement, called "LZ77", all source input is output in the form of a three-part token having the length and offset together with a flag, which is the first character of the compressed substring. Techniques such as ALDC overcome the problem that arises when a non-redundant character is encountered by not sending the three-part token, but instead providing the character unchanged, called a "literal", together with a designation indicating that it is not compressed. A typical designation is an added "zero" bit for each word of the source string. Thus, when a string of non-redundant input data is encountered, the output is expanded by a much smaller amount than is likely with the original LZ1 technique. However, LZ1 techniques such as ALDC must still expand the data by one bit for every word output as a literal, typically a 9/8 expansion.
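The 9/8 figure follows directly from adding one designation bit to each word, if the words are taken to be 8 bits wide. The following sketch makes the arithmetic concrete; the 8-bit word size, the function name, and the placement of the "zero" bit ahead of each word are assumptions for illustration rather than details taken from the '878 patent.

```python
from fractions import Fraction


def encode_literals(data: bytes) -> str:
    """Emit each 8-bit word as a literal: one '0' designation bit followed by
    the 8 unchanged data bits (assumed layout). Returns a string of '0'/'1'."""
    return "".join("0" + format(word, "08b") for word in data)


def expansion_ratio(data: bytes) -> Fraction:
    """Ratio of output bits to input bits when every word is sent as a literal."""
    return Fraction(len(encode_literals(data)), 8 * len(data))
```

For any non-empty input that contains no matches, `expansion_ratio` evaluates to 9/8, that is, a 12.5 percent expansion.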
Because of this problem, alternative compression techniques have been designed to offer special advantages in particular circumstances. An example is LZ2 compression (also known as LZ78, or in a related version as LZW), which captures redundancies and maintains them in a dictionary for, e.g., an entire record, as described in "The Data Compression Book", M. Nelson, M & T Publishing, 1991, pp. 277-311. Thus, the opportunity for finding redundancies is expanded, albeit at the cost of an expanded dictionary buffer.
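By way of contrast with the sliding window, a textbook LZ78-style dictionary coder may be sketched as follows. This is a generic illustration under stated assumptions, not the procedure of any particular product or of the cited reference; the function names and the (index, byte) pair format are hypothetical. The dictionary here grows for the entire input and is never slid or pruned, which is the expanded buffer cost noted above.

```python
def lz78_encode(data: bytes):
    """Return a list of (phrase_index, next_byte) pairs. Index 0 is the empty
    phrase; each emitted pair also adds a new phrase to the dictionary."""
    dictionary = {b"": 0}
    phrase = b""
    pairs = []
    for byte in data:
        candidate = phrase + bytes([byte])
        if candidate in dictionary:
            phrase = candidate          # keep extending the current phrase
        else:
            pairs.append((dictionary[phrase], byte))
            dictionary[candidate] = len(dictionary)
            phrase = b""
    if phrase:
        # Flush a trailing phrase as (index of its prefix, its last byte);
        # the prefix of any dictionary phrase is itself in the dictionary.
        pairs.append((dictionary[phrase[:-1]], phrase[-1]))
    return pairs


def lz78_decode(pairs) -> bytes:
    """Rebuild the input by replaying the same dictionary growth."""
    phrases = [b""]
    out = bytearray()
    for index, byte in pairs:
        phrase = phrases[index] + bytes([byte])
        out += phrase
        phrases.append(phrase)
    return bytes(out)
```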