1. Field of the Invention
This invention relates generally to adaptive dictionary-based data compression systems and specifically to an efficient Ziv-Lempel LZ1 coding procedure employing variable offset and control code fields suitable for data compression in hardware or software.
2. Description of the Related Art
With the explosive growth of demand for data transmission and storage capacity, improved data compression techniques are vigorously sought in the data processing arts. Although many different classes of data compression techniques are known in the art, one of the most useful is the class of dictionary-based universal compression techniques. Among these, the most useful today are the so-called Ziv-Lempel variable-length encoding procedures ascribed to J. Ziv and A. Lempel, who suggested the "length-offset" encoding scheme commonly denominated the "LZ1" data compression process (Ziv et al., "A Universal Algorithm for Sequential Data Compression", IEEE Trans. on Info. Theory, IT-23(3): 337-343, 1977). Later, Ziv et al. ("Compression of Individual Sequences via Variable Rate Coding", IEEE Trans. on Info. Theory, IT-24(5): 530-536, 1978) suggested the more popular adaptive "dictionary tree" encoding procedure commonly denominated the "LZ2" data compression process. The LZ1 process uses a fixed-size window into the past source data string as the dictionary. Matches are coded as a "match length" and an "offset" from an agreed position. Unlike LZ1, LZ2 does not find matches of any length on any byte boundary; instead, when a dictionary word is matched by a source string, LZ2 adds a new word to the dictionary consisting of the matched word plus the following source string byte. LZ2 matches are coded as pointers or indexes to the words in the dictionary. Terry A. Welch ("A Technique for High Performance Data Compression", IEEE Computer, pp. 8-19, June 1984) later refined the LZ2 process to create the popular Ziv-Lempel-Welch data compression process, commonly denominated the "LZW" process. The LZW process is also disclosed in U.S. Pat. No. 4,558,302 issued to Welch and assigned to Unisys Corporation.
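The "match length" and "offset" coding of the LZ1 process can be illustrated with a minimal sketch. The sketch is illustrative only and is not the coding procedure of any cited reference. The function name lz1_encode, the token tuples, the window size, and the minimum and maximum match lengths are all assumptions chosen for the example, and the "agreed position" is assumed to be the current input position, so that an offset counts bytes back into the history window.

```python
def lz1_encode(data: bytes, window: int = 4096,
               min_match: int = 3, max_match: int = 18):
    """Greedy LZ1-style parse (illustrative sketch, hypothetical parameters).

    Returns a list of tokens:
      ("lit", byte_value)        for a raw byte passed through uncompressed
      ("match", offset, length)  for a string found in the history window,
                                 where offset counts back from the current position
    """
    tokens = []
    pos = 0
    while pos < len(data):
        best_len, best_off = 0, 0
        start = max(0, pos - window)
        # Brute-force search of the history window for the longest match.
        for cand in range(start, pos):
            length = 0
            while (length < max_match
                   and pos + length < len(data)
                   and data[cand + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, pos - cand
        if best_len >= min_match:
            tokens.append(("match", best_off, best_len))
            pos += best_len
        else:
            tokens.append(("lit", data[pos]))
            pos += 1
    return tokens
```

For the input b"abcabcabc", this sketch emits three raw bytes followed by a single match of length 6 at offset 3. The brute-force search shown here is the costly step that practical LZ1 encoders replace with faster string-locating structures.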
The art is replete with improvements to the LZ2 and LZW data compression processes, primarily because of their relatively easy encoder implementation. For instance, Miller et al. ("Variation on a Theme by Ziv and Lempel", IBM Research Report RC10630, Jul. 31, 1984) modify and augment the LZ2 procedure to improve the compression ratio and better control the size of the encoding dictionary. These improvements are also disclosed in U.S. Pat. No. 4,814,746 issued to Miller et al. and assigned to International Business Machines Corporation. Similarly, in U.S. Pat. No. 4,464,650, Eastman et al. later disclose an LZ2 improvement related to input data stream parsing. In U.S. Pat. No. 5,087,913, Eastman later discloses additional LZ2 improvements related to dictionary building. Other LZW data compression process improvements are disclosed in U.S. Pat. Nos. 4,876,541, 5,150,119, 5,151,697 and 5,153,591, to mention a few.
The LZ2 and LZW data compression encoding procedures are easy to implement because the dictionary contents are built adaptively merely by adding a new word representing an old word extended by one new byte. Given the resulting dictionary, anyone can decode the compressed data. Disadvantages of the LZ2 process include relatively slow encode and decode speeds, a dictionary reset requirement or similar costly steps to ensure continuing adaptation to long source strings with limited dictionary size, and limitations on usable match boundaries.
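The adaptive dictionary construction described above, in which each new word is an old word extended by one byte, can be sketched as follows. This is a minimal illustration in the style of the LZ2 parse and not the exact procedure of any cited reference. The function name lz2_parse, the use of index 0 for the empty word, and the None marker for a trailing word are assumptions made for the example. The sketch also omits the dictionary size limit and reset step that a practical encoder requires.

```python
def lz2_parse(data: bytes):
    """LZ2-style parse (illustrative sketch).

    Emits (index_of_matched_word, next_byte) pairs and grows the dictionary
    by one word per pair: the matched word plus the following source byte.
    """
    dictionary = {b"": 0}   # word -> index; index 0 is the empty word (assumption)
    output = []
    word = b""
    for value in data:
        candidate = word + bytes([value])
        if candidate in dictionary:
            # Keep extending the current match.
            word = candidate
        else:
            # Emit a pointer to the matched word plus the new byte,
            # then add the extended word to the dictionary.
            output.append((dictionary[word], value))
            dictionary[candidate] = len(dictionary)
            word = b""
    if word:
        # Input ended in the middle of a match; no following byte exists.
        output.append((dictionary[word], None))
    return output, dictionary
```

For the input b"abab", the sketch emits (0, 97), (0, 98) and (1, 98), and the dictionary ends with the words "a", "b" and "ab" in addition to the empty word. A match can begin only where the previous dictionary word ended, which illustrates the limitation on usable match boundaries noted above.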
There apparently was less interest at first in the original LZ1 data compression process, perhaps because of encoder complexity. Interestingly, in an early U.S. Pat. No. 4,054,951, Jackson et al. disclose a "data expansion apparatus" that encodes for storage repeated data strings in a long data stream in terms of a tag, address, length and repetition count. Although the Jackson et al. patent application was filed before publication of the above-cited Ziv et al. references, the LZ1 data compression process reads on the Jackson et al. claims.
More recently, in U.S. Pat. No. 5,003,307, Whiting et al. disclose an LZ1 data compression apparatus with a shift register search means useful for implementing the difficult string-locating procedure required by the LZ1 data compression process. The Whiting et al. patent describes the basis for the 9703/9704 data compression co-processor chip produced by Stac Electronics (Robert Lutz, "9703/9704 Design Guide", Stac Electronics Application Note APP-0006, Stac Electronics, Carlsbad, Calif., July 1990) and the QIC development standard for data cartridge tape drives ("Data Compression Format for 1/4-Inch Data Cartridge Tape Drives", Development Standard QIC-122, rev. B, Feb. 6, 1991, Quarter-Inch Cartridge Drive Standards, Inc., Santa Barbara, Calif.). The success of the Whiting et al. LZ1 embodiment and the continuing need for high compression and decompression speeds have stimulated recent interest in the LZ1 process.
Because LZ1 scrolls the source string over a fixed-size "history" window to create the "dictionary", identification of duplicate "matching" strings in the source data is difficult but, once done, yields a very efficient encoding. Matches are not necessarily limited to predictable byte boundaries and, for instance, back-to-back matches may hop around in the history window. However, once a matching string is encoded as a "length" and "offset", the necessary decoding process is rapid and efficient, requiring no dictionary preload. The LZ2 and LZW decompression processes are each similar in complexity to the LZ2 compression process and are seriously limited in speed by dictionary preload and reset requirements.
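The simplicity of LZ1 decoding can be seen in a minimal sketch. The token format is an assumption made for the example: a raw byte is represented as ("lit", byte_value) and a match as ("match", offset, length), with the offset counted back from the end of the output produced so far. The function name lz1_decode is likewise hypothetical. The decoder needs no dictionary beyond its own previous output.

```python
def lz1_decode(tokens):
    """Decode LZ1-style tokens (illustrative sketch, assumed token format)."""
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            out.append(token[1])
        else:
            _, offset, length = token
            if not 0 < offset <= len(out):
                raise ValueError("offset points outside the decoded history")
            start = len(out) - offset
            # Copy byte by byte so that a match may overlap the bytes it
            # produces (length greater than offset).
            for i in range(length):
                out.append(out[start + i])
    return bytes(out)
```

For example, the tokens ("lit", 97), ("lit", 98), ("lit", 99), ("match", 3, 6) decode to b"abcabcabc". Each match is a plain copy from already-decoded output, so decoding requires no search and no dictionary preload.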
Reference is made to Daniel Helman ("General Purpose Data Compression ICs", IEEE 1991, pp. 344-348, 1991) and Kent Anderson ("Methods Of Data Compression After The Manner Of Lempel And Ziv", Optical Information Systems, pp. 40-43, January-February 1990) for comparative discussions of the LZ1 and LZ2 data compression processes. The important differences between LZ1 and LZ2 arise from the LZ2 restriction of the match offset to positions in the dictionary tree linked to previous matches. Although this LZ2 restriction greatly simplifies the matching encoder, it also increases storage requirements for both encoder and decoder and requires an LZ2 decoder that is substantially more complex than the equivalent LZ1 decoder. Thus, the LZ1 process is better suited to any compression system that must achieve high throughput rates in both directions.
All adaptive dictionary-based data compression processes suffer from what may be called "start-up losses" in compression efficiency. Because each source string or block begins with an empty "dictionary", the first source symbols must be passed through as raw bytes without compression. After accumulating a substantial dictionary, matches are found for increasing numbers of source sub-strings and encoded to gradually build up compression efficiency to long-term levels.
Another well-known problem with all data compression processes is the increased difficulty of error correction resulting from loss of redundancy in the data. Because compression removes redundancy from the source symbol stream, the compressed output data stream is quite vulnerable to data corruption arising from bit errors that cannot be corrected because of the lack of redundancy necessary for such correction.
Accordingly, there is a clearly-felt need in the art for a data compression system that provides the high throughput of an LZ1 process with improved compression efficiency and very high reliability. The related unresolved problems and deficiencies are clearly felt in the art and are solved by this invention in the manner described below.