This invention relates generally to data compression and decompression methods and apparatus, and more particularly to implementations of lossless data compression algorithms which use a dictionary to store compression and decompression information.
One widely-used example of a compression algorithm that uses a dictionary to store compression and decompression information is the second method of Lempel and Ziv, called LZ2. The dictionary is first initialized, or reset. It is then built by creating valid dictionary entries as the incoming data is compressed or decompressed. Once a dictionary entry is created, it remains valid until the entire dictionary is reset. An earlier method, LZ1, maintains only a finite subset consisting of the most recent entries. These methods are disclosed in U.S. Pat. No. 4,464,650 to Eastman et al., and various improvements in the algorithms are disclosed in U.S. Pat. Nos. 4,558,302 to Welch and 4,814,746 to Miller et al.
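The dictionary life cycle described above (initialize, build entries during coding, keep every entry valid until a full reset) can be illustrated with an LZW-style coder, one well-known member of the LZ2 family. This is a minimal sketch under stated assumptions, not the method of any patent cited here: the function names `lz2_compress` and `lz2_decompress`, the 4096-entry limit, and the policy of resetting as soon as the dictionary is full are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def _reset_dictionary() -> Dict[bytes, int]:
    # Reset state (assumed): only the 256 single-byte strings are defined.
    return {bytes([b]): b for b in range(256)}


def lz2_compress(data: bytes, max_entries: int = 4096) -> List[int]:
    """LZW-style compressor; entries stay valid until the dictionary is reset."""
    dictionary = _reset_dictionary()
    codes: List[int] = []
    current = b""
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            current = candidate  # keep extending the match
            continue
        codes.append(dictionary[current])  # emit code for the longest match
        if len(dictionary) < max_entries:
            dictionary[candidate] = len(dictionary)  # new entry, valid until reset
        else:
            dictionary = _reset_dictionary()  # hypothetical policy: reset when full
        current = bytes([value])
    if current:
        codes.append(dictionary[current])
    return codes


def lz2_decompress(codes: List[int], max_entries: int = 4096) -> bytes:
    """Inverse of lz2_compress; rebuilds the same dictionary from the codes."""
    if not codes:
        return b""
    table = {b: bytes([b]) for b in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if len(table) >= max_entries:
            # The compressor reset at this point instead of adding an entry.
            table = {b: bytes([b]) for b in range(256)}
            entry = table[code]
        else:
            # code == len(table) refers to the entry being defined right now.
            entry = table[code] if code in table else previous + previous[:1]
            table[len(table)] = previous + entry[:1]
        out.append(entry)
        previous = entry
    return b"".join(out)
```

The decompressor never receives the dictionary itself; it reconstructs each entry from the code stream, which is why both sides must apply exactly the same reset rule.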
Integrated circuit implementations of the LZ2 algorithm typically store the dictionary in static RAM, as shown in FIG. 1 and described in further detail in the AHA3101 Data Compression Coprocessor IC Product Specification, published November 1990 by Advanced Hardware Architectures, Inc., Moscow, Idaho. The circuit of FIG. 1 has two parts: a data compression engine 7 implemented in an integrated circuit (IC) 5, and a static random access memory (RAM) 6. During compression, the data compression engine 7 reads uncompressed data on the DATA IN port and compresses it, using the dictionary in the static RAM as part of the compression process. The data compression engine then outputs compressed data on the DATA OUT port.
The process of data compression involves matching a sequence of input data bytes with the same sequence already encoded in a valid dictionary location. When a match is found, a compressed codeword is output in place of the uncompressed sequence of input data bytes. To increase data throughput, a hashing algorithm is used to perform the matching function. For the hashing algorithm to be efficient, the number of dictionary locations in the static RAM must be much larger than the maximum number of valid dictionary entries, typically by a factor of two to four.
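Why the dictionary needs more locations than entries can be seen in a small model of hashed matching. The sketch below is illustrative only: the class name `HashedDictionary`, the (prefix code, next byte) key, the multiplicative hash constant, and linear probing are all assumptions, not the hashing scheme of any product or patent mentioned here. Because L is at least twice the maximum number of entries, an empty location always exists, so every search terminates, and most searches end after only a few probes.

```python
from typing import List, Optional, Tuple


class HashedDictionary:
    """LZ2 dictionary stored in L hashed locations (illustrative sketch).

    Keys are (prefix_code, next_byte) pairs. The multiplicative hash and
    linear probing are assumptions chosen for illustration only.
    """

    def __init__(self, max_entries: int = 4096, factor: int = 2) -> None:
        if factor < 2:
            raise ValueError("need at least twice as many locations as entries")
        self.size = max_entries * factor  # L dictionary locations
        self.max_entries = max_entries
        self.keys: List[Optional[Tuple[int, int]]] = [None] * self.size
        self.codes: List[int] = [0] * self.size
        self.count = 0   # valid entries currently stored
        self.probes = 0  # total locations read, a proxy for RAM accesses

    def _home(self, prefix: int, byte: int) -> int:
        # Hypothetical hash: mix prefix and byte, then reduce modulo L.
        return ((prefix << 8) ^ byte) * 2654435761 % self.size

    def find(self, prefix: int, byte: int) -> Tuple[int, Optional[int]]:
        """Return (location, code); code is None when there is no match."""
        loc = self._home(prefix, byte)
        while True:
            self.probes += 1
            key = self.keys[loc]
            if key is None:
                return loc, None  # an empty location ends the search
            if key == (prefix, byte):
                return loc, self.codes[loc]
            loc = (loc + 1) % self.size  # linear probing

    def insert(self, loc: int, prefix: int, byte: int, code: int) -> None:
        """Store a new entry in the empty location returned by find()."""
        if self.count >= self.max_entries:
            raise RuntimeError("dictionary full; reset required")
        if self.keys[loc] is not None:
            raise ValueError("location already holds an entry")
        self.keys[loc] = (prefix, byte)
        self.codes[loc] = code
        self.count += 1
```

Comparing the `probes` counter for `factor=2` and `factor=4` on the same workload shows the trade-off the text describes: more locations cost more memory (and, as discussed later, more reset writes) but shorten the probe sequences.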
The data compression engine 7 provides the address of the dictionary location (1, 2, 3, ..., L) to be accessed, as well as the read and write control signals to the static RAM. Each dictionary location contains two fields. The first field 8, called DICT_ENTRY, contains the dictionary entry information: it stores the sequences of input bytes already encountered and their corresponding compressed codewords. The second field 9, called DICT_VALID, is conventionally a one-bit field. It designates the dictionary location as being either in the reset state or in the valid state, and thereby indicates whether the DICT_ENTRY field 8 contains a valid dictionary entry.
When the DICT_VALID field 9 is in the reset state, the contents of the DICT_ENTRY field 8 are undefined and do not contain useful information for the data compression engine. When the DICT_VALID field is in the valid state, the contents of the DICT_ENTRY field have already been written by the data compression engine. When the data compression engine writes to a dictionary location, the DICT_VALID field is always set to the valid state.
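The two-field layout and its rules (a write always sets DICT_VALID; a location in the reset state yields no usable DICT_ENTRY) can be modeled as follows. This is a hypothetical sketch: the class name `DictionaryLocation` and the (prefix, byte, code) contents of DICT_ENTRY are assumptions for illustration. The model shows that clearing only the valid bit is enough to make a location's old contents meaningless.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Entry = Tuple[int, int, int]  # hypothetical DICT_ENTRY contents: (prefix, byte, code)


@dataclass
class DictionaryLocation:
    """One static-RAM dictionary location: DICT_ENTRY plus a one-bit DICT_VALID."""

    dict_entry: Optional[Entry] = None
    dict_valid: bool = False

    def write(self, entry: Entry) -> None:
        self.dict_entry = entry
        self.dict_valid = True  # every engine write sets the valid state

    def reset(self) -> None:
        # Only the valid bit is cleared; DICT_ENTRY keeps stale, undefined data.
        self.dict_valid = False

    def read(self) -> Optional[Entry]:
        # A location in the reset state carries no useful information.
        return self.dict_entry if self.dict_valid else None
```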
The statistical characteristics of the data may change over time, for example when different kinds of data are transmitted in succession, or over the course of transmitting a long document. When this happens, the stored dictionary entries no longer compress the data efficiently, and it becomes necessary to reset and rebuild the dictionary. Commonly-assigned U.S. Pat. No. 4,847,619 discloses a method for monitoring compression efficiency and triggering a reset when performance falls below a predetermined threshold.
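A reset trigger of the general kind described above can be sketched as a compression-ratio check over fixed windows of input. This is a hypothetical illustration, not the method disclosed in U.S. Pat. No. 4,847,619: the class name `EfficiencyMonitor`, the window size, and the ratio threshold are all assumed parameters.

```python
class EfficiencyMonitor:
    """Hypothetical reset trigger based on the compression ratio per window."""

    def __init__(self, window: int = 4096, threshold: float = 1.2) -> None:
        self.window = window        # input bytes per measurement window (assumed)
        self.threshold = threshold  # minimum acceptable input/output ratio (assumed)
        self.bytes_in = 0
        self.bytes_out = 0

    def update(self, bytes_in: int, bytes_out: int) -> bool:
        """Account for new traffic; return True when a reset is warranted."""
        self.bytes_in += bytes_in
        self.bytes_out += bytes_out
        if self.bytes_in < self.window:
            return False  # not enough data yet to judge efficiency
        ratio = self.bytes_in / max(self.bytes_out, 1)
        self.bytes_in = self.bytes_out = 0  # start a new measurement window
        return ratio < self.threshold
```

Each time `update` returns True, the dictionary must be reset, so data whose statistics change often incurs the reset cost many times.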
Resetting the dictionary involves writing to every dictionary location to put its DICT_VALID bit 9 in the reset state, and this must be done every time the dictionary is reset. If the number of dictionary locations is L, each reset therefore requires L static RAM write operations. The time needed to perform these L write operations is significant, and it degrades the average data throughput of compression and decompression sequences that involve multiple dictionary resets.
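The cost of a conventional reset follows directly from the arithmetic above: one write per location, so L writes per reset. The toy model below counts those writes. The names `StaticRam`, `reset_dictionary`, and `reset_time_ns`, and the idea of a fixed per-write cycle time, are assumptions made for illustration.

```python
from typing import List


class StaticRam:
    """Toy model of the DICT_VALID bits of L dictionary locations, counting writes."""

    def __init__(self, locations: int) -> None:
        self.valid: List[bool] = [False] * locations
        self.writes = 0

    def write_valid(self, address: int, state: bool) -> None:
        self.valid[address] = state
        self.writes += 1


def reset_dictionary(ram: StaticRam) -> None:
    # Conventional reset: one write per location, L writes in total.
    for address in range(len(ram.valid)):
        ram.write_valid(address, False)


def reset_time_ns(locations: int, write_cycle_ns: float) -> float:
    # Time for one reset, assuming a fixed write cycle and no other overhead.
    return locations * write_cycle_ns
```

For example, with a hypothetical 8192 locations (twice a 4096-entry dictionary) and a hypothetical 25 ns write cycle, `reset_time_ns(8192, 25)` gives 204,800 ns per reset, time during which no data is compressed or decompressed.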
Accordingly, a need remains for a way to improve the performance of dictionary-based data compression engines.