1. Field of the Invention
The present invention relates to data compression and decompression.
2. Description of Related Art
Data compression is the reversible transformation of information into a more compact representation. This more compact representation permits the information to be stored and/or communicated more efficiently, generally saving both time and expense.
A major class of compression schemes encodes multiple-character strings using binary sequences or "codewords" not otherwise used to encode individual characters. The strings are composed of characters drawn from an "alphabet" of single-character strings. This alphabet represents the smallest unique piece of information the compressor processes. Thus, an algorithm which uses eight bits to represent its characters has 256 unique characters in its alphabet. Compression is effected to the degree that the multiple-character strings represented in the encoding scheme are encountered in a given file or data stream. By analogy with bilingual dictionaries used to translate between human languages, the device that embodies the mapping between uncompressed code and compressed code is commonly referred to as a "dictionary."
Generally, the usefulness of a dictionary-based compression scheme is dependent on the frequency with which the dictionary entries for multiple-character strings are used. A fixed dictionary optimized for one file type is unlikely to be optimized for another. For example, a dictionary which includes a large number of character combinations likely to be found in newspaper text files is unlikely to efficiently compress database files, spreadsheet files, bit-mapped graphics files, computer-aided design files, et cetera.
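By way of illustration only, the dependence of a fixed dictionary on file type can be sketched as follows. The function name `encode_fixed`, the greedy longest-match strategy, the assignment of codes 256 and above to phrases, and the sample phrase list are all assumptions made for this sketch and are not taken from any cited reference.

```python
def encode_fixed(data: bytes, phrases: list[bytes]) -> list[int]:
    """Encode data with a fixed dictionary (hypothetical sketch).

    Codes 0..255 stand for single bytes (the alphabet); phrase i in
    `phrases` is assigned code 256 + i.
    """
    table = {phrase: 256 + i for i, phrase in enumerate(phrases)}
    longest = max((len(p) for p in phrases), default=1)
    codes = []
    pos = 0
    while pos < len(data):
        # Greedy match: try the longest candidate string first.
        for length in range(min(longest, len(data) - pos), 1, -1):
            chunk = data[pos:pos + length]
            if chunk in table:
                codes.append(table[chunk])
                pos += length
                break
        else:
            # No multiple-character entry matched; emit the single byte.
            codes.append(data[pos])
            pos += 1
    return codes


# A dictionary tuned for English text (assumed sample phrases).
text_phrases = [b"the ", b"and ", b"ing "]

text = b"the king and the ring "
binary = bytes(range(22))  # same length, but no phrase ever matches

print(len(text), len(encode_fixed(text, text_phrases)))      # 22 7
print(len(binary), len(encode_fixed(binary, text_phrases)))  # 22 22
```

In this sketch the text input is reduced from 22 characters to 7 codewords, whereas the non-text input of equal length yields no reduction in the number of symbols, because none of the dictionary's multiple-character entries is ever used.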
Adaptive compression schemes are known in which the dictionary used to compress a given file is developed while that file is being compressed. Codewords representing every single character possible in the uncompressed input file are put into the dictionary. Additional entries are added to the dictionary as multiple-character strings are encountered in the file. The additional dictionary entries are used to encode subsequent occurrences of the multiple-character strings.
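The adaptive scheme described above may be sketched, under stated assumptions, as follows. The sketch follows the general LZW-style approach and is not asserted to reproduce the exact method of any cited patent; the function name `compress`, the eight-bit alphabet, and the use of an unbounded dictionary are assumptions made for illustration.

```python
def compress(data: bytes) -> list[int]:
    """LZW-style adaptive compression (illustrative sketch)."""
    # Seed the dictionary with a codeword for every possible single
    # character (assumed eight-bit alphabet: codes 0..255).
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256

    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            # Keep extending the match while the string is known.
            current = candidate
        else:
            # Emit the codeword for the longest known string, then add
            # the new, longer string for use on later occurrences.
            codes.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


print(compress(b"ABABABA"))  # [65, 66, 256, 258]
```

Here the seven input characters are encoded as four codewords; codes 256 and 258 refer to the strings "AB" and "ABA", which were added to the dictionary while the same input was being compressed.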
During decompression, the dictionary is built in a like manner. Thus, when a codeword for a character string is encountered in the compressed file, the dictionary contains the necessary information to reconstruct the corresponding character string. Compression is effected to the extent that the multiple-character strings occurring most frequently in the file are encountered as the dictionary is developing.
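A corresponding decompression sketch, again LZW-style and offered only as an illustration, is shown below. The function name `decompress` and the eight-bit alphabet are assumptions. The sketch also handles the case in which a codeword is received one step before the decompressor has created the matching entry, a known property of this family of schemes.

```python
def decompress(codes: list[int]) -> bytes:
    """LZW-style adaptive decompression (illustrative sketch)."""
    # Seed the dictionary exactly as the compressor is assumed to do.
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    if not codes:
        return b""

    previous = dictionary[codes[0]]
    output = bytearray(previous)
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # The codeword refers to the entry about to be created:
            # it must be the previous string plus its own first byte.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid codeword {code}")
        output += entry
        # Rebuild the same entry the compressor added at this step.
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return bytes(output)


print(decompress([65, 66, 256, 258]))  # b'ABABABA'
```

No dictionary is transmitted in this sketch; the decompressor reconstructs each entry from the codewords alone, in the same order in which a like compressor would have created them.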
Adaptive compression systems and methods are disclosed in U.S. Pat. No. 4,814,746 to Miller et al. and U.S. Pat. No. 4,558,302 to Welch. These references further explain the use of dictionaries.
To avoid keeping a dictionary which no longer represents the type of data being compressed, and which thus exhibits the problems described above in reference to fixed-dictionary compression schemes, the dictionary is periodically reset. These resets can occur at natural data boundaries (e.g., at the beginning of files), after a fixed amount of data has been compressed, or according to a performance-based algorithm as described in U.S. Pat. No. 4,847,619 to Kato, which is hereby incorporated by reference. These dictionary resets take time. In many instances, the delay caused by a reset will cause other resources to wait.
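The fixed-amount reset policy may be sketched as follows. This is a minimal illustration only: the names `compress_block` and `compress_with_resets`, the `block_size` parameter, and the choice to return one codeword list per block are assumptions, and the performance-based policy of the Kato reference is not modeled.

```python
def compress_block(data: bytes) -> list[int]:
    """LZW-style compression starting from a freshly reset dictionary."""
    # A reset re-seeds all 256 single-character entries (assumed
    # eight-bit alphabet) and discards every learned string.
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def compress_with_resets(data: bytes, block_size: int) -> list[list[int]]:
    """Reset the dictionary after every `block_size` input bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [compress_block(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


print(compress_with_resets(b"ABABABAB", 4))  # [[65, 66, 256], [65, 66, 256]]
print(compress_block(b"ABABABAB"))           # [65, 66, 256, 258, 66]
```

In this sketch each reset rebuilds the full set of single-character entries and forces the multiple-character entries to be learned again, so that the reset version emits six codewords where the version without resets emits five.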
A typical hardware implementation of a data compression algorithm is a data compression (DC) integrated circuit (IC) which is inserted into a data stream. A data compression machine on the IC processes the data stream present at one port and routes the resulting compressed or decompressed data stream to the other port.
A DC IC may use an external memory IC, or "dictionary RAM," to store the dictionary entries used in compression and decompression. The predetermined codeword entries representing the alphabet are written to the dictionary RAM. Each time the compressor matches only a single character (a string of length 1), the codeword for the character is read from the dictionary RAM and output in the compressed stream. When the decompressor receives a codeword for a single character, it must read the dictionary RAM to get the alphabet character.
A pair of direct memory access (DMA) interfaces on the DC IC may handle the specifics of the IC's communication with the outside world. Each DMA interface has a RAM buffer for storing data yet to be passed on. For cost reasons, a single RAM IC may be used for the dictionary storage and both DMA buffers. Because the DMA interfaces operate in parallel, it is possible that both will attempt to access the single RAM IC at the same time. Although an arbitration circuit can manage such conflicts, arbitration cannot prevent one circuit from waiting while the other circuit is given priority.
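The contention for a single shared RAM may be illustrated by the following simulation sketch. The fixed-priority policy, the one-access-per-cycle model, the function name `simulate`, and the requester names `dictionary`, `dma_a`, and `dma_b` are all hypothetical choices made for this sketch; no particular arbitration circuit is implied.

```python
def simulate(requests: list[set[str]], priority: list[str]) -> dict[str, int]:
    """Count cycles each requester spends waiting for a single-port RAM.

    requests[c] is the set of requesters issuing a new access in cycle c.
    The RAM is assumed to serve exactly one access per cycle, granted to
    the highest-priority requester with an outstanding access.
    """
    pending = dict.fromkeys(priority, 0)
    waits = dict.fromkeys(priority, 0)
    cycle = 0
    while cycle < len(requests) or any(pending.values()):
        if cycle < len(requests):
            for name in requests[cycle]:
                pending[name] += 1
        # Fixed-priority arbitration: first requester in the list wins.
        granted = next((n for n in priority if pending[n]), None)
        if granted is not None:
            pending[granted] -= 1
        # Every access still outstanding has waited one more cycle.
        for name in priority:
            waits[name] += pending[name]
        cycle += 1
    return waits


order = ["dictionary", "dma_a", "dma_b"]
traffic = [{"dma_a", "dma_b"}, {"dictionary"}, set()]
print(simulate(traffic, order))
# {'dictionary': 0, 'dma_a': 0, 'dma_b': 2}
```

In this sketch the two DMA interfaces request the RAM in the same cycle, and a dictionary access arrives in the next cycle; the lowest-priority interface waits two cycles even though every conflict is resolved in an orderly manner.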