The present invention relates to signal processing and, more particularly, to data compression.
Data compression is the reversible re-encoding of information into a more compact expression. This more compact expression permits information to be stored and/or communicated more efficiently, generally saving both time and expense. A typical encoding scheme, e.g., based on ASCII, encodes alphanumeric characters and other symbols into binary sequences. A major class of compression schemes encodes symbol combinations using binary sequences not otherwise used to encode individual symbols. Compression is effected to the degree that the symbol combinations represented in the encoding scheme are encountered in a given text or other file. By analogy with bilingual dictionaries used to translate between human languages, the device that embodies the mapping of uncompressed code into compressed code is commonly referred to as a "dictionary".
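The following Python sketch illustrates the fixed-dictionary idea in its simplest form. It assumes 7-bit ASCII input, so byte values 128 to 255 are never used to encode individual symbols and can instead stand for symbol combinations, here two-character pairs. The pair list `COMMON_PAIRS` and the function names are hypothetical and chosen only for illustration. Practical dictionaries hold far more entries, and longer ones.

```python
# Hypothetical fixed dictionary: two-character combinations assumed to be
# common in English text, each mapped to a byte value (128-255) that
# 7-bit ASCII never uses for an individual symbol.
COMMON_PAIRS = ["th", "he", "in", "er", "an", "re", "on", "at",
                "en", "nd", "e ", " t", "s ", "d ", " a", "t "]
ENCODE = {p.encode("ascii"): 128 + i for i, p in enumerate(COMMON_PAIRS)}
DECODE = {code: pair for pair, code in ENCODE.items()}


def compress(text: bytes) -> bytes:
    """Replace dictionary pairs with their one-byte codes, scanning left to right."""
    if any(b >= 128 for b in text):
        raise ValueError("input must be 7-bit ASCII")
    out = bytearray()
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ENCODE:          # combination found in the dictionary
            out.append(ENCODE[pair])
            i += 2
        else:                       # fall back to the symbol's own code
            out.append(text[i])
            i += 1
    return bytes(out)


def decompress(data: bytes) -> bytes:
    """Expand each pair code back into its two symbols; other bytes pass through."""
    out = bytearray()
    for b in data:
        out += DECODE.get(b, bytes([b]))
    return bytes(out)
```

Because the pair codes never occur in ASCII input, decoding is unambiguous. Compression is obtained only to the extent that the listed pairs actually occur in the input.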
The present invention is primarily applicable to dictionary-based compression schemes, which are part of a larger class of sequential compression schemes. These are contrasted with non-sequential schemes which examine an entire file before determining the encoding to be used. Other sequential compression schemes, such as run-length limited (RLL) compression, can be used in conjunction with adaptive schemes.
Generally, the usefulness of a dictionary-based compression scheme depends on how frequently the symbol-combination entries in the dictionary are matched as a given file is being compressed. A dictionary optimized for one file type is unlikely to be optimized for another. For example, a dictionary that includes a large number of symbol combinations likely to be found in newspaper text files is unlikely to be effective in compressing database files, spreadsheet files, bit-mapped graphics files, computer-aided design files, Musical Instrument Digital Interface (MIDI) files, etc.
Thus, a strategy using a single fixed dictionary might best be tied to a single application program. A more sophisticated strategy can incorporate means for identifying file types and selecting among a predetermined set of dictionaries accordingly. Even the more sophisticated fixed-dictionary schemes are limited by the requirement that a file to be compressed must be matched to one of a limited number of dictionaries. Furthermore, there is no widely accepted standard for identifying file types, which essentially limits multiple-dictionary schemes to specific applications or manufacturers.
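A multiple-dictionary strategy of this kind can be sketched with the preset-dictionary (`zdict`) support in Python's standard `zlib` module. DEFLATE is a sliding-window scheme rather than the fixed-dictionary coding described above, so it serves here only as a stand-in. The file-type labels and dictionary contents in `PRESET_DICTIONARIES` are hypothetical.

```python
import zlib

# Hypothetical preset dictionaries, one per recognised file type. Real
# dictionaries would be derived from large samples of each type.
PRESET_DICTIONARIES = {
    "text": b"said that the committee would the vote was held on the floor "
            b"of the house and the senate in the",
    "spreadsheet": b"=SUM(A1:A10),=AVERAGE(B1:B10),0.00,#DIV/0!,=IF(C1>0,",
}


def compress_with(data: bytes, file_type: str) -> bytes:
    """Compress data using the preset dictionary selected for file_type."""
    c = zlib.compressobj(level=9, zdict=PRESET_DICTIONARIES[file_type])
    return c.compress(data) + c.flush()


def decompress_with(blob: bytes, file_type: str) -> bytes:
    """Decompress; the same file_type (and dictionary) must be supplied."""
    d = zlib.decompressobj(zdict=PRESET_DICTIONARIES[file_type])
    return d.decompress(blob) + d.flush()
```

Compressing a newspaper-style sentence with the `"text"` dictionary yields a shorter result than with the `"spreadsheet"` dictionary. The same label must also be available at decompression time, which is why the lack of a widely accepted file-type standard confines such schemes to cooperating applications.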
Adaptive compression schemes are known in which the dictionary used to compress a given file is developed as that file is being compressed. Entries are made into the dictionary as symbol combinations are encountered in the file, and each entry is then used to encode subsequent occurrences of its combination. Compression is effected to the extent that the symbol combinations occurring most frequently in the file are encountered while the dictionary is developing. Systems incorporating adaptive compression schemes can include means for clearing the dictionary between files so that the dictionary can be adapted on a file-by-file basis.
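Welch's LZW method (U.S. Pat. No. 4,558,302) is a well-known adaptive scheme of this type. The Python sketch below is a simplified illustration, not a reproduction of the patented implementation. Codes are returned as a list of integers rather than packed into fixed-width bit fields, and the dictionary is allowed to grow without limit, whereas practical implementations cap its size (for example, at 4,096 entries when 12-bit codes are used). Building a fresh dictionary on every call corresponds to clearing the dictionary between files.

```python
from typing import List


def lzw_compress(data: bytes) -> List[int]:
    """Encode data, adding a dictionary entry for each new combination seen."""
    dictionary = {bytes([i]): i for i in range(256)}  # all single symbols
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate                  # keep extending the match
        else:
            codes.append(dictionary[current])    # emit longest known match
            dictionary[candidate] = next_code    # learn the new combination
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: List[int]) -> bytes:
    """Rebuild the same dictionary from the code stream and expand it."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # The code names the entry the encoder created on this very step.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out += entry
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return bytes(out)
```

For example, `lzw_compress(b"ABABABA")` returns `[65, 66, 256, 258]`. Codes 256 and 258 refer to the entries `AB` and `ABA`, learned from the input itself. No dictionary is transmitted, because the decoder rebuilds it from the codes.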
Adaptive compression systems and methods are disclosed in U.S. Pat. No. 4,464,650 to Eastman et al. and U.S. Pat. No. 4,558,302 to Welch. These references further explain the use of dictionaries in both adaptive and non-adaptive compression strategies. Further pertinent references on compression strategies include: G. Held, "Data Compression: Techniques and Applications: Hardware and Software Considerations", Wiley, 1983; R. G. Gallager, "Variations on a Theme by Huffman", IEEE Transactions on Information Theory, Vol. IT-24, No. 6, pp. 668-674, November 1978; J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data Compression", IEEE Transactions on Information Theory, Vol. IT-23, No. 3, pp. 337-343, May 1977; J. Ziv and A. Lempel, "Compression of Individual Sequences via Variable-Rate Coding", IEEE Transactions on Information Theory, Vol. IT-24, No. 5, pp. 530-536, September 1978; and T. A. Welch, "A Technique for High-Performance Data Compression", IEEE Computer, June 1984.
A disadvantage of such adaptive compression techniques is that in some cases they can expand rather than compress the data. In fact, expansion is the rule rather than the exception when an adaptive compression scheme is used to compress a file that has already been compressed by that scheme. As data compression becomes more widely employed, the chance of data expansion due to an attempted compression of a previously compressed file increases. For example, an application program can include a dedicated compression scheme so that files created by the program can be stored efficiently on a hard disk drive. Likewise, a tape drive system for backing up a hard disk can include a data compression scheme in hardware for more efficient archiving of the hard disk drive. In this situation, attempting to compress the already-compressed application files during archiving can result in data expansion rather than contraction.
As data compression becomes more common, this counterproductive scenario becomes less the exception and more the rule. If data compression is to be implemented in hardware so that it operates irrespective of the type of data being compressed, it becomes necessary to protect against unintended data expansion. Of course, this protection must not interfere with the process of decompression that must occur upon the reception or retrieval of compressed data.