1. Field of the Invention
The present invention relates generally to data compressors and decompressors, and more particularly to a fast data compressor performing a new compression method and to a fast data decompressor performing a corresponding new decompression method.
2. Description of the Background
It is well known in the computer arts that given data occupying a given number of bytes may be fully represented by other data occupying a smaller number of bytes. In other words, the given data may be compressible. In order for compression to be useful, there must exist a method of reconstructing the original data. In other words, the compressed data must be decompressible.
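By way of illustration only, the round trip described above may be sketched using Python's standard `zlib` module, which implements a previously-known Lempel-Ziv and Huffman based technique and not the method of the present invention. The sample data is a hypothetical, highly repetitive byte string chosen for this sketch.

```python
import zlib

# Hypothetical sample data: highly repetitive, so it is compressible.
original = b"ABCD" * 256                 # 1024 bytes

compressed = zlib.compress(original)     # smaller representation of the data
restored = zlib.decompress(compressed)   # reconstruction of the original data

# The compressed form occupies fewer bytes, and the original is recovered exactly.
print(len(original), len(compressed))
print(restored == original)
```

The compression is useful only because the final comparison holds; a smaller representation from which the original cannot be reconstructed would not qualify.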
Data compression is useful in a variety of circumstances. For example, compressed data will occupy less storage on a permanent storage medium such as a hard disk or a floppy disk, thereby permitting the disk to effectively hold an increased amount of data. This eliminates the expense otherwise required in purchasing a larger hard disk or more floppy disks.
As another example, compressed data will take less time to transfer over a communication link. This reduces expenses associated with data transmission, such as long-distance telephone charges, bulletin board use charges, bulletin board connect time, and user wait time.
An increasingly important example is the computer network. Networked computers and servers transfer data amongst themselves in the form of "packets" of a given number of bytes. Increasing the effective data content of those bytes increases the effective packet content and the network throughput.
Yet another example is a computer system having virtual memory. The virtual memory system is constructed to allow user programs and the like to have seeming access to a greater quantum of random access memory (RAM) than the computer system is actually equipped with. At times, certain memory pages may need to be "swapped out" to non-RAM storage to make room for new data in RAM. Typically, the pages are swapped out to a hard disk. If the pages are compressed before being swapped out, they will not only occupy less space on the hard disk, but will also be written to the disk in a shorter amount of time and read from the disk (swapped in) in a shorter amount of time. Ideally, the time spent compressing and decompressing a page will be less than the disk transfer time saved by compressing it.
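The trade-off between codec time and transfer time may be sketched as simple arithmetic. The function name `swap_time`, the page size, the disk rate, the compression ratio, and the codec rates below are all hypothetical values assumed for illustration; none is taken from a measured system.

```python
def swap_time(page_bytes, disk_bytes_per_s, ratio=1.0, codec_bytes_per_s=None):
    """Seconds to move one page to or from disk.

    ratio is compressed size divided by original size (1.0 = uncompressed).
    codec_bytes_per_s is the compressor or decompressor rate; None means
    no compression step is performed.
    """
    transfer = page_bytes * ratio / disk_bytes_per_s
    codec = 0.0 if codec_bytes_per_s is None else page_bytes / codec_bytes_per_s
    return transfer + codec

PAGE = 4096          # assumed page size in bytes
DISK = 1_000_000     # assumed disk rate in bytes per second

plain = swap_time(PAGE, DISK)                                   # no compression
fast = swap_time(PAGE, DISK, ratio=0.5, codec_bytes_per_s=4_000_000)
slow = swap_time(PAGE, DISK, ratio=0.5, codec_bytes_per_s=1_000_000)

print(fast < plain)  # fast codec: compression pays for itself
print(slow < plain)  # slow codec: codec time exceeds the transfer time saved
```

Under these assumed figures the same compression ratio helps with the faster codec and hurts with the slower one, which is why codec speed matters as much as the ratio achieved.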
As another example, data may be stored in a read-only memory (ROM) in a compressed form, and may be decompressed in part or in whole when the ROM is read.
Data compression and decompression may be performed in a variety of manners. For example, a stand-alone program may be used to compact files on a hard disk whenever the hard disk begins to become full. Then, when the compacted files are needed at some future date, a corresponding decompression program may be used to decompress the desired files.
Alternatively, the compression and decompression may be performed transparently by a "driver" which is used to access the hard disk. The compressor and decompressor are, in effect, a "front end" of the hard disk. In such a configuration, the disk may advantageously store only compressed data. The user application or other entity reading data from, or writing data to the hard disk need never know that the compression and decompression are taking place. This front end technique may be performed either in software or, more advantageously, in hardware. The front end may also be built into either the computer or the hard disk drive.
A variety of compression and decompression techniques are well known in the art, including Huffman encoding, Lempel-Ziv encoding, and Run Length Encoding (RLE). Those techniques have certain drawbacks which it is desirable to overcome. For example, they are generally non-self-recovering, meaning that if their compressed data becomes corrupted by even one "bit", the entire data will likely be irretrievably corrupted upon decompression. Even worse, such errors may cause "data explosion" in which the decompressor is unable to determine when to stop decompressing and generates erroneous data forever, or until the computer runs out of memory or otherwise crashes.
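The sensitivity to a single corrupted bit may be illustrated with a minimal sketch of Run Length Encoding. The (count, value) byte-pair format and the names `rle_encode` and `rle_decode` are assumptions made for this sketch; practical RLE formats vary.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (count, value) byte pairs; each count is 1..255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)

def rle_decode(blob: bytes) -> bytes:
    """Expand (count, value) byte pairs back into the original bytes."""
    out = bytearray()
    for k in range(0, len(blob) - 1, 2):
        out += bytes([blob[k + 1]]) * blob[k]
    return bytes(out)

original = b"AAAABBBCC"
packed = rle_encode(original)

# Flip one bit in the first count byte: 4 becomes 132.
damaged = bytearray(packed)
damaged[0] ^= 0x80

print(rle_decode(packed) == original)                   # intact data round-trips
print(len(original), len(rle_decode(bytes(damaged))))   # 9 versus 137
```

One flipped bit inflates the output from 9 bytes to 137 and displaces every byte that follows the damaged run. The decoder has no means of detecting the error, let alone recovering from it.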
More significantly, the previously-known techniques have proven to be unacceptably slow, both in compressing and in decompressing data. Those skilled in the art are aware that data files (including text files, graphics files, stand-alone programs, and the like) are, on average, much larger today than at the time the previously-known techniques were conceived. The larger data files simply take too long to compress and decompress using the previously-known techniques.
Data compression typically takes advantage of the fact that most data contains repeated strings of bytes. Prior techniques maintain "dictionaries" of previously-encountered strings of bytes. In some schemes, the dictionaries are embedded in the compressed data file, to enable the decompression of the data. Other schemes reconstruct the dictionaries at decompression time. Storage and transfer of the dictionaries are wasteful of media and bandwidth, while dictionary reconstruction imposes an unacceptably severe performance penalty at each decompression.
In order to determine whether a given string has previously been encountered, prior techniques generally employ some sort of hashing function to access the dictionary. Hashing consumes computational resources, and slows down both the compression and the decompression of data.
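The kind of hashed dictionary lookup attributed above to prior techniques may be sketched as follows. This is a generic illustration and not any particular prior-art algorithm; the function name `find_repeats` and the three-byte string length are assumptions made for the sketch.

```python
def find_repeats(data: bytes, min_len: int = 3):
    """Return (position, earlier_position) pairs where a min_len-byte string recurs."""
    seen = {}       # Python dict is a hash table: byte string -> most recent position
    repeats = []
    for pos in range(len(data) - min_len + 1):
        key = data[pos:pos + min_len]
        # Every lookup and every store below hashes the key.
        if key in seen:
            repeats.append((pos, seen[key]))
        seen[key] = pos
    return repeats

print(find_repeats(b"abcabcabc"))
# [(3, 0), (4, 1), (5, 2), (6, 3)]
```

Each input position costs a hash computation for the lookup and another for the store, in addition to the memory held by the dictionary itself. That per-position cost is the overhead the paragraph above describes.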
Presently, data compression and decompression come in two varieties: "loss-less" and "lossy". The less desirable "lossy" compression is used in situations where absolute data integrity is not required. For example, compressed video images may be acceptably accurate when restored in a "lossy" form, in which a small degree of their detail or color variation has been lost. "Lossy" compression is generally deemed acceptable when the lost data is outweighed in importance by increased compression ratios or by improved compression or decompression times.
By way of contrast, the "loss-less" variety is employed when absolute data integrity outweighs other such factors. For example, when a user application program is to be compressed for storage or transmission and then decompressed, it is absolutely essential that the compression and decompression be one hundred percent "loss-less". The loss or corruption of even a single machine instruction may render the restored version useless at best, and perhaps even dangerous.
Therefore, what is needed is an improved data compressor and an improved data decompressor. These should have improved throughput, and should be self-recovering in the event of data corruption. It is desirable that the compressor and decompressor operate without the creation of a separate dictionary, and even more desirable that no dictionary be embedded in the compressed data file. It is further desirable that neither the compressor nor the decompressor require a hashing function. It is further desirable that the compressor and decompressor perform "loss-less" compression and decompression, and still further desirable that the compression and decompression be performed at an improved speed.