1. Technical Field
The present invention relates generally to a method, system, and apparatus for data compression and, more particularly, to a method, system, and apparatus for lossless data compression.
2. Description of Related Art
Data compression is the process of encoding data so that it takes up less storage space. Digital data is compressed by finding repeated patterns of binary 0s and 1s. The more patterns that can be found, the more the data can be compressed. Text can generally be compressed to about 40% of its original size, and graphics files to between 20% and 90%. Some files compress very little. The achievable compression depends on the type of file and the compression algorithm used.
There are numerous compression methods in use. Two major technologies are Huffman coding and Lempel-Ziv-Welch (LZW), representing examples of the statistical and dictionary compression methods.
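As an illustration of the dictionary approach, the following is a minimal sketch of textbook LZW over bytes. It is not taken from this disclosure: the function names `lzw_compress` and `lzw_decompress` are hypothetical, codes are kept as an unbounded list of integers rather than packed into a bit stream, and the dictionary is never reset.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Return LZW codes for data, starting from a 256-entry byte dictionary."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            # Keep extending the phrase while it is still in the dictionary.
            current = candidate
        else:
            # Emit the longest known phrase and learn the extended one.
            codes.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes by replaying the dictionary growth."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Code refers to the phrase the compressor had only just added.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)
```

Round-tripping an input through both functions returns the original bytes exactly, and repetitive input yields fewer codes than input bytes.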
When a compression algorithm is packaged for use on a specific platform and with a specific file format, it is called a codec (compressor/decompressor). ADPCM, PCM and GSM are examples of codecs for sound, and Indeo, Cinepak and MPEG are examples of codecs for video.
In the DOS/Windows world, PKZIP is the most widely-used compression application.
When text and financial data are compressed, they must be decompressed back to a perfect original, bit for bit. This is known as lossless compression. However, audio and video can be compressed to as little as 5% of their original size using lossy compression. Some of the data is actually lost, but the loss is not noticeable to the human ear and eye.
One method of data compression is to write data words to sequential content addressable memory (CAM) locations. CAMs are memory storage devices that are accessed by comparing the content of the data stored in them rather than by addressing predetermined memory locations within the storage device. While a word is being written to a CAM, it is also compared to previously written words within the CAM. Multiple sequential matches are compressed using logic such as that discussed above. Traditionally, since the compare and write steps utilize the same data, the CAM word being written is not used in determining a match. The result is that there is always one CAM word not available for matching. Therefore, a system, method, and apparatus for making all CAM words available for matching is desirable. Such functionality statistically increases the likelihood of matching the current CAM location, thus more efficiently compressing the data. The smaller the CAM, the larger the increase in the likelihood of a match that is provided by making all CAM words available for matching.
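The limitation of the traditional arrangement can be pictured with a small software model. This is only a sketch under stated assumptions: the CAM is modeled as a Python list, the function name `cam_match_traditional` is hypothetical, and the hardware timing is reduced to simply skipping the address being written.

```python
def cam_match_traditional(cam: list[int], write_addr: int, word: int) -> list[int]:
    """Return addresses whose stored word equals `word`.

    The location at `write_addr` is skipped, modeling a CAM in which the
    word being written cannot take part in the comparison.
    """
    return [addr for addr, stored in enumerate(cam)
            if addr != write_addr and stored == word]


# Four-word CAM; the next write targets address 2, which already holds 0x43.
cam = [0x41, 0x42, 0x43, 0x44]
write_addr = 2

# The incoming word equals the old content of the write location, but that
# location is excluded, so no match is reported.
missed = cam_match_traditional(cam, write_addr, 0x43)   # []

# A word stored at any other address is still found.
found = cam_match_traditional(cam, write_addr, 0x41)    # [0]
```

In this model one of the four words, a quarter of the CAM, is unavailable on every cycle. With a 256-word CAM the excluded share would be 1/256, which is why the effect matters most for small CAMs.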
The present invention provides a method, system, and apparatus for making all content addressable memory words available for comparison by a data compressor. In one embodiment, new data, to be compared with old data, is launched into a master latch. The new data from the master latch is launched into both a slave latch and compare logic for each of a plurality of content addressable memory words within a content addressable memory (CAM). After the comparison has been made between the new data and the old data contained within the CAM word, the new data from the slave latch is launched into one of the plurality of content addressable memory words. Thus, each CAM word, including the CAM word that will be overwritten by the new data, is available for comparison to the new data.
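The compare-then-write ordering described above can be sketched as a behavioral model. This is an illustration only, not the disclosed circuit: the class name `LatchedCam`, the single shared slave latch, and the wrap-around write pointer are assumptions made to keep the example short, whereas the embodiment describes a slave latch and compare logic for each CAM word.

```python
class LatchedCam:
    """Behavioral model of a CAM whose write is deferred until after compare."""

    def __init__(self, size: int):
        self.words = [None] * size   # None marks a word not yet written
        self.write_addr = 0          # next sequential location to overwrite
        self.master = None
        self.slave = None

    def cycle(self, new_word: int) -> list[int]:
        """Process one input word and return the addresses that matched it."""
        # New data is launched into the master latch.
        self.master = new_word
        # The master feeds the slave latch and the compare logic.
        self.slave = self.master
        matches = [addr for addr, old in enumerate(self.words)
                   if old == self.master]
        # Only after the comparison is the slave data written to the CAM word,
        # so the old content of the write location took part in the compare.
        self.words[self.write_addr] = self.slave
        self.write_addr = (self.write_addr + 1) % len(self.words)
        return matches


cam = LatchedCam(4)
for word in (0x41, 0x42, 0x43, 0x44):
    cam.cycle(word)              # fills addresses 0..3, no matches yet

# The write pointer has wrapped to address 0, which still holds 0x41.
hit = cam.cycle(0x41)            # [0]
```

Here the final input matches address 0 even though address 0 is the location about to be overwritten, which is the case a compare that skips the write location would miss.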