1. Field of the Invention
The invention relates generally to data storage and communication systems and, more particularly, to a comparing and prioritizing memory that allows data stored therein to be string searched, wherein the results of the string search are used for data compression.
2. Related Art
Many tasks, such as data compression and database searches, as well as many optimization techniques, involve searching a data buffer for strings that match a given string. The speed of these tasks (e.g., data compression) is often directly proportional to the speed at which these matching strings can be located.
Finding the set of strings that match a given string is a computationally intensive task. Typically, to find the strings that match a given string of length M in a buffer of length N, a maximum of M*N comparisons must be performed. If these comparisons occur sequentially, a maximum of M*N cycles is required. However, if many of these comparisons could be performed in parallel, say N comparisons at a time, a maximum of only M cycles would be required.
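The cycle counts above can be illustrated with a minimal sketch. The function names are hypothetical and are not taken from any cited work. The first function performs the comparisons one at a time and counts them. The second models a machine that compares one symbol of the given string against all N buffer locations in each cycle; the list comprehension stands in for the N simultaneous comparisons.

```python
def find_matches_sequential(buffer, pattern):
    """Return (match start positions, number of symbol comparisons made)."""
    n, m = len(buffer), len(pattern)
    matches, comparisons = [], 0
    for start in range(n - m + 1):
        for offset in range(m):
            comparisons += 1
            if buffer[start + offset] != pattern[offset]:
                break
        else:
            # Inner loop finished without a mismatch: full match at `start`.
            matches.append(start)
    return matches, comparisons


def find_matches_parallel_model(buffer, pattern):
    """Model N comparisons per cycle; return (match start positions, cycles)."""
    n = len(buffer)
    # alive[i] is True while a match starting at location i is still possible.
    alive = [True] * n
    cycles = 0
    for offset, symbol in enumerate(pattern):
        cycles += 1
        # One cycle: every location is checked against the same pattern symbol.
        alive = [
            alive[i] and i + offset < n and buffer[i + offset] == symbol
            for i in range(n)
        ]
    return [i for i in range(n) if alive[i]], cycles
```

For a buffer of length N and a string of length M, the sequential count never exceeds M*N, while the parallel model always finishes in M cycles.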
A number of data compression algorithms that have good compression characteristics involve searching for matching strings. However, these algorithms are typically not implemented because they are too computationally intensive. Examples are the original Ziv and Lempel compression algorithm LZ1 (see, Jacob Ziv and Abraham Lempel, "A Universal Algorithm for Sequential Data Compression," IEEE Transactions on Information Theory, Vol. IT-23, No. 3, May 1977, pp. 337-343) and a compression algorithm by James Storer and Thomas Szymanski (see, James A. Storer and Thomas G. Szymanski, "Data Compression via Textual Substitution," Journal of the Association for Computing Machinery, Vol. 29, No. 4, October 1982, pp. 928-951). Both algorithms are based on the principle of finding redundant strings within a search window and encoding them with pointers that contain the length and location of each string in the search window.
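The principle of replacing redundant strings with pointers can be sketched as follows. This is a simplified illustration only, not the exact procedure of either cited paper: the window size, the minimum match length, the token format, and the use of a backward distance as the "location" are all assumptions made for the example.

```python
def compress(data, window=16, min_len=2):
    """Greedy window compressor: emits literals or (distance, length) pointers."""
    out, pos = [], 0
    while pos < len(data):
        best_len, best_loc = 0, 0
        # Exhaustively search the window for the longest earlier match.
        for loc in range(max(0, pos - window), pos):
            length = 0
            while (pos + length < len(data)
                   and loc + length < pos  # assumption: no overlapping copies
                   and data[loc + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_loc = length, loc
        if best_len >= min_len:
            out.append((pos - best_loc, best_len))  # pointer: distance back, length
            pos += best_len
        else:
            out.append(data[pos])  # literal symbol
            pos += 1
    return out


def decompress(tokens):
    """Rebuild the original string from literals and pointers."""
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            distance, length = token
            start = len(out) - distance
            out.extend(out[start:start + length])
        else:
            out.append(token)
    return "".join(out)
```

The nested search loop in `compress` is the string search whose cost makes such algorithms computationally intensive when performed sequentially.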
A second data compression algorithm of Ziv and Lempel, LZ2 (see, Jacob Ziv and Abraham Lempel, "Compression of Individual Sequences via Variable-Rate Coding," IEEE Transactions on Information Theory, Vol. IT-24, No. 5, September 1978, pp. 530-536), provides variable-length to variable-length encoding. It continuously compares the input stream against words contained in a dictionary and returns pointers to the dictionary entry of the longest match. The dictionary growth heuristic, in which the last parsed word concatenated with the first unmatched symbol is added to the dictionary, causes the dictionary to contain every prefix of every word it holds. Although implementation of this algorithm is not computationally intensive, it yields a lower compression ratio than the original Ziv and Lempel compression algorithm.
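The dictionary growth heuristic can be illustrated with a minimal sketch of dictionary-based parsing. The function name and the output format (a dictionary index paired with the first unmatched symbol) are assumptions chosen for the example and are not drawn from the cited paper.

```python
def dictionary_parse(data):
    """Parse `data` into (index of longest matching word, next symbol) pairs."""
    dictionary = {"": 0}  # the empty word occupies entry 0
    output = []
    word = ""
    for symbol in data:
        if word + symbol in dictionary:
            # Keep extending the current match.
            word += symbol
        else:
            output.append((dictionary[word], symbol))
            # Growth heuristic: add the matched word plus the unmatched symbol.
            dictionary[word + symbol] = len(dictionary)
            word = ""
    if word:
        # Input ended in the middle of a match; emit it with no trailing symbol.
        output.append((dictionary[word], ""))
    return output, dictionary
```

Because a word is added only after the word one symbol shorter has already been matched in the dictionary, every prefix of every stored word is itself a stored word.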
Other string search methods employ hashing or tree techniques. In hashing, a hash table is maintained to limit the locations that are searched. However, in the case where many strings start with the same sequence, many comparisons still need to be made, resulting in performance degradation.
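A minimal sketch of the hashing approach follows. It assumes, for illustration only, that the table is keyed on the first two symbols of each string; the function names and the key length are hypothetical.

```python
from collections import defaultdict


def build_index(buffer, key_len=2):
    """Map each key_len-symbol prefix to the buffer locations where it starts."""
    index = defaultdict(list)
    for i in range(len(buffer) - key_len + 1):
        index[buffer[i:i + key_len]].append(i)
    return index


def longest_match(buffer, index, target, key_len=2):
    """Return (location, length, candidates examined) for the longest match."""
    best_pos, best_len = -1, 0
    # Only locations sharing the target's prefix are searched.
    candidates = index.get(target[:key_len], [])
    for pos in candidates:
        length = 0
        while (length < len(target)
               and pos + length < len(buffer)
               and buffer[pos + length] == target[length]):
            length += 1
        if length > best_len:
            best_pos, best_len = pos, length
    return best_pos, best_len, len(candidates)
```

When many strings in the buffer begin with the same sequence, the candidate list for that key grows and each candidate must still be compared symbol by symbol, which is the degradation noted above.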
Tree-based systems maintain a tree structure that is traversed to identify the existence and location of strings in the history buffer. The disadvantage of tree-based systems is the overhead of adding and deleting tree entries, which can significantly degrade performance.
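One possible tree structure is sketched below as a simple trie over the strings of the history buffer. The class, the function names, and the bound on inserted string length are assumptions made for the example; practical systems may use other tree forms.

```python
class Node:
    """One tree entry: child links plus the latest buffer location seen."""

    def __init__(self):
        self.children = {}
        self.position = None


def insert(root, buffer, position, max_len):
    """Add the string starting at `position` (up to max_len symbols)."""
    node = root
    for symbol in buffer[position:position + max_len]:
        node = node.children.setdefault(symbol, Node())
        node.position = position  # remember the most recent occurrence


def longest_match(root, target):
    """Traverse the tree; return (location, length) of the longest match."""
    node, length, position = root, 0, None
    for symbol in target:
        child = node.children.get(symbol)
        if child is None:
            break
        node, length, position = child, length + 1, child.position
    return position, length
```

Each new buffer location requires up to `max_len` tree updates on insertion, and a sliding history buffer would additionally require removing entries for locations that leave the buffer (omitted here); this bookkeeping is the overhead referred to above.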
An apparatus that is often used to speed up searches is a Content Addressable Memory (CAM). A CAM can return the location of a given word in a single memory access. A word is presented to the CAM, and the CAM performs a simultaneous compare between the given word and all locations in the CAM. If the word is present in any of its locations, the CAM returns the address of one of the matched locations. This makes the CAM an excellent apparatus to search for a single word in a buffer. However, it is not possible to use a CAM to search for strings that are longer than a single word.
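The behavior of a CAM can be modeled in software with the following minimal sketch. The class name is hypothetical, the simultaneous compare is simulated with a list comprehension, and returning the lowest matching address is an assumption, since a CAM need only return one of the matched locations.

```python
class ContentAddressableMemory:
    """Software model of a CAM holding one word per location."""

    def __init__(self, words):
        self.cells = list(words)

    def search(self, word):
        """Return the address of one matching location, or None if absent."""
        # Conceptually, every location compares itself to `word` at once.
        match_lines = [cell == word for cell in self.cells]
        for address, hit in enumerate(match_lines):
            if hit:
                return address  # assumption: lowest address has priority
        return None
```

For example, `ContentAddressableMemory("abcab").search("b")` returns address 1. The model accepts only a single word per search, so locating a multi-word string would require additional logic beyond the CAM itself.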