The present invention is generally directed to systems and methods for compressing data. More particularly, the present invention is directed to a system and method for processing input character streams received by a data processing circuit or software-driven system. Even more particularly, the present invention is directed to a circuit and method for comparing input data stream characters so that even in the case of a character mismatch, character level processing still occurs in furtherance of data compression. In particular, processing occurs so as to eliminate timing and processing rate dependencies that occur as a result of variations in the input data stream. In particular, the present invention eliminates data compression processing rate dependencies which would normally occur due to the specific content of the received input data stream.
Data compression is a process which is carried out to reduce the number of bits of information that are employed to represent information in an input stream of characters. Data compression is possible because of the existence of certain patterns of information representation that occur in the input stream. In particular, certain strings of characters may appear in a plurality of locations within the data and it is sufficient to identify such repeated strings merely by their position within the stream. By taking advantage of such redundancies in the information flow, it is possible to represent the exact same information using fewer bits of data.
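By way of illustration only, the following minimal Python sketch shows how a repeated string may be represented by its position and length within the data already seen. The names `compress`, `decompress`, and `min_len` are hypothetical and appear nowhere in the cited standard or patent. The brute-force search stands in for whatever matching mechanism an actual implementation would employ.

```python
def compress(data: str, min_len: int = 2):
    """Greedy sketch: replace repeated strings with (position, length) pairs.

    Assumption: a repeat is only referenced if it is at least min_len
    characters long and lies entirely within the data already emitted.
    """
    out = []
    i = 0
    while i < len(data):
        best_pos, best_len = -1, 0
        # Try every earlier start position and keep the longest match.
        for start in range(i):
            length = 0
            while (i + length < len(data)
                   and start + length < i
                   and data[start + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = start, length
        if best_len >= min_len:
            out.append((best_pos, best_len))  # reference to earlier text
            i += best_len
        else:
            out.append(data[i])               # literal character
            i += 1
    return out


def decompress(tokens):
    """Rebuild the original text from literals and (position, length) pairs."""
    text = ""
    for token in tokens:
        if isinstance(token, tuple):
            pos, length = token
            text += text[pos:pos + length]
        else:
            text += token
    return text
```

In this sketch, the nine-character input "abcabcabc" is represented by three literal characters followed by two references to the string of length three at position 0.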
The importance of data compression arises for two primary reasons. Firstly, when data is represented in a compressed form, it takes less time to transmit this information from point A to point B. Secondly, data compression also permits the same data to be stored in fewer memory locations whether these memory locations be located in a random access memory or on a storage medium such as a magnetic disk drive, floppy disk, optical disk, or other fixed medium. Accordingly, compressed data is data that may be transmitted more quickly and stored more efficiently.
Additionally, data compression becomes that much more desirable when the implementing circuits and processes are fast and efficient. That is, if the data compression operation takes an undesirably long time, then at least one of the main benefits of data compression (time savings) is lost or at least negatively impacted. Accordingly, it is desirable to be able to compress information as rapidly as possible.
The system and method of data compression enhancement described herein is applicable to any system or algorithm which employs mechanisms for locating and counting the length of received character strings which match already received character strings. Such systems are described in the American National Standard ANSI X.3.241-1994 titled "Data Compression Method - Adaptive Coding with Sliding Window for Information Interchange." The system described in the aforementioned standard is also related to the data compression method shown in U.S. Pat. No. 5,003,307. This patent describes a method for comparing each input character to any character already stored in a history buffer. This patent particularly describes one way of accumulating the results of the data search in order to find the longest matching pattern. However, the data rate of the method described in the aforementioned patent is dependent upon the data received. If the data contains long repetitive patterns, the performance characteristics of the data compression engine are very good, but they decline as soon as the matching patterns become short.
The reason for this performance dependency stems from the fact that each character in the received input stream is evaluated as a potential nth character of a matching character string. When n=1, this means that this is the start of a potential matching string. The compression algorithm described therein identifies each matching string of at least two characters. However, every time a mismatch is found, the accumulation logic that keeps track of the past search results is reset. In this case, the character that causes the mismatch is reevaluated as a starting character of a new string, as long as n is greater than 1. This activity causes a loss of one or two clock cycles per mismatch. It is thus clear that the data rate decreases with the number of mismatches. The present invention, however, does not suffer from this inadequacy.
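The cycle loss described above may be illustrated by a toy software model. The following Python sketch is not the circuit of the aforementioned patent. The function name `prior_art_cycles`, the set-based accumulation state, and the `penalty` parameter (assumed here to be one extra cycle per mismatch, although the loss may be one or two) are all hypothetical choices made for illustration.

```python
def prior_art_cycles(data: str, penalty: int = 1):
    """Toy model of accumulation logic that is reset on every mismatch.

    Returns (cycles, matches). Each match is (end_position, length) for a
    matched string of at least two characters, where end_position is the
    earliest history index at which a matching string ends.
    Assumption: a mismatch with n > 1 costs `penalty` extra cycles.
    """
    history = []   # characters received so far
    acc = set()    # history positions where the current string ends
    n = 1          # the next character is the nth of a potential string
    cycles = 0
    matches = []

    for ch in data:
        cycles += 1
        if n > 1:
            # Try to extend the string currently in progress.
            extended = {p + 1 for p in acc
                        if p + 1 < len(history) and history[p + 1] == ch}
            if extended:
                acc, n = extended, n + 1
                history.append(ch)
                continue
            # Mismatch: record the finished string, reset the accumulation
            # state, and spend extra cycles re-evaluating ch as a start.
            if n - 1 >= 2:
                matches.append((min(acc), n - 1))
            acc, n = set(), 1
            cycles += penalty
        # Evaluate ch as the first character of a new string.
        acc = {p for p, old in enumerate(history) if old == ch}
        n = 2 if acc else 1
        history.append(ch)

    if n - 1 >= 2:
        matches.append((min(acc), n - 1))
    return cycles, matches
```

Under these assumptions, the seven-character input "ababxab" consumes eight cycles because the character "x" breaks a string in progress, whereas the nine-character input "abcabcabc" contains no such mismatch and consumes nine cycles.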
In preferred embodiments of the present invention, each character from the received string of input characters is compared with all of the characters which have so far been received in the input stream. In addition to accumulating the result of processing the current string, a comparison is also carried out as if the current character were the first character of a potential string that had not yet been identified in a match, that is, as a non-accumulated result. Thus, at each character input, a bifurcated set of data is stored. At those times that a mismatch is found, the additionally stored (non-accumulated) results are employed instead to evaluate the next character in the received input stream to determine whether or not this is the first character of a new (as yet unidentified) input string. By doing this, it is possible to process one character per clock cycle without any dependency of the data rate on the particular data pattern. The roles of the bifurcated stored results are reversed in this way whenever a mismatch occurs.
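The bifurcated evaluation described above may be sketched in software as follows. This Python model is illustrative only and is not the circuit of the preferred embodiments. The function name `dual_result_cycles` and the set-based representation of the stored results are hypothetical. Each loop iteration stands for one assumed clock cycle, in which both the accumulated result (`extended`) and the non-accumulated result (`fresh`) are computed.

```python
def dual_result_cycles(data: str):
    """Toy model that keeps two results per character.

    Returns (cycles, matches). Each match is (end_position, length) for a
    matched string of at least two characters, where end_position is the
    earliest history index at which a matching string ends.
    Assumption: one loop iteration corresponds to one clock cycle.
    """
    history = []   # characters received so far
    acc = set()    # history positions where the current string ends
    n = 1          # the next character is the nth of a potential string
    cycles = 0
    matches = []

    for ch in data:
        cycles += 1  # exactly one cycle per character, match or mismatch
        # Non-accumulated result: ch as a possible first character.
        fresh = {p for p, old in enumerate(history) if old == ch}
        # Accumulated result: ch as the nth character of the current string.
        extended = {p + 1 for p in acc if p + 1 in fresh}

        if n > 1 and extended:
            acc, n = extended, n + 1
        else:
            # Mismatch (or no string in progress): record any finished
            # string, then let the fresh result take over in this cycle.
            if n - 1 >= 2:
                matches.append((min(acc), n - 1))
            acc = fresh
            n = 2 if fresh else 1
        history.append(ch)

    if n - 1 >= 2:
        matches.append((min(acc), n - 1))
    return cycles, matches
```

In this model the cycle count always equals the number of input characters. For the input "ababxab", for example, seven cycles are consumed and the same two matched strings of length two are reported, whereas a design that resets and re-evaluates on the mismatch at "x" would spend at least one further cycle.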
The present invention embodies several different aspects. In one aspect, it embodies a complete data compression engine. In another aspect, it embodies a preprocessing circuit for supplying data to a data compression engine. In another aspect, the present invention is directed to a method for processing input data sequences to relieve the data compression operation of dependence on specific data. And yet another aspect of the present invention is directed to a program product in which the described method is embodied in software.
Accordingly, it is an object of the present invention to provide an improved system and method for data compression.
It is a further object of the present invention to provide a method for data compression in which the speed of the compression is not as dependent upon data content as compared with prior approaches.
It is a still further object of the present invention to provide a circuit which performs preprocessing of data to efficiently generate string match, position, and length indications to data formatting mechanisms for producing compressed data output.
It is yet another object of the present invention to eliminate problems that result when currently employed string data byte compare operations indicate a non-match.
It is a still further object of the present invention to provide a data compression system, method, and apparatus which are consistent with conventionally employed standards for data compression.
It is also an object of the present invention to provide an improved complete data compression engine.
Lastly, but not limited hereto, it is an object of the present invention to provide a data compression system and method for use with character strings of any fixed length.
The recitation herein of a list of desirable objects which are met by various embodiments of the present invention is not meant to imply or suggest that any or all of these objects are present as essential features, either individually or collectively, in the most general embodiment of the present invention or in any of its more specific embodiments.