Compressing digital input data into output data at a high compression ratio and a high compression rate is increasingly valuable for operating computers efficiently, for transmitting data over communication channels, and for storing data.
This invention relates to compressing a stream of input data into a compressed stream of output data. The system is suited for use with, for instance, computers, modems, data storage techniques, and data transmission and display.
Lossless data compression transforms a body of data into a typically smaller representation from which the original can be reconstructed at a later time. Thus, data that is compressed and subsequently decompressed must always be identical to the original.
Ideally, data compression should achieve a high compression ratio and/or a high compression rate. Achieving these objectives usually requires using a minimum amount of storage and a minimum number of steps to effect compression.
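The two figures of merit and the lossless requirement can be illustrated with a minimal sketch. The definitions used here are assumptions, since the text does not fix them: the compression ratio is taken as original size divided by compressed size, and the compression rate as input bytes processed per second. The standard-library `zlib` codec serves purely as a stand-in compressor and is not the method of this invention; the function name `measure` is hypothetical.

```python
import time
import zlib


def measure(data: bytes) -> dict:
    """Compress data with a stand-in codec and report ratio, rate, and losslessness."""
    start = time.perf_counter()
    compressed = zlib.compress(data)
    elapsed = time.perf_counter() - start

    # Lossless requirement: decompression must reproduce the input exactly.
    restored = zlib.decompress(compressed)

    return {
        "lossless": restored == data,
        # Assumed definition: original size / compressed size.
        "ratio": len(data) / len(compressed),
        # Assumed definition: input bytes consumed per second of compression time.
        "rate_bytes_per_s": len(data) / elapsed if elapsed > 0 else float("inf"),
    }


sample = b"abracadabra " * 1000
result = measure(sample)
print(result)
```

The measured rate depends on the machine and on timer resolution, so it is only meaningful when comparing methods under the same conditions.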
The input data for compression is represented as a sequence of characters drawn from some alphabet. An alphabet is a finite set containing at least one element. The elements of an alphabet are characters. A string over an alphabet is a sequence of characters, each of which is an element of that alphabet.
A common approach to compressing a string of characters is textual substitution. A textual substitution data compression method compresses text by identifying repeated substrings and replacing some of them with references to other copies. Such a reference is commonly known as a pointer, and the string to which the pointer refers is called a target. In general, therefore, the input to a data compression method employing textual substitution is a sequence of characters over some alphabet, and the output is a sequence of characters from the alphabet interspersed with pointers.
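Textual substitution as described above can be sketched as follows. This is an illustrative example only, not the method of this invention: it assumes a pointer is a (distance, length) pair referring back to an earlier target, uses a brute-force search for the longest target, and emits a pointer only for repeats of at least `MIN_MATCH` bytes. The names `compress`, `decompress`, and `MIN_MATCH` are hypothetical.

```python
from typing import List, Tuple, Union

# A token is either a literal byte value or a (distance, length) pointer.
Token = Union[int, Tuple[int, int]]

MIN_MATCH = 3  # assumed threshold: shorter repeats are emitted as literals


def compress(data: bytes) -> List[Token]:
    out: List[Token] = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Brute-force search of all earlier positions for the longest target.
        for j in range(i):
            k = 0
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= MIN_MATCH:
            out.append((best_dist, best_len))  # pointer to the target
            i += best_len
        else:
            out.append(data[i])  # literal character
            i += 1
    return out


def decompress(tokens: List[Token]) -> bytes:
    out = bytearray()
    for token in tokens:
        if isinstance(token, tuple):
            dist, length = token
            # Copy byte by byte so a target may overlap the text being produced.
            for _ in range(length):
                out.append(out[-dist])
        else:
            out.append(token)
    return bytes(out)


tokens = compress(b"abcabcabcd")
print(tokens)  # [97, 98, 99, (3, 6), 100]
```

The output interleaves literal characters with one pointer, matching the description of characters interspersed with pointers. The quadratic search is chosen for clarity; a practical method would locate targets far more efficiently.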
Various data compression systems are known that use special-purpose compression methods designed for particular classes of data. The major drawback of such systems is that they work well only with the class of data for which they were designed and are inefficient when used with other types of data.
The input data is processed by viewing it in subblocks of a minimum size. Thus, the input data can be considered in subblocks of three, four, or another suitable number of bytes at any one time. The choice of subblock size affects both the rate and the ratio of data compression.
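One plausible reading of subblock processing is sketched below. It assumes, without support from the text, that subblocks are taken at every byte offset (a sliding window) and recorded in a table so that repeated subblocks can be found. The names `index_subblocks` and `repeated_subblocks` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def index_subblocks(data: bytes, size: int = 3) -> Dict[bytes, List[int]]:
    """Map each subblock of `size` bytes to the offsets at which it occurs."""
    positions: Dict[bytes, List[int]] = defaultdict(list)
    for i in range(len(data) - size + 1):
        positions[data[i:i + size]].append(i)
    return positions


def repeated_subblocks(data: bytes, size: int = 3) -> Dict[bytes, List[int]]:
    """Keep only subblocks that occur more than once, i.e. candidate targets."""
    return {blk: pos for blk, pos in index_subblocks(data, size).items() if len(pos) > 1}


print(repeated_subblocks(b"abcabcabcd", size=3))
print(repeated_subblocks(b"abcabcabcd", size=4))
```

With a size of three, the sample yields three repeated subblocks, one of which occurs three times; with a size of four, each repeated subblock occurs only twice. This illustrates the trade-off in choosing the subblock size: a smaller size finds more candidate repeats but requires more table entries and comparisons.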
One known compression technique is the Lempel-Ziv method. One version of this method maps variable-length segments of symbols into binary words of various lengths. A problem with this method is that the required memory space grows at a non-linear rate with respect to the input data.
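For illustration, a dictionary-based Lempel-Ziv parse in the style commonly called LZ78 can be sketched as follows. Whether this is the specific variant meant above is an assumption. Each output pair holds the index of a previously seen segment and the next character, and every new segment is added to a dictionary that must be kept in memory. The names `lz78_parse` and `lz78_decode` are hypothetical.

```python
from typing import Dict, List, Tuple


def lz78_parse(data: str) -> Tuple[List[Tuple[int, str]], Dict[str, int]]:
    dictionary: Dict[str, int] = {"": 0}  # index 0 is the empty segment
    output: List[Tuple[int, str]] = []
    phrase = ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending the current segment
        else:
            output.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)  # store the new segment
            phrase = ""
    if phrase:
        # Flush a trailing segment that is already in the dictionary.
        output.append((dictionary[phrase], ""))
    return output, dictionary


def lz78_decode(pairs: List[Tuple[int, str]]) -> str:
    phrases = [""]
    out = []
    for index, ch in pairs:
        segment = phrases[index] + ch
        phrases.append(segment)
        out.append(segment)
    return "".join(out)


pairs, dictionary = lz78_parse("abababab")
print(pairs)                # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a'), (2, '')]
print(len(dictionary) - 1)  # 4 stored segments
```

In this sketch the dictionary gains one entry for every pair emitted with a new character, and the entries themselves become longer as parsing proceeds, so memory use keeps increasing with the input unless the dictionary is bounded or reset.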