1. Field of the Invention
The present invention relates to an apparatus and method for processing data signals wherein the data signals are compressed and subsequently reconstituted. Data compression is the process of transforming a body of data into a typically smaller representation from which the original can be computed at a later time. The field of the present invention further relates to lossless data compression, wherein data that is compressed and subsequently decompressed must always be identical to the original. The field of the present invention further relates to compression of digital data, that is, data represented as a sequence of characters drawn from some alphabet.
2. Description of the Prior Art
Several methods and apparatus for performing said methods are known in the prior art to compress data signals and subsequently reconstitute them. An alphabet is a finite set containing at least one element. The elements of an alphabet are called characters. A string over an alphabet is a sequence of characters, each of which is an element of that alphabet. A common approach to compressing a string of characters is textual substitution. A textual substitution data compression method is any compression method that compresses text by identifying repeated substrings and replacing some substrings by references to other copies. Such a reference is commonly known as a pointer and the string to which the pointer refers is called a target. Therefore, in general, the input to a data compression algorithm employing textual substitution is a sequence of characters over some alphabet and the output is a sequence of characters from the alphabet interspersed with pointers. The following patents are representative known prior art compression methods and apparatus:
1. U.S. Pat. No. 4,464,650 issued to Eastman et al. on Aug. 7, 1984 for "Apparatus And Method For Compressing Data Signals And Restoring The Compressed Data Signals".
2. U.S. Pat. No. 4,558,302 issued to Welch on Dec. 10, 1985 for "High Speed Data Compression And Decompression Apparatus And Method".
3. U.S. Pat. No. 4,586,027 issued to Tsukiyama et al. on Apr. 29, 1986 for "Method And System For Data Compression And Restoration".
4. U.S. Pat. No. 4,560,976 issued to Finn on Dec. 24, 1985 for "Data Compression".
5. U.S. Pat. No. 3,914,586 issued to McIntosh on Oct. 21, 1975 for "Data Compression Method And Apparatus".
6. U.S. Pat. No. 4,682,150 issued to Mathes et al. on July 21, 1987 for "Data Compression Method And Apparatus".
7. U.S. Pat. No. 4,872,009 issued to Tsukiyama et al. on Oct. 3, 1989 for "Method And Apparatus For Data Compression And Restoration".
8. U.S. Pat. No. 4,758,899 issued to Tsukiyama on July 19, 1988 for "Data Compression Control Device".
9. U.S. Pat. No. 4,809,350 issued to Shimoni et al. on Feb. 28, 1989 for "Data Compression System".
10. U.S. Pat. No. 4,087,788 issued to Johannesson on May 2, 1978 for "Data Compression System".
11. U.S. Pat. No. 4,677,649 issued to Kunishi et al. on June 30, 1987 for "Data Receiving Apparatus".
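The textual substitution approach defined before the patent list, in which repeated substrings are replaced by pointers to targets, can be sketched as follows. This is a minimal illustration only and is not taken from any of the listed patents. The function names, the (position, length) pointer format, and the `min_len` threshold are assumptions made for the example, and the exhaustive search is chosen for clarity rather than speed.

```python
def compress(text, min_len=3):
    """Greedy textual substitution (illustrative sketch).

    Output is a list whose items are either a single character or a
    pointer (position, length) referring to an earlier copy, the target.
    """
    out = []
    i = 0
    while i < len(text):
        best_len, best_pos = 0, -1
        # Search the text already processed for the longest earlier copy.
        for j in range(i):
            k = 0
            # The target must lie entirely before position i.
            while i + k < len(text) and j + k < i and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_len, best_pos = k, j
        if best_len >= min_len:
            out.append((best_pos, best_len))  # pointer to the target
            i += best_len
        else:
            out.append(text[i])  # plain character from the alphabet
            i += 1
    return out


def decompress(tokens):
    """Rebuild the original string by copying each pointer's target."""
    chars = []
    for token in tokens:
        if isinstance(token, tuple):
            pos, length = token
            chars.extend(chars[pos:pos + length])
        else:
            chars.append(token)
    return "".join(chars)
```

For the input "abcabcabc", this sketch emits the characters a, b, c followed by two pointers to the first copy of "abc".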
In general, as illustrated by the above patents, data compression systems are known in the prior art that encode a stream of digital data signals into compressed digital code signals and decode the compressed digital code signals back into the original data. Various data compression systems are known in the art which utilize special purpose compression methods designed for compressing special classes of data. The major drawback to such systems is that they only work well with the special class of data for which they were designed and are very inefficient when used with other types of data. The following compression systems are considered general purpose.
The best known and most widely used general purpose data compression procedure is the Huffman method. The Huffman procedure maps fixed length segments of symbols into variable length words. The procedure calculates the probability of occurrence of each symbol and builds a tree whose leaves are the symbols; the lowest probability nodes are repeatedly combined into new nodes, which are in turn placed on the tree. The Huffman data compression procedures have many limitations. Huffman encoding requires prior knowledge of the statistical characteristics of the source data. Gathering these statistics is cumbersome and requires considerable working memory space. In addition, the Huffman procedure requires intensive calculations for variable bit compression. Also, the Huffman procedure either requires a dictionary in the output stream for reconstruction of the digital signal or requires prior knowledge of the dictionary, which limits its applicability to specific types of data.
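The tree construction described above can be sketched as follows. This is a generic textbook form of Huffman coding, offered as an illustration under stated assumptions rather than as the procedure of any cited patent. The function names are hypothetical, symbol frequencies are taken from the input itself, and bits are represented as a string of "0" and "1" characters for readability.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Return a dict mapping each symbol in text to its bit string."""
    freq = Counter(text)  # the statistics must be known before coding starts
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tiebreak, {symbol: code built so far}).
    heap = [(w, n, {sym: ""}) for n, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        # Combine the two lowest-weight nodes into a new node.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def encode(text, code):
    return "".join(code[ch] for ch in text)


def decode(bits, code):
    """Decoding needs the same code table, i.e. the dictionary."""
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)
```

Note that `decode` cannot run without the code table, which mirrors the limitation stated above: the table must either travel with the output or be known in advance.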
A second well known compression technique is the Tunstall algorithm, which maps variable length segments of symbols into fixed length binary words. The Tunstall algorithm shares many of the disadvantages of the Huffman method and has the further constraint that the output string consists of fixed length binary words.
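The variable-to-fixed mapping can be illustrated with the following sketch. It assumes a memoryless source whose symbol probabilities are supplied by the caller, which reflects the same need for prior statistics noted for the Huffman method. The function names and the bit-string output are assumptions made for the example.

```python
import heapq


def tunstall_dictionary(probs, word_bits):
    """Map variable-length source words to fixed-length codewords.

    probs: dict of symbol -> probability (assumed known in advance).
    word_bits: length in bits of every output codeword.
    """
    max_words = 2 ** word_bits
    if len(probs) > max_words:
        raise ValueError("word_bits too small for the alphabet")
    # Max-heap on probability (stored negated); start with single symbols.
    heap = [(-p, s) for s, p in probs.items()]
    heapq.heapify(heap)
    # Expanding a word removes one entry and adds len(probs) entries.
    while len(heap) + len(probs) - 1 <= max_words:
        neg_p, word = heapq.heappop(heap)  # most probable word so far
        for s, p in probs.items():
            heapq.heappush(heap, (neg_p * p, word + s))
    words = sorted(w for _, w in heap)
    return {w: format(i, f"0{word_bits}b") for i, w in enumerate(words)}


def tunstall_encode(text, book):
    """Parse text into dictionary words and emit fixed-length codewords."""
    out, current = [], ""
    for ch in text:
        current += ch
        if current in book:
            out.append(book[current])
            current = ""
    if current:
        # A practical coder would need padding or an escape mechanism here.
        raise ValueError("input ends in the middle of a dictionary word")
    return "".join(out)
```

With probabilities 0.7 for "a" and 0.3 for "b" and 2-bit codewords, the sketch builds the words "aaa", "aab", "ab" and "b", so every output word is exactly two bits regardless of how many input symbols it covers.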
The third well known compression technique is the Lempel-Ziv method. One such method maps variable-length segments of symbols into variable-length binary words. A problem with this method is that the required memory space grows at a non-linear rate with respect to the input data. An improved variation of the Lempel-Ziv method is disclosed and claimed in Eastman U.S. Pat. No. 4,464,650. This improved method nevertheless has several major disadvantages. First, the method requires the creation of a search tree database and therefore requires storage room for the dictionary. Second, the amount of achievable compression is heavily dependent on the dictionary. Third, management and searching of the dictionary are time consuming, yielding a low data rate-compression factor product. Fourth, the growth characteristics of the dictionary require N-1 occurrences in the input data of a string of length N before that string is established in the dictionary. This results in reduced compression efficiency. Fifth, in the worst case, the growth of the output data block is tied directly to the size of the dictionary. Making the dictionary larger can improve overall compression for compressible data, but yields larger percentage growth for incompressible data because more bits are required to represent fixed length dictionary pointers. Finally, the dictionary must be reconstructed during expansion, resulting in a slower reconstitution rate and more required memory space.
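The dictionary behavior criticized above can be seen in a small sketch. The code below is a generic dictionary-building parse in the LZ78 style, used here as a stand-in; it is not the method claimed in U.S. Pat. No. 4,464,650, and the function names and the (index, character) output format are assumptions made for the example.

```python
def lz78_compress(text):
    """Emit (dictionary index, next character) pairs."""
    dictionary = {"": 0}  # index 0 is the empty phrase
    out = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending a known phrase
        else:
            out.append((dictionary[phrase], ch))
            # Each new entry extends an existing entry by one character.
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:
        # Flush a trailing phrase that is already in the dictionary.
        out.append((dictionary[phrase[:-1]], phrase[-1]))
    return out


def lz78_decompress(pairs):
    """The decoder must rebuild the same dictionary as it goes."""
    phrases = [""]
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)
```

On an input of ten repeated "a" characters, the parse produces the phrases "a", "aa", "aaa" and "aaaa" in turn. Each dictionary entry grows by only one character per occurrence, which is the slow growth characteristic described above, and the decoder has to reconstruct the whole dictionary during expansion.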
U.S. Pat. No. 4,558,302 issued to Welch is very similar to the Lempel-Ziv method described in U.S. Pat. No. 4,464,650 and shares all of the basic problems of that method. The basic difference is that, instead of storing the dictionary in a tree node type structure, Welch explicitly compresses an input stream of data character signals by storing in a string table the strings of data character signals encountered in the input stream. This has the additional disadvantage of requiring more storage than the Lempel-Ziv method. While it does provide the advantage of being faster if the number of strings that must be searched is small, it still has the poor dictionary growth characteristics of Lempel-Ziv.
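A string table approach of this kind can be sketched as follows. This is the commonly published form of the LZW algorithm, shown as an illustration under stated assumptions rather than a reproduction of the patent's apparatus. The function names are hypothetical, input characters are assumed to have code points below 256, and the table is allowed to grow without limit.

```python
def lzw_compress(text):
    """Emit integer codes; the string table starts with single characters."""
    table = {chr(i): i for i in range(256)}  # assumes code points < 256
    out = []
    current = ""
    for ch in text:
        if current + ch in table:
            current += ch
        else:
            out.append(table[current])
            table[current + ch] = len(table)  # store the new string
            current = ch
    if current:
        out.append(table[current])
    return out


def lzw_decompress(codes):
    """Rebuild the string table while decoding."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The code refers to the string being defined at this step.
            entry = previous + previous[0]
        else:
            raise ValueError("invalid code")
        out.append(entry)
        table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(out)
```

For the input "ABABABA" the sketch emits the codes 65, 66, 256 and 258. As with the preceding method, each table entry is one character longer than an existing entry, so the table still grows slowly.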
The remaining patents which discuss compression algorithms all include in the process the requirement of creating a dictionary, whether in the form of a tree, a series of strings, or a similar arrangement. Such a dictionary requires substantial memory and storage, and searching it is time consuming, yielding a low data rate-compression factor product. Therefore, there is a significant need for an improved apparatus and method for compressing data which eliminates all of the problems discussed above and provides a faster and more efficient method of compressing the data while at the same time retaining most of the advantages of prior systems.