1. Field of the Invention
The invention relates to data compression systems based on the LZ data compression methodology and more particularly on the LZW protocols.
2. Description of the Prior Art
Professors Abraham Lempel and Jacob Ziv provided the theoretical basis for LZ data compression and decompression systems that are in present day widespread usage. Two of their seminal papers appear in the IEEE Transactions on Information Theory, IT-23-3, May 1977, pp. 337-343 and in the IEEE Transactions on Information Theory, IT-24-5, September 1978, pp. 530-536. A ubiquitously used data compression and decompression system known as LZW is described in U.S. Pat. No. 4,558,302 by Welch, issued Dec. 10, 1985. LZW has been adopted as the compression and decompression standard used in the GIF image communication protocol and is utilized in the TIFF image communication protocol. GIF is a development of CompuServe Incorporated and the name GIF is a Service Mark thereof. A reference to the GIF specification is found in GRAPHICS INTERCHANGE FORMAT, Version 89a, 31 Jul., 1990. TIFF is a development of Aldus Corporation and the name TIFF is a Trademark thereof. Reference to the TIFF specification is found in TIFF, Revision 6.0, Final - June 3, 1992.
LZW has also been adopted as the standard for V.42 bis modem compression and decompression. A reference to the V.42 bis standard is found in CCITT Recommendation V.42 bis, Data Compression Procedures For Data Circuit Terminating Equipment (DCE) Using Error Correction Procedures, Geneva 1990. The V.42 bis standard is further described in an article entitled "V.42 bis: The New Modem Compression Standard" by J. E. MacCrisken in the Spring 1991 issue of the Journal Of Data and Computer Communications - Modem Compression, pages 23-29.
Examples of LZ dictionary based compression and decompression systems are described in the following U.S. patents: U.S. Pat. No. 4,464,650 by Eastman et al., issued Aug. 7, 1984; U.S. Pat. No. 4,814,746 by Miller et al., issued Mar. 21, 1989; U.S. Pat. No. 4,876,541 by Storer, issued Oct. 24, 1989; U.S. Pat. No. 5,153,591 by Clark, issued Oct. 6, 1992; U.S. Pat. No. 5,373,290 by Lempel et al., issued Dec. 13, 1994; U.S. Pat. No. 5,838,264 by Cooper, issued Nov. 17, 1998; U.S. Pat. No. 5,861,827 by Welch et al., issued Jan. 19, 1999; U.S. Pat. No. 6,188,333 by Cooper, issued Feb. 13, 2001; and U.S. Pat. No. 6,320,523 by York et al., issued Nov. 20, 2001.
In the above dictionary based LZ compression and decompression systems, the compressor and decompressor dictionaries may be initialized with all of the single character strings of the character alphabet. In some implementations, the single character strings are considered as recognized and matched although not explicitly stored. In such systems the value of the single character may be utilized as its code and the first available code utilized for multiple character strings would have a value greater than the single character values. In this way the decompressor can distinguish between a single character string and a multiple character string and recover the characters thereof. For example, in the ASCII environment the alphabet has an 8 bit character size supporting an alphabet of 256 characters. Thus, the characters have values of 0-255. The first available multiple character string code can, for example, be 258 where the codes 256 and 257 are utilized as control codes as is well known.
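The code-space layout in the ASCII example above can be illustrated with a short Python sketch. The constant and function names are hypothetical and chosen only for illustration; the values 0-255, 256, 257 and 258 are taken from the example.

```python
ALPHABET_SIZE = 256          # 8-bit characters, values 0-255
CONTROL_CODES = (256, 257)   # reserved as control codes, per the example above
FIRST_STRING_CODE = 258      # first code available for a multiple character string


def classify_code(code: int) -> str:
    """Return which region of the code space a code falls in."""
    if code < 0:
        raise ValueError("codes are non-negative")
    if code < ALPHABET_SIZE:
        # The value of a single character serves as its own code.
        return "single-character"
    if code in CONTROL_CODES:
        return "control"
    # Any code at or above FIRST_STRING_CODE denotes a multiple character string.
    return "multi-character"
```

A decompressor following this layout can recover a single character directly from a code below 256, without consulting any stored string.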
In the prior art dictionary based LZ compression systems, data character strings are stored and accessed in the compressor dictionary utilizing well known searchtree architectures and protocols. Typically, the searchtree is arranged in nodes where each node represents a character, and a string of characters is represented by a node-to-node path through the tree. When the input character stream has been matched in the dictionary tree up to a matched node, a next input character is fetched to determine if the string match will continue. Conventionally, a determination is made to ascertain if the fetched character is already stored as an extension node of the matched node. Various techniques are utilized to effect this determination such as associative memory dictionaries, hashing and sibling lists as are well understood in the art.
In the above dictionary based systems, numerous iterative operations and dictionary accesses are required at the compressor for compressing an input stream of data characters. Normally an iteration including several dictionary accesses is required for each input data character and, when utilizing an associative memory, it may be necessary to search the entire memory to determine if a string exists therein. It is desirable in such systems to minimize the number of iterative processes and dictionary accesses so as to enhance system performance.
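The per-character iteration and dictionary access pattern described above can be sketched in Python as a conventional LZW compressor. This is a minimal illustration, not any particular prior art implementation: a Python dict keyed by (prefix code, character) stands in for the searchtree, hashing or sibling-list structures, the function name `lzw_compress` and the access counter are hypothetical, and code-width and dictionary-full handling are omitted.

```python
def lzw_compress(data: bytes, first_code: int = 258):
    """Conventional dictionary-based LZW sketch.

    Returns (codes, dictionary_accesses). Single characters use their own
    value as their code; multiple character strings start at first_code.
    """
    dictionary = {}            # (prefix_code, character) -> string code
    next_code = first_code
    codes = []
    accesses = 0
    if not data:
        return codes, accesses
    current = data[0]          # code of the string matched so far
    for ch in data[1:]:        # one iteration per fetched input character
        accesses += 1          # lookup: is ch an extension of the matched node?
        key = (current, ch)
        if key in dictionary:
            current = dictionary[key]      # the string match continues
        else:
            codes.append(current)          # output code of the longest match
            dictionary[key] = next_code    # store the extended string
            accesses += 1                  # the store is a further access
            next_code += 1
            current = ch                   # restart matching at the mismatch
    codes.append(current)
    return codes, accesses
```

For the six-character input `b"ababab"` this sketch emits four codes and performs eight dictionary accesses, which illustrates why reducing iterations and accesses is of interest.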
Although the known dictionary architectures and protocols provide efficient data compression systems, it is a continuing objective in the art to improve compressor performance.
The data compressors of said Ser. No. 10/195,795; Ser. No. 10/271,196 and Ser. No. 10/351,210 provide an improvement over the prior art by replacing the known dictionary architecture by matrices of coincidence elements thereby eliminating dictionary accesses. Although, in said Ser. No. 10/195,795 and Ser. No. 10/351,210, dictionary accesses are eliminated, compressor iterations are utilized for processing sequentially fetched input characters. In said Ser. No. 10/271,196, although compressor iterations for processing sequentially fetched input characters are eliminated, the embodiments therein utilize a significant number of coincidence elements as do the embodiments of said Ser. No. 10/195,795.
The present invention replaces the conventional dictionary arrangements with digital logic elements and switches to provide a new architecture and protocols which, it is believed, will improve the performance of LZ type data compression systems. The embodiments of the present invention eliminate both dictionary accesses and compressor iterations for processing sequentially fetched input characters while utilizing significantly fewer coincidence elements than the embodiments of said Ser. No. 10/195,795 and Ser. No. 10/271,196. The embodiments of the present invention utilize a number of coincidence elements similar to that of the embodiments of said Ser. No. 10/351,210.
The present invention is embodied in a data compressor for compressing an input stream of data characters into an output stream of compressed codes. The compressor includes a plurality of coincidence elements corresponding to a respective plurality of codes to be assigned to strings. A string is comprised of a prefix string of at least one of the data characters followed by an extension character, a prefix string having a prefix code associated therewith. A coincidence element provides a coincidence output and has a prefix code input and a character input for enabling the coincidence element to energize the coincidence output thereof upon coincidental energization of the inputs so that energization of a coincidence output of a coincidence element provides a representation of the code corresponding thereto. The compressor further includes a first coupling arrangement for selectively coupling the provided representations of codes corresponding to the coincidence elements to the prefix code inputs of the coincidence elements and a second coupling arrangement for selectively coupling representations of data characters fetched from the input stream to the character inputs of the coincidence elements. A plurality of data characters fetched from the input stream is applied to the second coupling arrangement so as to enable a coincidence element corresponding to a code assigned to a string that is the longest match to the fetched plurality of data characters. The code of the longest matching string is output, thereby providing the stream of compressed codes.
In the preferred embodiments, an extended string comprising the prefix string having the code corresponding to the longest matching string and the extension character corresponding to the data character following the longest matching string is inserted into the compressor and assigned the next available code. The extended string is stored and the code assigned by coupling the representation of the code assigned to the longest matching string and the representation of the fetched data character following the longest matching string to the prefix code input and the character input, respectively, of the coincidence element corresponding to the next code to be assigned to a string.
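The behavior of the coincidence elements and the string insertion described above can be approximated by a software model. The following Python sketch is an illustration under stated assumptions, not the disclosed hardware: the class and method names, the element count, the window size and the first string code of 258 are all hypothetical, and a loop stands in for logic that the embodiments would evaluate without sequential compressor iterations.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

FIRST_STRING_CODE = 258  # assumed, following the ASCII example in the background


@dataclass
class CoincidenceElement:
    code: int                           # code this element corresponds to
    prefix_code: Optional[int] = None   # coupled prefix code input (None = unassigned)
    character: Optional[int] = None     # coupled character input (None = unassigned)

    def fires(self, prefix_code: int, character: int) -> bool:
        """Output is energized only when both inputs are energized together."""
        return self.prefix_code == prefix_code and self.character == character


class CoincidenceCompressor:
    def __init__(self, num_elements: int = 64):
        self.elements = [CoincidenceElement(FIRST_STRING_CODE + i)
                         for i in range(num_elements)]
        self.next_index = 0  # element for the next code to be assigned

    def longest_match(self, window: bytes) -> Tuple[int, int]:
        """Return (code, length) of the longest stored string at the window start."""
        code, length = window[0], 1
        for ch in window[1:]:
            fired = [e for e in self.elements[:self.next_index]
                     if e.fires(code, ch)]
            if not fired:
                break
            code, length = fired[0].code, length + 1
        return code, length

    def insert(self, prefix_code: int, character: int) -> None:
        """Couple the prefix code and extension character to the next element."""
        if self.next_index < len(self.elements):
            element = self.elements[self.next_index]
            element.prefix_code, element.character = prefix_code, character
            self.next_index += 1

    def compress(self, data: bytes, window_size: int = 8) -> List[int]:
        codes, pos = [], 0
        while pos < len(data):
            # Fetch a plurality of characters and find the longest match.
            code, length = self.longest_match(data[pos:pos + window_size])
            codes.append(code)
            pos += length
            if pos < len(data):
                # Extended string = longest match + following character.
                self.insert(code, data[pos])
        return codes
```

In this model the window size bounds the length of a match, and no attempt is made to avoid redundant entries when a match fills the window or to handle exhaustion of the available elements. For short inputs such as `b"ababab"` the model produces the same code sequence as a conventional LZW compressor.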