1. Field of the Invention
The invention relates to LZ data compression systems, particularly with respect to the LZW compression methodology. More particularly, the invention relates to the architecture and protocols for storing and accessing data character strings in the compressor.
2. Description of the Prior Art
Professors Abraham Lempel and Jacob Ziv provided the theoretical basis for LZ data compression and decompression systems that are in present day widespread usage. Two of their seminal papers appear in the IEEE Transactions on Information Theory, IT-23-3, May 1977, pp. 337-343 and in the IEEE Transactions on Information Theory, IT-24-5, September 1978, pp. 530-536. A ubiquitously used data compression and decompression system known as LZW, adopted as the standard for V.42 bis modem compression and decompression, is described in U.S. Pat. No. 4,558,302 by Welch, issued Dec. 10, 1985. LZW has been adopted as the compression and decompression standard used in the GIF image communication protocol and is utilized in the TIFF image communication protocol. GIF is a development of CompuServe Incorporated and the name GIF is a Service Mark thereof. A reference to the GIF specification is found in GRAPHICS INTERCHANGE FORMAT, Version 89a, Jul. 31, 1990. TIFF is a development of Aldus Corporation and the name TIFF is a Trademark thereof. Reference to the TIFF specification is found in TIFF, Revision 6.0, Final, Jun. 3, 1992.
Further examples of LZ dictionary based compression and decompression systems are described in the following U.S. patent Nos.: U.S. Pat. No. 4,464,650 by Eastman et al., issued Aug. 7, 1984; U.S. Pat. No. 4,814,746 by Miller et al., issued Mar. 21, 1989; U.S. Pat. No. 4,876,541 by Storer, issued Oct. 24, 1989; U.S. Pat. No. 5,153,591 by Clark, issued Oct. 6, 1992; U.S. Pat. No. 5,373,290 by Lempel et al., issued Dec. 13, 1994; U.S. Pat. No. 5,838,264 by Cooper, issued Nov. 17, 1998; and U.S. Pat. No. 5,861,827 by Welch et al., issued Jan. 19, 1999.
In the above dictionary based LZ compression and decompression systems, the compressor and decompressor dictionaries may be initialized with all of the single character strings of the character alphabet. In some implementations, the single character strings are considered as recognized and matched although not explicitly stored. In such systems the value of the single character may be utilized as its code and the first available code utilized for multiple character strings would have a value greater than the single character values. In this way the decompressor can distinguish between a single character string and a multiple character string and recover the characters thereof. For example, in the ASCII environment, the alphabet has an 8 bit character size supporting an alphabet of 256 characters. Thus, the characters have values of 0-255. The first available multiple character string code can, for example, be 258 where the codes 256 and 257 are utilized as control codes as is well known.
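By way of illustration only, the code assignment described above may be sketched as follows. The sketch assumes the 8 bit alphabet and the example values given above (control codes 256 and 257, first multiple character string code 258); the names ALPHABET_SIZE, CONTROL_CODES, FIRST_STRING_CODE and classify are hypothetical and are not taken from any cited reference.

```python
# Illustrative layout of the code space for an 8 bit alphabet.
ALPHABET_SIZE = 256          # single characters have values 0-255
CONTROL_CODES = (256, 257)   # reserved control codes (roles not specified here)
FIRST_STRING_CODE = 258      # first code available for multiple character strings


def classify(code):
    """Tell a decompressor what kind of code it has received."""
    if code < 0:
        raise ValueError("codes are non-negative")
    if code < ALPHABET_SIZE:
        return "single"      # the code is the character value itself
    if code in CONTROL_CODES:
        return "control"
    return "multi"           # a stored multiple character string
```

Because a single character is its own code, a comparison against the alphabet size is sufficient to distinguish the two kinds of strings.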
In the prior art dictionary based LZ compression systems, data character strings are stored and accessed in the compressor dictionary utilizing well known searchtree architectures and protocols. Typically, the searchtree is arranged in nodes where each node represents a character, and a string of characters is represented by a node-to-node path through the tree. When the input character stream has been matched in the dictionary tree up to a matched node, a next input character is fetched to determine if the string match will continue. Conventionally, a determination is made to ascertain if the fetched character is already stored as an extension node of the matched node. Various techniques are utilized to effect this determination such as hashing and sibling lists as are well understood in the art.
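For comparison purposes, one conventional sibling list arrangement of the kind referred to above may be sketched as follows. This is a minimal illustration under assumed names (Node, find_extension, add_extension) and is not the structure of any particular cited patent.

```python
class Node:
    """One searchtree node: a character plus the code of the string ending here."""

    def __init__(self, char, code):
        self.char = char
        self.code = code
        self.first_child = None    # head of this node's list of extension nodes
        self.next_sibling = None   # next extension of the same parent


def find_extension(matched, ch):
    """Walk the sibling list of the matched node looking for character ch."""
    child = matched.first_child
    while child is not None:
        if child.char == ch:
            return child           # the string match continues
        child = child.next_sibling
    return None                    # ch is not yet stored as an extension


def add_extension(matched, ch, code):
    """Store a new extension node at the head of the sibling list."""
    node = Node(ch, code)
    node.next_sibling = matched.first_child
    matched.first_child = node
    return node
```

The determination of whether a fetched character extends the matched node thus costs a walk along the sibling list, the length of which grows with the number of stored extensions of that node.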
Although the known dictionary architecture and protocols provide efficient data compression systems, it is a continuing objective in the art to improve compressor performance.
Said Ser. No. 09/855,127 discloses a character table implementation for LZ dictionary type compressors that provides improvements as described therein. In the embodiments of said Ser. No. 09/855,127 one or more character tables may include a significant number of table entries thereby tending to impede compressor speed by increasing table search time.
The present invention provides a new string storage and access architecture and protocols which, it is believed, will improve the performance of LZ type data compression algorithms.
In the present invention a plurality of prefix tables corresponding to the respective prefix codes are utilized instead of the conventional searchtree structured dictionary. A string is stored by storing the code associated with the string in the prefix table corresponding to the code of the string prefix at a prefix table location corresponding to the extension character of the string. The input data character stream is compared to the stored strings by determining whether the table location corresponding to the currently fetched character, in the prefix table associated with the code of the currently matched string, is empty. If the location is not empty, it stores the code of the string comprising the currently matched string extended by the currently fetched character. This string code is utilized as the next match with which to continue the search with the next fetched character. If, however, the location is empty, the longest match has been determined to be the currently matched string, and the code thereof is output. The stored strings are updated by storing the next available string code in the empty location. The current character utilized to access the location in the prefix table in which the empty location was encountered is the mismatching character that caused the string matching process to terminate at the longest match. In an LZW embodiment, the mismatching character is utilized to begin the next string search by using this character as the initial current match for the new string.
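The foregoing protocol may be illustrated by the following minimal sketch of an LZW style compressor. The sketch rests on assumptions that are not stated above: an 8 bit alphabet, a first string code of 258, a limit of 4096 codes, and a policy of simply ceasing to store new strings once the codes are exhausted. The names compress, tables and max_codes are hypothetical.

```python
ALPHABET_SIZE = 256
FIRST_STRING_CODE = 258   # codes 256 and 257 are assumed reserved for control
EMPTY = None


def compress(data, max_codes=4096):
    """Return the list of codes output for the input bytes."""
    # One prefix table per possible prefix code, one location per character.
    # The tables for the reserved codes 256 and 257 are simply never used.
    tables = [[EMPTY] * ALPHABET_SIZE for _ in range(max_codes)]
    next_code = FIRST_STRING_CODE
    out = []
    if not data:
        return out
    match = data[0]                    # a single character is its own code
    for ch in data[1:]:
        entry = tables[match][ch]      # location for ch in the table of match
        if entry is not EMPTY:
            match = entry              # the extended string is stored: continue
        else:
            out.append(match)          # longest match found: output its code
            if next_code < max_codes:  # assumed policy: stop adding when full
                tables[match][ch] = next_code
                next_code += 1
            match = ch                 # mismatching character starts next search
    out.append(match)                  # flush the final match
    return out
```

For the input b"ababab" the sketch outputs the codes 97, 98, 258, 258. Each step of the search is a single indexed access, with no hashing and no traversal of sibling lists.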
An alternative embodiment of the invention includes creating the prefix tables when the strings corresponding to the associated prefix codes are first matched in the input.
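A minimal sketch of this alternative follows, in which a prefix table is created only when its prefix code is first used as the current match. The sketch assumes the same 8 bit alphabet, first string code of 258 and 4096 code limit as are commonly used with LZW; the name compress_lazy and the use of a mapping from prefix code to table are hypothetical choices made for illustration.

```python
def compress_lazy(data, max_codes=4096):
    """Return (output codes, prefix tables), creating tables on first match."""
    tables = {}                        # prefix code -> table of 256 locations
    next_code = 258                    # assumed first multiple character code
    out = []
    if not data:
        return out, tables
    match = data[0]
    for ch in data[1:]:
        table = tables.get(match)
        if table is None:              # first time this string is matched
            table = [None] * 256
            tables[match] = table
        entry = table[ch]
        if entry is not None:
            match = entry              # the extended string is already stored
        else:
            out.append(match)          # output the longest match
            if next_code < max_codes:
                table[ch] = next_code  # store the update extended string
                next_code += 1
            match = ch
    out.append(match)
    return out, tables
```

In this sketch, storage is allocated only for prefixes that actually occur in the input. For the input b"ababab", tables are created for the prefix codes 97, 98 and 258 only.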
A still further embodiment involves storing the extension character of an update extended string together with the code of the string at a prefix table location and creating the table locations as update extended strings are encountered.
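This further embodiment may be sketched as follows, with each prefix table held as a growing list of (extension character, string code) locations that is searched linearly. The linear search, the name compress_sparse and the 4096 code limit are assumptions made for illustration only.

```python
def compress_sparse(data, max_codes=4096):
    """Return (output codes, prefix tables) with locations created on demand."""
    tables = {}                        # prefix code -> [(ext char, string code)]
    next_code = 258                    # assumed first multiple character code
    out = []
    if not data:
        return out, tables
    match = data[0]
    for ch in data[1:]:
        locations = tables.setdefault(match, [])
        # Search the stored locations for one whose extension character is ch.
        entry = next((code for c, code in locations if c == ch), None)
        if entry is not None:
            match = entry
        else:
            out.append(match)
            if next_code < max_codes:
                # Create a new location holding the character and the code.
                locations.append((ch, next_code))
                next_code += 1
            match = ch
    out.append(match)
    return out, tables
```

A table in this form holds only as many locations as there are stored extensions of its prefix, at the cost of a short search in place of a direct indexed access.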
The present invention provides a new implementation architecture for data compression algorithms, such as LZW, that, it is believed, will result in significant advantages over the prior art such as enhanced speed and performance. Additionally, it is believed that the prefix tables of the herein described embodiments will be shorter than, and thus have fewer entries than, the character tables of said Ser. No. 09/855,127, thereby providing faster table look-up and string searches.