1. Field of the Invention
The invention relates to LZ data compression systems, particularly with respect to the LZW compression methodology. More particularly, the invention relates to a novel string deletion process for recovering string codes in a prefix table string storage arrangement such as that described in said Ser. No. 10/101,046.
2. Description of the Prior Art
Professors Abraham Lempel and Jacob Ziv provided the theoretical basis for LZ data compression and decompression systems that are in present day widespread usage. Two of their seminal papers appear in the IEEE Transactions on Information Theory, IT-23-3, May 1977, pp. 337-343 and in the IEEE Transactions on Information Theory, IT-24-5, September 1978, pp. 530-536. A ubiquitously used data compression and decompression system known as LZW is described in U.S. Pat. No. 4,558,302 by Welch, issued Dec. 10, 1985. LZW has been adopted as the compression and decompression standard used in the GIF image communication protocol and is utilized in the TIFF image communication protocol. GIF is a development of CompuServe Incorporated and the name GIF is a Service Mark thereof. A reference to the GIF specification is found in GRAPHICS INTERCHANGE FORMAT, Version 89a, Jul. 31, 1990. TIFF is a development of Aldus Corporation and the name TIFF is a Trademark thereof. Reference to the TIFF specification is found in TIFF, Revision 6.0, Final - Jun. 3, 1992.
LZW has also been adopted as the standard for V.42 bis modem compression and decompression. A reference to the V.42 bis standard is found in CCITT Recommendation V.42 bis, Data Compression Procedures For Data Circuit Terminating Equipment (DCE) Using Error Correction Procedures, Geneva 1990. The V.42 bis standard is further described in an article entitled “V.42 bis: The New Modem Compression Standard” by J. E. MacCrisken in the Spring 1991 issue of the Journal Of Data and Computer Communications - Modem Compression, pages 23-29.
Examples of LZ dictionary based compression and decompression systems are described in the following U.S. patents: U.S. Pat. No. 4,464,650 by Eastman et al., issued Aug. 7, 1984; U.S. Pat. No. 4,814,746 by Miller et al., issued Mar. 21, 1989; U.S. Pat. No. 4,876,541 by Storer, issued Oct. 24, 1989; U.S. Pat. No. 5,153,591 by Clark, issued Oct. 6, 1992; U.S. Pat. No. 5,373,290 by Lempel et al., issued Dec. 13, 1994; U.S. Pat. No. 5,838,264 by Cooper, issued Nov. 17, 1998; and U.S. Pat. No. 5,861,827 by Welch et al., issued Jan. 19, 1999.
In the above dictionary based LZ compression and decompression systems, the compressor and decompressor dictionaries may be initialized with all of the single character strings of the character alphabet. In some implementations, the single character strings are considered as recognized and matched although not explicitly stored. In such systems the value of the single character may be utilized as its code and the first available code utilized for multiple character strings would have a value greater than the single character values. In this way the decompressor can distinguish between a single character string and a multiple character string and recover the characters thereof. For example, in the ASCII environment, the alphabet has an 8 bit character size supporting an alphabet of 256 characters. Thus, the characters have values of 0-255. The first available multiple character string code can, for example, be 258 where the codes 256 and 257 are utilized as control codes as is well known.
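The code assignment convention described above can be illustrated with a minimal sketch. It assumes the 8 bit ASCII example given in the text, with codes 256 and 257 reserved as control codes. The constant and function names are hypothetical and the roles of the two control codes are left unspecified, as they vary by implementation.

```python
ALPHABET_SIZE = 256          # 8 bit characters, values 0-255
CONTROL_CODES = (256, 257)   # reserved control codes (roles are implementation specific)
FIRST_STRING_CODE = 258      # first code available for multiple character strings


def classify_code(code: int) -> str:
    """Classify a compressed code by its value alone (hypothetical helper)."""
    if 0 <= code < ALPHABET_SIZE:
        # The character value itself serves as the code of the single character string.
        return "single"
    if code in CONTROL_CODES:
        return "control"
    if code >= FIRST_STRING_CODE:
        return "string"
    raise ValueError(f"invalid code: {code}")
```

Because the ranges do not overlap, a decompressor following this convention can tell from the code value whether it has received a single character or a reference to a stored multiple character string.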
In the prior art dictionary based LZ compression systems, data character strings are deleted utilizing procedures such as those described in said U.S. Pat. Nos. 4,814,746; 4,876,541; and 5,153,591, as well as in said CCITT V.42 bis standard. The implementations of the prior art string deletion algorithms involve varying degrees of complexity. The prefix table string storage architecture of said Ser. No. 10/101,046 is particularly suited for including a relatively uncomplicated string deletion algorithm. A string deletion algorithm for use with the prefix table architecture of said Ser. No. 10/101,046 does not yet exist in the prior art.
The present invention provides a novel string deletion algorithm particularly adapted to the prefix table string storage architecture of said Ser. No. 10/101,046.
In the embodiments of the present invention a plurality of prefix tables corresponding to the respective plurality of prefix codes are utilized. A string is stored in the prefix tables by storing the code associated with the string in the prefix table corresponding to the code of the string prefix at a prefix table location corresponding to the extension character of the string. The input data character stream is searched by comparing the input stream to the stored strings to determine the longest match therewith. The code associated with the longest match is outputted so as to provide the output stream of compressed codes. The stored strings are updated by inserting an extended string into the prefix tables, the extended string comprising the longest match extended by the next data character in the input stream following the longest match, the extended string being stored in the prefix table corresponding to the code of the longest match, a code being assigned to the extended string. A code is deleted from a prefix table for reassignment to an extended string to be inserted when further codes are unavailable for assignment.
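The storage, search and update steps described above can be sketched as follows. This is an illustrative sketch only, not the implementation of the embodiments: it assumes 8 bit characters whose values serve as their own codes, a first string code of 258, a hypothetical `max_code` limit, and prefix tables modeled as Python dictionaries keyed by extension character. Code deletion is omitted here, so the sketch simply stops adding strings once the codes are exhausted.

```python
from typing import Dict, List

FIRST_STRING_CODE = 258  # assumed: 0-255 are characters, 256-257 are control codes


def compress(data: bytes, max_code: int = 4095) -> List[int]:
    """LZW-style compression using one prefix table per prefix code (sketch)."""
    # tables[prefix_code][extension_char] -> code of the string prefix + char
    tables: Dict[int, Dict[int, int]] = {}
    next_code = FIRST_STRING_CODE
    out: List[int] = []
    i, n = 0, len(data)
    while i < n:
        # A single character is treated as matched; its value is its code.
        match = data[i]
        i += 1
        # Extend the match while the prefix table of the current match
        # holds an entry at the location of the next input character.
        while i < n:
            extended = tables.get(match, {}).get(data[i])
            if extended is None:
                break
            match = extended
            i += 1
        out.append(match)  # output the code of the longest match
        # Update: store the extended string in the prefix table of the
        # longest match, at the location of the extension character.
        if i < n and next_code <= max_code:
            tables.setdefault(match, {})[data[i]] = next_code
            next_code += 1
    return out
```

For the input `b"ABABABA"` this sketch emits the codes 65, 66, 258, 260, where 258 stands for "AB" and 260 for "ABA".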
A particular code is selected for reassignment by determining that the prefix table corresponding to the particular code indicates that the string represented by the particular code has not been extended. Specifically, the particular code is selected by determining that the prefix table corresponding to the code is empty or has not been established.
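The selection criterion above can be sketched as follows, under stated assumptions. Prefix tables are modeled as a dictionary of dictionaries, so a table that is empty or has not been established is detected with one lookup. The cyclic scan order, the `exclude` guard (intended to keep the current longest match from being recycled as its own extension), and the linear search used to locate the stored code are illustrative choices, not details taken from the text.

```python
from typing import Dict, Optional, Tuple

Tables = Dict[int, Dict[int, int]]  # prefix code -> {extension char -> code}


def find_reusable_code(tables: Tables, start: int, first_code: int = 258,
                       max_code: int = 4095,
                       exclude: Optional[int] = None) -> Optional[int]:
    """Return the first code, scanning cyclically from `start`, whose own
    prefix table is empty or absent, i.e. whose string was never extended."""
    span = max_code - first_code + 1
    for offset in range(span):
        code = first_code + (start - first_code + offset) % span
        if code != exclude and not tables.get(code):
            return code
    return None


def remove_code(tables: Tables, code: int) -> Optional[Tuple[int, int]]:
    """Delete `code` from the prefix table that stores it and return the
    (prefix code, extension char) location it occupied, if found."""
    for prefix, table in tables.items():
        ch = next((k for k, v in table.items() if v == code), None)
        if ch is not None:
            del table[ch]
            return prefix, ch
    return None
```

Once a code has been removed from the table that held it, the same code can be assigned to the extended string being inserted. Removing a code may leave its former prefix table empty, which in turn makes that prefix code a candidate for later reassignment.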
Alternative embodiments of the invention include creating the prefix tables when the strings corresponding to the associated prefix codes are first matched in the input, or creating the table locations as update extended strings are encountered and storing the extension character of the update extended string together with the code of the string at the created table location.
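The two alternatives above might be modeled as in the following sketch. The class and method names are hypothetical. Each prefix table is assumed to be a list of (extension character, code) pairs, so that a table location is created only when an extended string is inserted and the extension character is stored alongside the code.

```python
from typing import Dict, List, Optional, Tuple


class LazyPrefixTables:
    """Prefix tables established on demand (illustrative sketch)."""

    def __init__(self) -> None:
        # prefix code -> list of (extension char, code) locations
        self.tables: Dict[int, List[Tuple[int, int]]] = {}

    def on_first_match(self, prefix_code: int) -> None:
        # First alternative: establish an empty table when the string
        # for this prefix code is first matched in the input.
        self.tables.setdefault(prefix_code, [])

    def insert(self, prefix_code: int, ch: int, code: int) -> None:
        # Second alternative: create the location as the extended string
        # is encountered, storing the character together with the code.
        self.tables.setdefault(prefix_code, []).append((ch, code))

    def lookup(self, prefix_code: int, ch: int) -> Optional[int]:
        # Search the created locations for the extension character.
        for stored_ch, code in self.tables.get(prefix_code, ()):
            if stored_ch == ch:
                return code
        return None

    def is_extended(self, prefix_code: int) -> bool:
        # False when the table is empty or has not been established.
        return bool(self.tables.get(prefix_code))
```

With either alternative, a table that is empty or was never created indicates that the string for that prefix code has not been extended.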