Data compression is a technique that can be used when either storing or transmitting a block of data, to reduce redundancy or the amount of data. By compressing a block of data, its effective size can be reduced without reducing the amount of information carried by that block. Data compression increases the density of the information to be stored or communicated by reducing the amount of memory needed to store the block of data or the time needed to transmit it. Generally, three characteristics are used to evaluate data compressors: how efficient the compressor is, how fast the compressor is, and whether the compressor can fully reproduce the block of data without introducing any error.
The efficiency of a data compressor is measured by a quantity called the compression ratio, which is calculated by dividing the number of uncompressed characters by the number of compressed characters. The higher the compression ratio, the greater the density of the compressed data. A compression ratio of 2 means that the number of characters after compression is half of the number of characters before compression.
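As a small illustration of the definition above, the ratio can be computed directly from the two character counts. The function name `compression_ratio` and the sample sizes are assumptions chosen for this sketch only.

```python
def compression_ratio(uncompressed_chars: int, compressed_chars: int) -> float:
    """Uncompressed size divided by compressed size, as defined in the text."""
    if compressed_chars <= 0:
        raise ValueError("compressed size must be positive")
    return uncompressed_chars / compressed_chars


# Illustrative sizes: 1000 characters before compression, 500 after.
ratio = compression_ratio(1000, 500)
print(ratio)  # 2.0
```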
There are numerous techniques used to compress data. One method is adaptive compression or, as it is sometimes called, dictionary-based compression. Adaptive compression begins with an empty table of symbol strings and builds the table as the data is compressed, so that the contents of the string table reflect the characteristics of the particular data block. Using this method, a compression ratio above 1 can be achieved if the number of bits required to represent a symbol string is less than the average length of repeated symbol strings. This type of adaptive compression scheme was introduced by Jacob Ziv and Abraham Lempel in an article entitled “Compression of Individual Sequences via Variable Rate Coding”, IEEE Transactions on Information Theory, Vol. 24, No. 5, pages 530-536 (September 1978). This method constructs a table, or dictionary, of symbol strings from the data as it is input to the compressor. The next time a specific string is encountered, its corresponding dictionary index is transmitted instead of the symbol string. This compression scheme is referred to as LZ78, and it requires only one pass over the data in order to perform compression.
In 1984 Terry Welch proposed a variation on the LZ78 procedure in “A Technique For High-Performance Data Compression”, IEEE Computer, Vol. 17, No. 6, pages 8-19 (June 1984). This data compression scheme is referred to as the LZW algorithm and also requires only one pass over the data. It is organized around a table, made up of strings of characters, where each string is unique. Each string is referenced by a fixed length code, which represents the longest matching string seen thus far in the previous input plus the one byte that makes this string different from prior strings. Each string is stored in the table at the next available address as determined at the time the string is input.
As the data is input into the compressor, the compressor parses the symbols into strings where, as stated above, each string includes the longest matching string seen thus far in the previous input plus the one symbol that makes it different from prior strings. These strings are then added to the table and coded as wK, where w is the index of the previous string, or prefix, and K is the one symbol that makes this string different from prior strings. K is called the extension character of the prefix string w and is represented by its normal binary representation. For every string that is stored in the table, its prefix, w, is also stored in the table. The prefix, w, is represented by the binary representation of its address within the table. The number of bits used to represent w depends on the size of the table to be used.
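The wK coding described above can be sketched as a table whose entries hold a prefix index w and an extension symbol K; a full string is recovered by following prefixes back to a single-symbol root. The names `Entry`, `build_roots`, and `expand`, and the 4096-entry table size, are assumptions made for this illustration and do not come from the cited articles.

```python
from typing import List, Optional, Tuple

# Each entry is (w, K): w is the table index of the prefix (None for a
# single-symbol root) and K is the extension symbol as a byte value.
Entry = Tuple[Optional[int], int]


def build_roots() -> List[Entry]:
    """Table pre-loaded with the 256 single-symbol strings."""
    return [(None, k) for k in range(256)]


def expand(table: List[Entry], index: int) -> bytes:
    """Rebuild the full string for an entry by walking its prefix chain."""
    symbols = []
    current: Optional[int] = index
    while current is not None:
        prefix, k = table[current]
        symbols.append(k)
        current = prefix
    return bytes(reversed(symbols))


table = build_roots()
table.append((ord("A"), ord("B")))  # index 256 encodes "A" + "B"
table.append((256, ord("C")))       # index 257 encodes "AB" + "C"
print(expand(table, 257))  # b'ABC'

# Bits needed to address w for a hypothetical 4096-entry table.
print((4096 - 1).bit_length())  # 12
```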
A Lempel-Ziv-Welch compression algorithm works as follows:
1. Create a table—LZWTable with 2 columns, (string, code).
2. Populate the table with (ASCII characters, ASCII values) using all 256 ASCII characters.
3. Instantiate an empty string: string1="".
4. While there are more characters to read from the input stream,
A. char1=get next character from the input stream
B. if string1+char1 exists in LZWTable then
i. string1=string1+char1
C. else
i. code1=get the code of string1 from LZWTable
ii. Write code1 to the output stream
iii. lastCode=Max code in LZWTable
iv. Add (string1+char1, lastCode+1) to the LZWTable
v. string1=char1
D. End if
5. End Loop
6. code1=get the code of string1 from LZWTable
7. Write code1 to the output stream
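The compression steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: it assumes every input character has a code point in the range 0 to 255, it lets the table grow without bound, and it tracks the next free code in a counter (`next_code`) instead of searching the table for its maximum code. The function name `lzw_compress` is chosen for this sketch.

```python
from typing import Dict, List


def lzw_compress(data: str) -> List[int]:
    """Return the list of LZW codes for data (code points 0-255 assumed)."""
    # Steps 1-2: table pre-loaded with the 256 single-character strings.
    table: Dict[str, int] = {chr(i): i for i in range(256)}
    next_code = 256  # equivalent to lastCode + 1 in the steps above
    string1 = ""     # step 3
    output: List[int] = []

    for char1 in data:                     # step 4
        candidate = string1 + char1
        if candidate in table:
            string1 = candidate            # keep extending the match
        else:
            output.append(table[string1])  # emit code of longest match
            table[candidate] = next_code   # register the new string
            next_code += 1
            string1 = char1

    if string1:                            # final steps: flush last match
        output.append(table[string1])
    return output


codes = lzw_compress("ABABABA")
print(codes)  # [65, 66, 256, 258]
```

For the input "ABABABA", the table gains the entries 256 for "AB", 257 for "BA", and 258 for "ABA" while the four codes are emitted.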
A Lempel-Ziv-Welch decompression algorithm works as follows:
1. Create a table—LZWTable with 2 columns, (string, code).
2. Populate the table with (ASCII characters, ASCII values) using all 256 ASCII characters.
3. Read oldCode from the input stream; string1=get translation for oldCode from LZWTable.
4. output string1 to the output stream.
5. char1=string1.
6. While there are more codes to read from the input stream,
A. Read newCode from the input stream.
B. If newCode is not present in LZWTable then
i. string1=get translation for oldCode from LZWTable
ii. string1=string1+char1
C. else
i. string1=get translation for newCode from LZWTable
D. End if.
E. Write string1 to the output stream.
F. char1=1st character of string1.
G. string1=get translation for oldCode from LZWTable.
H. lastCode=Max code in LZWTable.
I. Add (string1+char1, lastCode+1) to the LZWTable.
J. oldCode=newCode.
7. End Loop.
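A matching Python sketch of the decompression steps is given below. It rests on the same assumptions as the steps above (256 single-character starting entries, unbounded table) and uses a counter, `next_code`, in place of the maximum-code lookup. The function name `lzw_decompress` is chosen for this sketch.

```python
from typing import Dict, List


def lzw_decompress(codes: List[int]) -> str:
    """Rebuild the original text from a list of LZW codes."""
    if not codes:
        return ""
    # Steps 1-2: table pre-loaded with the 256 single-character strings.
    table: Dict[int, str] = {i: chr(i) for i in range(256)}
    next_code = 256

    old_code = codes[0]          # step 3
    string1 = table[old_code]
    output = [string1]           # step 4
    char1 = string1              # step 5

    for new_code in codes[1:]:   # step 6
        if new_code not in table:
            # Code not yet known: previous string plus its first character.
            string1 = table[old_code] + char1
        else:
            string1 = table[new_code]
        output.append(string1)
        char1 = string1[0]
        table[next_code] = table[old_code] + char1
        next_code += 1
        old_code = new_code

    return "".join(output)


print(lzw_decompress([65, 66, 256, 258]))  # ABABABA
```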
A review of the described prior art LZW compression and decompression algorithms enables visualization of the input stream as a sequence of strings, for example:
<string (1)><string (2)><string (3)> . . . <string (n)>
and visualization of an output stream as a sequence of codes, for example:
<code (1)><code (2)><code (3)> . . . <code (n)>.
Each string is identified and read using a predefined method. Then, a code for the string is read from the LZW Table and written to the output stream.
Initially the LZW Table is filled with 256 codes, 0 to 255, each of which is mapped to its respective ASCII character. New codes are then mapped to new strings in the following manner.
256=>string (1)+(first character of string (2))
257=>string (2)+(first character of string (3))
. . .
. . .
255+x=>string (x)+(first character of string (x+1))
. . . ;
where string (x) is the xth string read from the input stream.
Note 1: Since characters are read one by one from the input stream, an entry for a code (255+x) is made in LZW Table only after reading string (x) and the 1st character of string (x+1). Remaining characters of string (x+1) are read after entering code (255+x) in the LZW Table.
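The mapping above can be checked on a small input with the following sketch, which records the parsed strings string (1), string (2), ... and the table entries created during compression. The function name `parse_strings` and the sample input are assumptions for this illustration; input code points are assumed to lie in the range 0 to 255.

```python
from typing import Dict, List, Tuple


def parse_strings(data: str) -> Tuple[List[str], Dict[int, str]]:
    """Return the parsed strings and the new table entries (code -> string)."""
    table = {chr(i): i for i in range(256)}
    strings: List[str] = []
    new_entries: Dict[int, str] = {}
    current = ""
    for ch in data:
        if current + ch in table:
            current += ch
        else:
            # current is string (x); ch is the first character of string (x+1).
            strings.append(current)
            code = 256 + len(new_entries)
            table[current + ch] = code
            new_entries[code] = current + ch
            current = ch
    if current:
        strings.append(current)
    return strings, new_entries


strings, entries = parse_strings("ABABABA")
print(strings)  # ['A', 'B', 'AB', 'ABA']
print(entries)  # {256: 'AB', 257: 'BA', 258: 'ABA'}

# Code 255+x maps to string (x) + first character of string (x+1).
for x in range(1, len(strings)):
    assert entries[255 + x] == strings[x - 1] + strings[x][0]
```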
During decompression, the input stream may be visualized as a sequence of codes whose values, e.g., code (x)=>string (x), are read from the LZW Table and then written to the output stream. The output stream may be visualized, therefore, as a sequence of strings:
Input Stream—<code (1)><code (2)><code (3)> . . . <code (n)>
Output stream—<string (1)><string (2)><string (3)> . . . <string (n)>
As in the compression method, the LZW Table is initially filled with 256 codes, and then any code 255+x is mapped to string (x)+(1st character of string (x+1)) and entered into the LZW Table, where string (x) is the value of code (x) obtained from the LZW Table and code (x) is the xth code read from the input stream.
Note 2: Since codes are read one by one from the input stream, an entry for code 255+x is made only after completely reading code (x) and code (x+1) and then reading the LZW Table to obtain string (x) and string (x+1), respectively, as the values of the codes. That means that, unlike in the compression method, string (x) and string (x+1) are completely obtained before an entry for the code 255+x is made in the LZW Table.
Exception Handling
In typical LZW algorithms, a condition may occur in which the string value of a code is not found in the LZW Table at the time of decompression. For example, when, at the time of compression, string (x) and the 1st character of string (x+1) are read,
255+x=>string (x)+(1st character of string (x+1))
is added to the LZW Table. After reading the complete string (x+1), it may turn out that
string (x+1)==string (x)+(1st character of string (x+1)).
Since the code (255+x) was (just) entered in the LZW Table for string (x+1), (255+x) is written to the output stream. This means that code (x+1)==(255+x). The known decompression algorithm, however, will not find a translation for code (x+1). This is because, although at the time of compression (255+x) can be entered into the LZW Table before string (x+1) has been read completely, this is not possible at the time of decompression, according to Notes 1 and 2 explained above.
The prior art LZW data compression/decompression algorithms have developed exception handling routines to accommodate this exception. That is, known prior art methods add the following into the LZW Table:
255+x=>string (x)+(1st character of string (x)),
and then write the string value of (255+x) in the output stream. Such exception handling accommodates the above exception condition because:
1st character of string (x+1)==1st character of string (x),
since string (x+1)==string (x)+(1st character of string (x+1)), as obtained at the time of compression.
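The exception can be reproduced with a short decoder sketch that logs every code it could not find in the table. The code sequence 65, 66, 256, 258 is what the compression steps produce for the text "ABABABA"; the function name `decode_with_exception_log` is an assumption made for this illustration.

```python
from typing import Dict, List, Tuple


def decode_with_exception_log(codes: List[int]) -> Tuple[str, List[int]]:
    """Decode LZW codes and report which codes hit the exception case."""
    table: Dict[int, str] = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    output = [previous]
    exceptions: List[int] = []

    for code in codes[1:]:
        if code in table:
            current = table[code]
        else:
            # Exception: the code was created by the compressor in the
            # same step in which it was used, so rebuild it as
            # string (x) + (1st character of string (x)).
            current = previous + previous[0]
            exceptions.append(code)
        output.append(current)
        table[next_code] = previous + current[0]
        next_code += 1
        previous = current

    return "".join(output), exceptions


text, missing = decode_with_exception_log([65, 66, 256, 258])
print(text)     # ABABABA
print(missing)  # [258]
```

Here x is 3: string (3) is "AB", string (4) is "ABA", and code (4) equals 255+3, which the decoder has not yet entered when it reads that code.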