1. Field of the Invention
This invention relates in general to data compression, and in particular, to a method for compressing and decompressing data with an alphabet.
2. Description of Related Art
The Lempel-Ziv 77 (LZ77) method is a well-known method of data compression and decompression. However, it is inefficient in terms of its code space usage. This can be illustrated by an encoding and decoding example using the prior art LZ77 algorithm.
The following terms are used in describing the prior art LZ77 method:
Input Stream: a sequence of characters to be compressed;
Character: a basic data element in the input stream;
Coding Position: a position of the character in the input stream that is currently being coded (the beginning of a lookahead buffer defined below);
Lookahead Buffer: a character sequence from the coding position to an end of the input stream;
Window: a "backward" window of size W that contains W characters from the coding position, i.e., the last W characters previously processed;
Pointer: a pointer to a match in the window W that also specifies the length of the match.
With regard to encoding, the prior art LZ77 method searches the window for the longest match with the beginning of the lookahead buffer and outputs a pointer to that match. Since it is possible that not even a one-character match can be found, the output cannot contain just pointers. The prior art LZ77 method solves this problem as follows: after each pointer, it outputs the first character in the lookahead buffer after the match; if there is no match, then it outputs a null-pointer and the character at the coding position. Then, the coding position is moved further by one.
Specifically, the steps of the prior art LZ77 encoding method comprise the following:
(i) Set the coding position to the beginning of the input stream.
(ii) Find a match in the backward window W for the lookahead buffer.
(iii) Output the triple (B,L)C with the following meanings:
    (1) B is the number of characters to be traversed backwards in the backward window W in order to get to the starting location of the match. If there is no match, then B takes a null value (0) without loss of generality.
    (2) L is the number of characters matched.
    (3) C is the first character in the lookahead buffer that did not match.
(iv) If the lookahead buffer is not empty, then move the coding position (and the backward window W) L+1 characters forward and return to step (ii); otherwise, terminate.
This is best illustrated by providing an example of the prior art LZ77 encoding method. The following table describes the input data for the example, wherein the first row indicates the position and the second row indicates the corresponding character:
Pos   1  2  3  4  5  6  7  8  9
Char  A  A  B  C  B  B  A  B  C
The following table illustrates the prior art LZ77 encoding method performed on the above input data:
Step  Pos  W       Match  Char  Output
1.    1    --      --     A     (0,0) A
2.    2    A       A      B     (1,1) B
3.    4    AAB     --     C     (0,0) C
4.    5    AABC    B      B     (2,1) B
5.    7    AABCBB  AB     C     (5,2) C
The following describes the columns in the above table:
The column Step indicates the number of the encoding step. It completes each time the prior art LZ77 encoding method makes an output. With the prior art LZ77 method, this happens in each step of the encoding method above at (iii).
The column Pos indicates the coding position. The first character in the input stream has the coding position 1.
The column W shows the backward window.
The column Match shows the longest match found in the window.
The column Char shows the first character in the lookahead buffer after the match.
The column Output presents the output in the format (B,L)C. (B,L) is the pointer to the Match, which provides the following instruction to the decoding method: "Go back B characters in the window and copy L characters to the output." C is the next character.
With regard to the prior art LZ77 decoding method, the window is maintained the same way as during the encoding method. In each step, the decoding method reads a triple (B,L)C from the input. The decoding method outputs the sequence from the window specified by (B,L) and the character C.
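As a minimal sketch of the encoding steps and the decoding rule described above, the following Python code encodes an input into (B,L)C triples and decodes them again. The function names lz77_encode and lz77_decode, the default window_size, the 0-based indexing, and the restriction that a match lies entirely inside the window and always leaves one character over for C are assumptions made for illustration; they are not taken from the prior art description.

```python
def lz77_encode(data, window_size=4096):
    """Encode a string as a list of (B, L, C) triples (illustrative sketch)."""
    triples = []
    pos = 0  # 0-based coding position (the text counts positions from 1)
    while pos < len(data):
        window_start = max(0, pos - window_size)
        best_b, best_l = 0, 0  # null pointer when nothing matches
        # Assumption: always leave one unmatched character to serve as C.
        max_len = len(data) - pos - 1
        for cand in range(window_start, pos):
            length = 0
            # Assumption: the match may not run past the coding position.
            while (length < max_len
                   and cand + length < pos
                   and data[cand + length] == data[pos + length]):
                length += 1
            if length > best_l:
                best_b, best_l = pos - cand, length
        triples.append((best_b, best_l, data[pos + best_l]))
        pos += best_l + 1  # step (iv): advance L+1 characters
    return triples


def lz77_decode(triples):
    """Rebuild the input: go back B, copy L characters, then append C."""
    out = []
    for b, l, c in triples:
        start = len(out) - b
        for i in range(l):
            out.append(out[start + i])
        out.append(c)
    return "".join(out)


triples = lz77_encode("AABCBBABC")
restored = lz77_decode(triples)
```

On the example input AABCBBABC, this sketch yields the triples (0,0)A, (1,1)B, (0,0)C, (2,1)B and (5,2)C, and decoding them restores the original nine characters. Note that the decoding loop treats every triple identically, with no test of what kind of code it has read.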
The compression ratio achieved by the prior art LZ77 method is very good for many types of data, but the encoding method can be quite time-consuming, since many comparisons must be performed between the lookahead buffer and the window. On the other hand, the decoding method is very simple and fast. Memory requirements are low both for the encoding and the decoding methods, since the only structure held in memory is the window, which is usually sized between 4 and 64 kilobytes.
However, the prior art LZ77 method suffers from the problem of non-optimal code space usage, because it uses two integers and one character for a code. The first integer is the starting position of the match, the second integer is the length of the match, and the character is the first non-matching character after the match. In practical terms, including the first non-matching character after the match leads to compression inefficiency.
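The cost of carrying a character in every code can be made concrete with a rough calculation. The sketch below assumes fixed-width binary fields for B and L and 8-bit characters; the helper name triple_bits and the field limits (B up to 4095, L up to 15) are hypothetical values chosen for illustration, since the description above does not specify field widths.

```python
def triple_bits(max_back, max_len, char_bits=8):
    """Bits per (B,L)C code under assumed fixed-width fields."""
    b_bits = max_back.bit_length()  # B ranges over 0..max_back
    l_bits = max_len.bit_length()   # L ranges over 0..max_len
    return b_bits + l_bits + char_bits


# Hypothetical parameters: B up to 4095, L up to 15, 8-bit characters.
bits = triple_bits(4095, 15)

# The encoding example above needs 5 codes for 9 input characters.
coded_bits = 5 * bits
raw_bits = 9 * 8
literal_bits = 5 * 8  # bits spent on the trailing character C
```

Under these assumed widths each code occupies 24 bits, 8 of which are the literal character, so the five codes of the example take 120 bits against 72 bits for the nine raw characters. An input this short is not representative of real data, but it shows that the literal is paid for in every code whether or not it helps.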
Other prior art methods exist to code this character selectively, based on an efficiency criterion. However, each requires that the decoding method check whether it is to decode a character or a string from the window. In logic or instruction terms, the check requires a conditional branch, once for every compressed code, resulting in inefficient logic. For systems that are read intensive (such as database management systems where reads outnumber writes by 3-to-1 or more), it is necessary to speed up the decoding method, and removing conditional branches from the decoding method is one means of doing so. Thus, there is a need in the art for an improved LZ77 method that optimizes not only code space usage, but also the speed of decoding.
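To illustrate the per-code check described above, the following sketch decodes a stream in which each code is either a literal character or a window reference. The token format ("lit", C) and ("ref", B, L) and the function name decode_flagged are hypothetical and stand in for any scheme that codes the character selectively; they do not reproduce a specific prior art format.

```python
def decode_flagged(tokens):
    """Decode a mix of literal and reference tokens (illustrative sketch)."""
    out = []
    for token in tokens:
        # Conditional branch executed once for every compressed code.
        if token[0] == "lit":
            out.append(token[1])
        else:
            _, b, l = token
            start = len(out) - b
            for i in range(l):
                out.append(out[start + i])
    return "".join(out)


# Hypothetical token stream for the same nine-character example input.
tokens = [
    ("lit", "A"), ("ref", 1, 1), ("lit", "B"), ("lit", "C"),
    ("ref", 2, 1), ("ref", 1, 1), ("ref", 5, 3),
]
decoded = decode_flagged(tokens)
```

Here no code carries an unneeded character, but the decoder must test the kind of every token before acting on it, which is the conditional branch that a read-intensive system would prefer to avoid.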