1. Field of the Invention
This invention relates to data compression and decompression.
2. Description of the Prior Art
Lempel and Ziv devised a number of lossless data compression algorithms, such as the so-called LZ77 and LZ78 algorithms, which in turn form the basis for many variants including LZW, LZSS and others. Both are dictionary coders, which achieve compression by replacing portions of the data with references to matching data that has already passed through both the encoder and the decoder. In LZ77 and its derivatives, a matching section of data is encoded by a pair of numbers called a distance-length pair: effectively a pointer (the distance, indicating where to find the first such character in the already-encoded or already-decoded data) and a counter (the length, indicating how many characters to copy). The distance-length pair can be seen as equivalent to the statement “each of the next length characters is equal to the character exactly distance characters behind it in the uncompressed stream.”
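The equivalence stated above can be sketched in Python as a character-by-character copy. This is a minimal illustration, not the method of the invention; the function name `apply_pair` is hypothetical. Because each character is read only after the previous one has been appended, the sketch also handles a length greater than the distance, where the copy overlaps the characters it is itself producing.

```python
def apply_pair(decoded: str, distance: int, length: int) -> str:
    """Return `decoded` extended by expanding one distance-length pair.

    Each of the next `length` characters equals the character exactly
    `distance` characters behind it, so characters are copied one at a time.
    """
    if not 1 <= distance <= len(decoded):
        raise ValueError("distance must point into the already-decoded data")
    out = list(decoded)
    for _ in range(length):
        # out[-distance] is re-evaluated after every append, so the copy may
        # read characters that this same pair has just produced.
        out.append(out[-distance])
    return "".join(out)


print(apply_pair("abcde", 3, 2))  # copies "cd" -> abcdecd
print(apply_pair("ab", 2, 4))     # overlapping copy -> ababab
```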
During the encoding phase, the best possible match for the string of characters to be encoded is sought; that is, the encoder finds the match within the available already-encoded data that has the greatest length. This process will be illustrated schematically with reference to FIGS. 1a to 1c of the accompanying drawings.
Data which has already been encoded is stored in an area of memory called a “search buffer” 10. The search buffer may be big enough to hold all of the already-encoded data in a particular application, or may hold only a certain amount of the most-recently-encoded data. Data to be encoded is stored in an area of memory called a “look-ahead buffer” 20. The first character in the look-ahead buffer 20 (in this case, the character “g”) is the next character to be encoded. It will of course be appreciated that alphanumeric characters are used in the present application to illustrate the data to be encoded, and that the data are referred to as “characters”, but this notation is used only for clarity of explanation. It is not technically important whether the data represent alphanumeric characters, pixels, audio samples or any other kind of data. Similarly, the size in bits of each such “character” is not technically important.
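Where the search buffer holds only a certain amount of the most-recently-encoded data, it behaves as a sliding window. The following is a minimal Python sketch assuming a fixed capacity; the class `SearchBuffer` and the capacity of 8 characters are hypothetical choices for illustration. Once the buffer is full, the oldest characters are discarded, which also caps the largest distance an encoder could refer to.

```python
from collections import deque


class SearchBuffer:
    """Hypothetical fixed-capacity search buffer (a sliding window)."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen drops the oldest items automatically when full.
        self._data: deque = deque(maxlen=capacity)

    def push(self, encoded: str) -> None:
        """Append characters that have just been encoded."""
        self._data.extend(encoded)

    def contents(self) -> str:
        return "".join(self._data)

    def max_distance(self) -> int:
        """Largest distance that can still point at a stored character."""
        return len(self._data)


buf = SearchBuffer(capacity=8)
buf.push("abcdefghij")
print(buf.contents())      # "a" and "b" have been discarded -> cdefghij
print(buf.max_distance())  # -> 8
```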
The encoder searches for instances of “g” in the search buffer 10. In the present example, two are found (FIG. 1a).
The encoder then tests the position following each instance of “g” in the search buffer to detect whether it holds the next character in the look-ahead buffer (the character “f”). In fact, both instances of “g” are followed by “f” (FIG. 1b), so the search proceeds to the next character in the look-ahead buffer (“s”). This follows only one instance of “gf” (FIG. 1c), and the next character in the look-ahead buffer (“h”) does not follow the resulting string “gfs” in the search buffer. So, the distance-length pair generated in respect of the first three characters of the look-ahead buffer is (13,3): the matching string starts 13 characters back in the encoded sequence (the search buffer) and has a length of 3 characters.
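The search of FIGS. 1a to 1c can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed method: the function `longest_match` is hypothetical, matches are confined to the search buffer, and ties between equally long matches are resolved in favour of the nearer one (the text does not specify a tie-break). The example buffer contents are invented so as to reproduce the (13,3) result described above; they are not taken from the figures.

```python
from typing import Optional, Tuple


def longest_match(search: str, lookahead: str) -> Optional[Tuple[int, int]]:
    """Return the (distance, length) of the longest match, or None.

    Hypothetical helper: the match must lie wholly within `search`, and
    among equally long matches the one with the smaller distance wins.
    """
    if not lookahead:
        return None
    best: Optional[Tuple[int, int]] = None
    for start, ch in enumerate(search):
        # First step (cf. FIG. 1a): find each instance of the first
        # look-ahead character in the search buffer.
        if ch != lookahead[0]:
            continue
        # Following steps (cf. FIGS. 1b and 1c): keep extending while the
        # next look-ahead character also follows in the search buffer.
        length = 1
        while (start + length < len(search)
               and length < len(lookahead)
               and search[start + length] == lookahead[length]):
            length += 1
        distance = len(search) - start
        if best is None or length > best[1] or (length == best[1] and distance < best[0]):
            best = (distance, length)
    return best


# Invented buffer: "gfs" starts 13 characters back; a second "gf" is
# followed by "a", and "gfs" is followed by "m" rather than "h".
search_buffer = "abcdeklgfsmnopgfaqrt"
print(longest_match(search_buffer, "gfsh"))  # -> (13, 3)
```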
If no matching string of two or more characters is found in the search buffer, the character is encoded as a “literal”, which is to say that it is simply copied into the output data stream. The benefits of compression therefore result from the use of distance-length pairs; the quoting of literals acts against the aims of data compression, because not only is a literal as large as the original character, but some form of flag is also needed to indicate that it is a literal. These techniques are therefore best suited to data in which character strings are likely to repeat, since such repetition is what allows a useful compression ratio to be achieved.
In general, therefore, the compressed data generated by this type of process will be formed of distance-length pairs interspersed with literals.
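A greedy encoder producing such a mixture of literals and distance-length pairs might look like the following. It is a sketch under several assumptions: the search buffer holds all previously encoded input up to a maximum distance, matches shorter than two characters are emitted as literals (following the text above), matches are confined to the search buffer, and the field widths used to estimate the output size (a 1-bit flag, 8-bit characters, 12-bit distances and 4-bit lengths) are hypothetical. `encode` and `encoded_bits` are illustrative names.

```python
from typing import List, Tuple, Union

Token = Union[str, Tuple[int, int]]  # a literal character or a (distance, length) pair

MIN_MATCH = 2                        # pairs only for matches of two or more characters
FLAG_BITS, CHAR_BITS = 1, 8          # hypothetical field widths
DIST_BITS, LEN_BITS = 12, 4          # hypothetical field widths
MAX_DIST = (1 << DIST_BITS) - 1      # largest distance the assumed field can hold
MAX_LEN = (1 << LEN_BITS) - 1        # largest length the assumed field can hold


def encode(data: str) -> List[Token]:
    """Greedily replace repeated strings with (distance, length) pairs."""
    tokens: List[Token] = []
    i = 0
    while i < len(data):
        search, lookahead = data[max(0, i - MAX_DIST):i], data[i:]
        limit = min(len(lookahead), MAX_LEN)
        best_dist, best_len = 0, 0
        for start in range(len(search)):
            length = 0
            while (start + length < len(search) and length < limit
                   and search[start + length] == lookahead[length]):
                length += 1
            # ">=" prefers the later (nearer) start among equal lengths.
            if length > 0 and length >= best_len:
                best_dist, best_len = len(search) - start, length
        if best_len >= MIN_MATCH:
            tokens.append((best_dist, best_len))
            i += best_len
        else:
            tokens.append(data[i])  # no useful match: emit a literal
            i += 1
    return tokens


def encoded_bits(tokens: List[Token]) -> int:
    """Estimated output size: every token carries a literal/pair flag."""
    total = 0
    for tok in tokens:
        if isinstance(tok, str):
            total += FLAG_BITS + CHAR_BITS              # literal: flag + whole character
        else:
            total += FLAG_BITS + DIST_BITS + LEN_BITS   # pair: flag + distance + length
    return total


tokens = encode("abcabcabx")
print(tokens)                              # -> ['a', 'b', 'c', (3, 3), (3, 2), 'x']
print(encoded_bits(tokens), 8 * 9)         # -> 70 72
```

Under these assumed widths a literal costs 9 bits for an 8-bit character, and a two-character pair (17 bits) is only just cheaper than two literals (18 bits), which is one way to see why very short matches are not worth encoding as pairs.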
FIGS. 2a to 2d schematically illustrate the decoding of a distance-length pair (13,3).
FIG. 2a illustrates the search buffer, holding the decoded data available just before the distance-length pair (13,3) is decoded. In FIGS. 2b, 2c and 2d, the string of three characters starting 13 characters back in the search buffer is copied across for output.
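The copying shown in FIGS. 2b to 2d can be sketched as below, recording the output after each character is copied. The function name `decode_pair_steps` is hypothetical, and the search buffer contents are invented rather than taken from FIG. 2a; they simply place “gfs” 13 characters back so that the pair (13,3) yields that string.

```python
from typing import List


def decode_pair_steps(search: str, distance: int, length: int) -> List[str]:
    """Expand one (distance, length) pair, returning the output after each copy."""
    if not 1 <= distance <= len(search):
        raise ValueError("distance must point inside the search buffer")
    out = list(search)
    start = len(out) - distance        # position of the first character to copy
    steps: List[str] = []
    for k in range(length):
        out.append(out[start + k])                 # copy one character across
        steps.append("".join(out[len(search):]))   # output produced so far
    return steps


for step in decode_pair_steps("abcdeklgfsmnopgfaqrt", 13, 3):
    print(step)  # prints g, then gf, then gfs
```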
Data compression and decompression of this type are essentially sequential processes, in that successfully encoding or decoding any part of the data relies on all previous data having already been encoded or decoded.
An object of the invention is to seek to provide an improved data compression and/or decompression technique.