This invention relates to data decompression, and particularly to ‘LZ1’ data decompression.
The Lempel-Ziv algorithms are well known in the field of data compression. In particular, the “history buffer” version, known as LZ1, has become particularly popular in hardware implementations wherever lossless compression of coded data is required, since its relatively modest buffer requirements and predictable performance make it a good fit for most technologies.
The LZ1 algorithm works by examining the input string of characters and keeping a record of the characters it has seen. Then, when a string appears that has occurred before in recent history, it is replaced in the output string by a “token”: a code indicating where in the past the string has occurred and for how long. Both the compressor and decompressor must use a “history buffer” of a defined length, but otherwise no more information need be passed between them.
Characters that have not been seen before in a worthwhile string are coded during compression as a “literal”. This amounts to an expansion of the number of bits required, but in most types of data the opportunities for token substitution (and hence compression) outweigh the incompressible data, so overall compression is achieved. Typical compression ratios range from 2:1 to around 10:1.
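The token substitution described above can be illustrated with a minimal sketch. It is not a bit-exact LZ1 encoder: the tuple token format, the distance-back addressing, the greedy search, and the default parameters (`history_size`, `min_match`, `max_match`) are assumptions chosen for illustration only.

```python
def lz1_tokenize(data: bytes, history_size: int = 512,
                 min_match: int = 2, max_match: int = 271):
    """Greedy LZ1-style tokenizer (illustrative sketch, not a real encoder).

    Emits ("literal", byte) or ("copy", distance_back, length) tuples.
    """
    tokens = []
    pos = 0
    while pos < len(data):
        window_start = max(0, pos - history_size)
        best_len, best_dist = 0, 0
        # Try every starting point in the recent history.
        for cand in range(window_start, pos):
            length = 0
            while (length < max_match
                   and pos + length < len(data)
                   and data[cand + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, pos - cand
        if best_len >= min_match:
            # A worthwhile repeat: replace it with a copy pointer token.
            tokens.append(("copy", best_dist, best_len))
            pos += best_len
        else:
            # Nothing worthwhile in the history: emit the byte as a literal.
            tokens.append(("literal", data[pos]))
            pos += 1
    return tokens
```

For the input `b"abcabcabc"` this sketch emits three literals followed by a single copy token covering the remaining six bytes.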
Some variations of the basic LZ1 algorithm have emerged over the years, but improvements have been incremental.
As the LZ1 algorithm works on units of a byte, traditional hardware implementations consider just one byte at a time when decompressing the input stream.
LZ1 compressed data consists of a stream of variable length tokens, each of which must be decoded in turn, to produce the decompressed data. A token in the compressed data stream may either be a literal which contains one byte of data, or else a copy pointer which specifies a string of bytes, which can be obtained by reference to the most recent bytes which have already been decompressed and are held for this purpose in the history buffer.
With traditional one byte hardware decompression, only one input token needs to be considered in each cycle. A literal is fully decoded in one cycle, while a copy pointer takes several cycles to decode. Exactly one byte of decompressed data is produced each cycle.
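The one-byte-per-cycle behaviour can be modelled in software as follows. This is a sketch under stated assumptions: tokens are represented as hypothetical `("literal", byte)` and `("copy", distance_back, length)` tuples rather than packed bits, and a "cycle" is simply counted once per output byte.

```python
def decode_one_byte_per_cycle(tokens):
    """Model of a traditional decoder: exactly one output byte per cycle.

    Returns (decompressed_bytes, cycle_count).
    """
    out = bytearray()   # doubles as the history buffer in this model
    cycles = 0
    for token in tokens:
        if token[0] == "literal":
            # A literal is fully decoded in a single cycle.
            out.append(token[1])
            cycles += 1
        else:
            _, distance, length = token
            # A copy pointer occupies one cycle per byte it produces.
            for _ in range(length):
                out.append(out[-distance])
                cycles += 1
    return bytes(out), cycles
```

Under this model the cycle count always equals the number of decompressed bytes, which is the limit that a multi-byte design aims to overcome.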
To increase the performance of hardware decompression there are two alternatives: to reduce the cycle time or to produce more decompressed bytes per cycle. Although reducing the cycle time is the easier of the two options, it typically yields only a small improvement in performance. A multi-byte approach leads to a significantly more complex design, but can deliver many times the performance of the single-byte case. The only previously attempted multi-byte hardware LZ1 implementation, that of U.S. Pat. No. 5,771,011 and its divisional U.S. Pat. No. 5,929,791, was inherently limited to two bytes per cycle.
The problems associated with multi-byte hardware decompression arise from the following characteristics of the compressed input data stream:
1. An input token produces a variable number of bytes from 1 to 271.
2. The input tokens are variable length (9, 12, 14, 16, 18 or 22 bits).
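One decomposition that reproduces the token sizes listed above is a one-bit flag, followed either by eight data bits (a literal) or by a variable-width length field and a fixed-width displacement (a copy pointer). The field widths below are assumptions chosen to match those sizes; the text itself does not specify the bit layout.

```python
FLAG_BITS = 1                          # assumed: distinguishes literal from copy pointer
LITERAL_DATA_BITS = 8                  # one byte of literal data
DISPLACEMENT_BITS = 9                  # assumed: addresses a 512-byte history buffer
LENGTH_FIELD_BITS = (2, 4, 6, 8, 12)   # assumed widths of the match-length field


def token_sizes():
    """Return every possible token size in bits under the assumed layout."""
    sizes = [FLAG_BITS + LITERAL_DATA_BITS]
    sizes += [FLAG_BITS + n + DISPLACEMENT_BITS for n in LENGTH_FIELD_BITS]
    return sizes
```

Under these assumptions `token_sizes()` yields 9, 12, 14, 16, 18 and 22 bits, matching the list above.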
It is desirable to make multi-byte decompression possible by pipelining the operation into several stages. However, because of the nature of the data, the following problems arise:
1. In order to produce, for example, 12 bytes of output, it is not easily known how many input tokens are required. For example, the 12 bytes could come from 12 literal tokens, or from part of one long token containing more than 12 bytes. Without knowing how many tokens have been processed in one cycle, it is not possible to start the next cycle.
2. Because the tokens are variable length, it is impossible to extract the next 12 tokens from the input data in one cycle. On first inspection it appears necessary to do this in order to know how much data has been used from the input stream so that the next cycle can process the input stream at the correct position.
3. When decoding a group of 12 bytes, one of the later bytes may be produced from a copy pointer token which points to one or more of the earlier bytes, whose value may not yet be known if it is in the process of being decoded from a copy pointer token.
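The first problem above can be made concrete with a small sketch. Given the number of output bytes each successive token produces, the hypothetical helper `tokens_for_group` reports how many tokens are touched in filling one 12-byte output group and how many bytes of the last token are left over for the next cycle. The function name and its return convention are illustrative assumptions.

```python
def tokens_for_group(output_lengths, group_size=12):
    """Return (tokens_touched, bytes_left_over_in_last_token) for one group.

    output_lengths lists the bytes produced by each successive token
    (1 for a literal, 2 to 271 for a copy pointer).
    """
    produced = 0
    for index, length in enumerate(output_lengths):
        produced += length
        if produced >= group_size:
            # This token completes the group; any excess carries over.
            return index + 1, produced - group_size
    # The stream ran out before the group was filled.
    return len(output_lengths), 0
```

Twelve literals consume twelve tokens with nothing left over, whereas a single maximum-length copy pointer fills the group on its own and still has 259 bytes outstanding.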
In order to increase the performance of hardware decompression of LZ1 compressed data, it is desirable to process multiple bytes per cycle, while overcoming the above problems.
The present invention accordingly provides, in a first aspect, a method for decompressing three or more bytes per processor cycle from a stream of compressed data using a processing pipeline, wherein said compressed data is represented by tokens of varying and unknown length, the method comprising the steps of: accepting as input said stream, comprising token data; partially decoding a token from said token data to determine a boundary position of said token; and priming said processing pipeline with said token and a length marker indicating said boundary position.
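A minimal sketch of partial decoding follows. It assumes a hypothetical bit layout (flag bit `0` for a 9-bit literal; flag bit `1` followed by a prefix-coded length field of 2, 4, 6, 8 or 12 bits and a 9-bit displacement), chosen only because it reproduces the token sizes given earlier. The stream is modelled as a string of `'0'` and `'1'` characters, and stream framing such as end markers or padding is outside the sketch.

```python
DISPLACEMENT_BITS = 9                 # assumed displacement width
LENGTH_FIELD_BITS = (2, 4, 6, 8, 12)  # assumed, indexed by count of leading 1s
MIN_TOKEN_BITS = 9                    # the shortest token is a literal


def token_bit_length(bits: str, start: int) -> int:
    """Find a token's size by inspecting at most its first five bits."""
    if bits[start] == "0":
        return 9  # flag + 8 data bits
    ones = 0
    while ones < 4 and bits[start + 1 + ones] == "1":
        ones += 1
    return 1 + LENGTH_FIELD_BITS[ones] + DISPLACEMENT_BITS


def prime_pipeline(bits: str):
    """Split a bit string into (token_bits, length_marker) pairs.

    Only the boundary of each token is determined here; the match length
    and displacement values are left for later pipeline stages.
    """
    primed = []
    pos = 0
    while pos + MIN_TOKEN_BITS <= len(bits):
        size = token_bit_length(bits, pos)
        if pos + size > len(bits):
            break  # incomplete trailing token
        primed.append((bits[pos:pos + size], size))
        pos += size
    return primed
```

The point of the design is that the boundary of a token depends only on a short prefix, so the position of the next token can be established without waiting for the current token to be fully decoded.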
The method of the first aspect preferably further comprises the steps of: determining if any token in said processing pipeline represents literal data and if any token in said processing pipeline represents a copy pointer; responsive to said step of determining, marking each token with a marker, said marker indicating which of literal data and copy pointer is represented; and passing literal data from any of said tokens marked as representing literal data directly to an output means.
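The marking step might be sketched as below. The `Slot` record, the use of the first bit as the literal/copy flag, and the bit-string token representation are all assumptions made for illustration; the text does not prescribe them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    """One pipeline entry: raw token bits plus a literal/copy marker."""
    token_bits: str
    is_literal: bool
    byte: Optional[int] = None  # filled immediately for literals only


def mark_tokens(token_bit_strings):
    """Mark each token as literal or copy pointer.

    Literal data is extracted at once so that it can go straight to the
    output; copy pointers are left unresolved for a later stage.
    """
    slots = []
    for token_bits in token_bit_strings:
        is_literal = token_bits[0] == "0"  # assumed flag convention
        byte = int(token_bits[1:9], 2) if is_literal else None
        slots.append(Slot(token_bits, is_literal, byte))
    return slots
```

Separating the marker from the payload lets literal bytes bypass the copy-resolution logic entirely.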
The method of the first aspect preferably further comprises the steps of: determining if any of said tokens marked as representing a copy pointer has a copy pointer pointing into said tokens in said processing pipeline at a pointed-to token; and responsive to a determination that a copy pointer is pointing into said tokens in said processing pipeline at a pointed-to token, replacing said copy pointer with the pointed-to token.
The method of the first aspect preferably further comprises the steps of: replacing said pointed-to token with its pointed-to data; and passing said pointed-to data to an output means.
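The replacement of copy pointers that point into the pipeline can be modelled as follows. This sketch assumes the pipeline has already been expanded into one slot per output byte, each slot being a hypothetical `("literal", value)` or `("copy", distance_back)` pair, and it resolves slots sequentially; hardware would perform the equivalent substitutions in parallel stages, so only the end result is modelled here.

```python
def resolve_group(slots, history: bytes) -> bytes:
    """Resolve one group of per-byte slots into output bytes.

    A copy whose source lies inside the group is replaced by the
    pointed-to slot; a copy whose source precedes the group reads the
    history buffer, where history[-1] is the most recent byte.
    """
    resolved = list(slots)
    for i, (kind, value) in enumerate(resolved):
        if kind != "copy":
            continue
        src = i - value
        if src >= 0:
            # Points into the group: take over the pointed-to slot, which
            # has already been reduced to literal data because src < i.
            resolved[i] = resolved[src]
        else:
            # Points before the group: fetch from the history buffer.
            resolved[i] = ("literal", history[src])
    return bytes(v for _, v in resolved)
```

With a history of `b"xyz"`, the group `[("literal", 97), ("copy", 1), ("copy", 2), ("copy", 5)]` resolves to `b"aaay"`: the first two copies are satisfied from within the group and the last from the history buffer.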
In a second aspect, the present invention provides an arrangement for decompressing three or more bytes per processor cycle from a stream of compressed data using a processing pipeline, wherein said compressed data is represented by tokens of varying and unknown length, the arrangement comprising: first logic means for accepting as input said stream, comprising token data; second logic means for partially decoding a token from said token data to determine a boundary position of said token; and third logic means for priming said processing pipeline with said token and a length marker indicating said boundary position.
The arrangement of the second aspect preferably further comprises: fourth logic means for determining if any token in said processing pipeline represents literal data and if any token in said processing pipeline represents a copy pointer; fifth logic means, responsive to said fourth logic means, for marking each token with a marker, said marker indicating which of literal data and copy pointer is represented; and sixth logic means for passing literal data from any of said tokens marked as representing literal data directly to an output means.
The arrangement of the second aspect preferably further comprises: seventh logic means for determining if any of said tokens marked as representing a copy pointer has a copy pointer pointing into said tokens in said processing pipeline at a pointed-to token; and eighth logic means, responsive to a determination that a copy pointer is pointing into said tokens in said processing pipeline at a pointed-to token, for replacing said copy pointer with the pointed-to token.
The arrangement of the second aspect preferably further comprises: ninth logic means for replacing said pointed-to token with its pointed-to data; and tenth logic means for passing said pointed-to data to an output means.
The arrangement of the second aspect is preferably an arrangement wherein any of said logic means comprises one or more processor components.
The arrangement of the second aspect is preferably an arrangement wherein said one or more processor components comprise one or more application specific integrated circuits.