The present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms for high bandwidth compression to encoded data streams.
Deflate is a lossless data compression algorithm that uses a combination of the LZ77 algorithm and Huffman coding. LZ77 algorithms achieve compression by replacing repeated occurrences of data with references to a single copy of that data existing earlier in the input (uncompressed) data stream. A match is encoded by a pair of numbers called a length-distance pair, which is equivalent to the statement “each of the next length characters is equal to the characters exactly distance characters behind it in the uncompressed stream.” The “distance” is sometimes called the “offset” instead.
Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol where the variable-length code table has been derived in a particular way based on the estimated probability of occurrence for each possible value of the source symbol.
Within compressed blocks, if a duplicate series of bytes is spotted (a repeated string), then a back-reference is inserted, linking to the previous location of that identical string instead. An encoded match to an earlier string consists of a length (3-258 bytes) and a distance (1-32,768 bytes). Relative back-references can be made across any number of blocks, as long as the distance appears within the last 32 kB of uncompressed data decoded (termed the sliding window).
The second compression stage consists of replacing commonly used symbols with shorter representations and less commonly used symbols with longer representations. Huffman coding creates an un-prefixed tree of non-overlapping intervals, where the length of each sequence is inversely proportional to the probability of that symbol needing to be encoded. The more likely a symbol has to be encoded, the shorter its bit-sequence will be.