Data compression techniques are generally used to reduce the amount of data to be stored or transmitted, thereby reducing the required storage capacity or the transmission time, respectively. In either case, a corresponding decompression technique must be provided so that the original data can be reconstructed.
The LZ (Lempel-Ziv) data compression techniques and their variants are among the most popular data compression techniques. The LZ techniques are generally known as dictionary-based techniques: a running dictionary is generated during both compression and decompression. One variant, the LZ1 algorithm, uses a fixed-size “history buffer” window over the previously processed portion of the input data stream as its dictionary. The longest match for the current character sequence is sought in the dictionary, and when a match is found, a codeword for it is written to the output stream. If no match is found, the non-matchable characters are written to the output stream as is, without a codeword.
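The matching step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation of LZ1: the function name lz1_compress, the window_size and min_match parameters, and the token representation (a one-character string for a literal, an (offset, length) tuple for a codeword) are all hypothetical choices made for the example, and the search is a simple brute-force scan of the history buffer.

```python
def lz1_compress(data, window_size=4096, min_match=3):
    """Greedy LZ1-style compressor (illustrative sketch only).

    Returns a list of tokens: a one-character string for a literal,
    or an (offset, length) tuple for a codeword. The parameter names
    and token format are assumptions made for this example.
    """
    tokens = []
    pos = 0
    while pos < len(data):
        window_start = max(0, pos - window_size)
        best_len, best_offset = 0, 0
        # Scan the history buffer from the nearest position backwards,
        # so that ties are resolved in favour of the smallest offset.
        for start in range(pos - 1, window_start - 1, -1):
            length = 0
            # The match may run past `pos` (an overlapping copy).
            while (pos + length < len(data)
                   and data[start + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_offset = length, pos - start
        if best_len >= min_match:
            # Match found: emit a codeword and skip the matched characters.
            tokens.append((best_offset, best_len))
            pos += best_len
        else:
            # No usable match: emit the character as is.
            tokens.append(data[pos])
            pos += 1
    return tokens
```

With these assumptions, lz1_compress("abcabcabc") yields ['a', 'b', 'c', (3, 6)]: three literals followed by one codeword that copies six characters starting three positions back.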
The coded output stream thus consists of codewords interspersed with non-matchable sequences of characters from the input character sequence. During decompression, each codeword references a sequence of characters that has previously appeared, which allows the original input character sequence to be rebuilt from the code stream.
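The rebuilding of the input from the code stream can be sketched as follows. This is a hedged illustration only: the function name lz1_decompress is hypothetical, and the code stream is assumed to be a list of tokens in which a one-character string is a non-matchable character and an (offset, length) tuple is a codeword pointing back into the already rebuilt output.

```python
def lz1_decompress(tokens):
    """Rebuild the original text from a token list (illustrative sketch).

    Assumed token format: a one-character string is a literal; an
    (offset, length) tuple copies `length` characters starting
    `offset` positions back in the output rebuilt so far.
    """
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            offset, length = token
            if offset < 1 or offset > len(out):
                raise ValueError("codeword points outside the history")
            start = len(out) - offset
            # Copy one character at a time so that a codeword may
            # overlap the text it is currently producing.
            for i in range(length):
                out.append(out[start + i])
        else:
            out.append(token)
    return "".join(out)
```

For example, lz1_decompress(['a', 'b', 'c', (3, 6)]) returns "abcabcabc". The character-by-character copy is a deliberate choice: it lets a codeword whose length exceeds its offset repeat a short pattern many times.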
Typically, a codeword has three sub-codes: the length of the matched character sequence, a reference offset into the dictionary, and an offset to the next codeword. The offset in a codeword often consumes the largest number of bits: the larger the offset, the more bits are needed to encode the codeword. Therefore, there is a need to reduce the offset so that fewer bits are required to encode a codeword, thereby achieving higher data compression.
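The relationship between offset size and bit cost can be illustrated with a small sketch. The text above does not specify how offsets are encoded, so the example assumes, purely for illustration, that each offset is written with the Elias gamma code, a standard variable-length code for positive integers. The function name elias_gamma is hypothetical.

```python
def elias_gamma(n):
    """Return the Elias gamma code of a positive integer as a bit string.

    Used here only as an assumed stand-in for a variable-length
    offset encoding; the actual codeword format may differ.
    """
    if n < 1:
        raise ValueError("Elias gamma is defined for n >= 1")
    binary = bin(n)[2:]  # binary digits without the '0b' prefix
    # Prefix with (len - 1) zeros so the decoder knows the length.
    return "0" * (len(binary) - 1) + binary


# Bit cost of a few offsets under this assumed encoding.
for offset in (1, 4, 1000):
    print(offset, len(elias_gamma(offset)))
```

Under this assumption an offset of 1 costs 1 bit, an offset of 4 costs 5 bits, and an offset of 1000 costs 19 bits, so a match found closer to the current position yields a shorter codeword.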