This invention relates to data compression, and more particularly, to a system and a method for compressing a data sequence to produce an output codestream that can be partially decompressed to obtain an intermediate segment of the data sequence.
Data compression techniques are used to reduce the amount of data to be stored or transmitted, thereby reducing the required storage capacity or transmission time, respectively. In either case it is necessary to provide a corresponding decompression technique to enable the original data to be reconstructed.
Many data compression and decompression techniques are known, with the Lempel-Ziv (LZ) technique and its variants proving to be very popular. U.S. Pat. No. 4,558,302, Welch, entitled “High Speed Data Compression and Decompression Apparatus and Method;” U.S. Pat. No. 4,701,745, Waterworth, entitled “Data Compression System;” and U.S. Pat. No. 4,814,746, Miller et al., entitled “Data Compression Method” are patents that disclose some of these LZ techniques. One of the LZ variants is known as the LZ Oberhumer (LZO) technique. FIG. 1 shows an output codestream obtained by compressing an input character sequence using the LZO technique. The output codestream includes codewords interspersed with non-matchable sequences of characters from the input character sequence. The codewords reference sequences of characters that have previously appeared when decompressing the output codestream, allowing the original input character sequence to be rebuilt from the codestream.
The LZ techniques are dictionary-based techniques, in which a running dictionary is generated during both compression and decompression. To obtain an intermediate segment of an input character sequence from an output codestream, it is therefore necessary to begin decompression at the start of the output codestream and to continue until the desired intermediate segment is obtained. Partial decompression of the output codestream is not possible with the prior art LZ techniques.
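The dependence on earlier output can be illustrated with a minimal sketch of a generic LZ77-style decoder. This is not the LZO format or the codestream of FIG. 1; the token layout (a literal run, or a hypothetical `(distance, length)` back-reference) and the names `Token` and `decode` are assumptions made only for illustration.

```python
from typing import List, Tuple, Union

# Hypothetical token model: a literal run (bytes), or a back-reference
# given as (distance back into the decoded output, number of bytes to copy).
Token = Union[bytes, Tuple[int, int]]

def decode(tokens: List[Token]) -> bytes:
    """Decode a token list strictly from the beginning."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, bytes):
            out += tok  # non-matchable sequence: copied as-is
        else:
            distance, length = tok
            if distance <= 0 or distance > len(out):
                raise ValueError("back-reference points outside decoded data")
            start = len(out) - distance
            for i in range(length):
                # Copy byte by byte so overlapping references work.
                out.append(out[start + i])
    return bytes(out)

tokens: List[Token] = [b"abc", (3, 6), b"xyz", (6, 3)]
full = decode(tokens)  # b"abcabcabcxyzabc"

# Starting in the middle fails: the last token refers to data that
# only exists if everything before it has already been decoded.
try:
    decode(tokens[2:])
    partial_ok = True
except ValueError:
    partial_ok = False
```

In this sketch the final back-reference cannot be resolved unless all earlier tokens have been decoded, which is the limitation of the prior art LZ techniques described above.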
According to an aspect of the present invention, there is provided a method for compressing an input sequence of data portions to produce an output codestream and for partially decompressing the output codestream to obtain a selected segment of the input sequence. The method includes compressing the input sequence to produce the output codestream of non-matchable sequences and codewords. The codewords include a first codeword and subsequent codewords. Each of the codewords includes at least a length of a non-matchable sequence preceding a matchable first sequence. Each of the subsequent codewords further includes a first offset for indicating a start of the matchable first sequence in the preceding non-matchable sequence, a length of the matchable first sequence and a second offset for indicating a location of a preceding codeword in the output codestream.
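As a rough illustration of the codeword fields listed above, the following sketch models a codeword as a record and follows the second offset from one codeword back to the preceding ones. The field names, the dictionary keyed by byte position, the positions themselves, and the reading of the second offset as a backward byte distance are all assumptions for illustration; the actual encoding is not specified in this passage.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Codeword:
    # Length of the non-matchable sequence preceding the matchable sequence.
    literal_len: int
    # "First offset": start of the matchable sequence (absent in the first codeword).
    match_offset: Optional[int] = None
    # Length of the matchable sequence (absent in the first codeword).
    match_len: Optional[int] = None
    # "Second offset": assumed here to be the distance back to the
    # preceding codeword in the codestream (absent in the first codeword).
    prev_codeword_offset: Optional[int] = None

def walk_back(codewords: Dict[int, Codeword], start_pos: int) -> List[int]:
    """Return codeword positions from start_pos back to the first codeword."""
    positions = [start_pos]
    pos = start_pos
    while codewords[pos].prev_codeword_offset is not None:
        pos -= codewords[pos].prev_codeword_offset
        positions.append(pos)
    return positions

# Hypothetical codestream: keys are made-up byte positions of codewords.
stream = {
    0: Codeword(literal_len=5),
    6: Codeword(literal_len=3, match_offset=2, match_len=4, prev_codeword_offset=6),
    13: Codeword(literal_len=2, match_offset=1, match_len=3, prev_codeword_offset=7),
}
chain = walk_back(stream, 13)  # [13, 6, 0]
```

Under these assumptions, a decompressor positioned at a codeword in the middle of the codestream could locate earlier codewords through the second offsets, rather than scanning the codestream from its start.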
According to another aspect of the present invention, there is provided a compressing/decompressing system having means for compressing an input sequence of data portions as described above.
According to another aspect of the present invention, there is provided a program storage device readable by a computing device. The program storage device tangibly embodies a program of instructions that is executable by the computing device to perform the above method for compressing/decompressing an input sequence of data portions.