1. Field of the Invention
The present invention relates to a computer program product, system, method, and data structure for using variable encodings to compress an input data stream to a compressed output data stream.
2. Description of the Related Art
Data compression involves converting symbols, including data symbols and control symbols, in an input data stream into a compressed output data stream comprising less data than the input data stream. Control symbols are encoded into the compressed data stream and provide decoding instructions to allow decompression, and may be created by the application program which is compressing data. Examples of control symbols created by the application include an end of record control signal, a file mark, and a dataset boundary. There are other events or controls determined by the compression device, such as when to swap to a given compression scheme, and when to reset the history buffer used to perform compression in a given scheme. The compressing device may decide to create an access point at which compression begins with a reset history buffer using a particular scheme after a dataset boundary is encountered.
One type of encoding technique, Huffman coding, is a lossless compression algorithm that encodes source symbols using a variable length code table. The table is derived from the estimated probability of occurrence of each possible source symbol value, so that more probable symbol values receive shorter encodings.
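By way of illustration only, the following Python sketch shows one way a variable length code table might be derived from symbol frequencies and then used to encode and decode a short string. The function names `huffman_code`, `encode`, and `decode` are hypothetical, and observed frequencies stand in for the estimated probabilities; it is not taken from any particular prior art implementation.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Return a prefix code table {symbol: bit string} built from symbol counts."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, unique id, partial table); the unique id
    # breaks ties so the dictionaries are never compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # least probable subtree
        w2, _, t2 = heapq.heappop(heap)  # next least probable subtree
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, table):
    """Concatenate the code of every symbol in text."""
    return "".join(table[s] for s in text)


def decode(bits, table):
    """Invert encode() by matching prefixes one bit at a time."""
    inverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    table = huffman_code("aaaabbc")
    bits = encode("aaaabbc", table)
    print(table, bits, decode(bits, table))
```

In this sketch the most frequent symbol receives the shortest code, which is the property the variable length code table relies on.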
A streaming lossless data compression algorithm (SLDC) receives an input data stream of control symbols and data symbols and converts strings of consecutive bytes of data symbols into copy pointers and literal data symbols that have fewer bits than the data symbols in the input data stream. The SLDC algorithm is used to compress and decompress data in Linear Tape Open (LTO) magnetic tape cartridges. Details of the SLDC algorithm are described in the Standard ECMA-321 publication “Streaming Lossless Data Compression Algorithm - (SLDC),” dated Jun. 1, 2001.
FIG. 1 illustrates a prior art implementation of a literal data symbol 2 that the SLDC algorithm outputs into the compressed data stream when a data byte does not begin a string of consecutive bytes matching consecutive bytes in the history buffer, so that the literal data byte is outputted uncompressed. The literal data symbol 2 includes a zero bit 4, indicating that the symbol is a literal data symbol, followed by the actual, uncompressed data byte. FIG. 20 illustrates prior art operations to generate the literal for the literal symbol (at block 540) by outputting the unencoded data unit to include in the literal symbol (at block 542).
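By way of illustration only, the literal data symbol layout described above (a zero bit followed by the eight bits of the uncompressed byte) may be sketched in Python as follows. The function name `encode_literal` and the use of a bit string rather than packed bytes are assumptions made for readability.

```python
def encode_literal(byte):
    """Return a 9-bit literal symbol: a 0 flag bit, then the raw byte."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("a literal carries exactly one byte (0..255)")
    # Flag bit 0 marks a literal; the byte itself is left unencoded.
    return "0" + format(byte, "08b")


if __name__ == "__main__":
    print(encode_literal(0x41))
```

Each literal therefore costs nine bits for eight bits of input, which is why a literal is emitted only when no match is available.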
FIG. 2 illustrates a prior art implementation of a copy pointer symbol 10 that the SLDC algorithm outputs to represent multiple consecutive data bytes that match the same number of consecutive data bytes in the history buffer. The copy pointer symbol 10 includes a one bit 12 indicating that the symbol is a copy pointer, a match count field 14 indicating the number of matching consecutive bytes, and a displacement field 16 indicating the absolute memory address of the history buffer entry that includes the first byte of the matching consecutive bytes. FIG. 14 illustrates prior art operations to determine the displacement count in the copy pointer symbol by determining (at block 352) the absolute memory address of the entry in the history buffer that holds the start of the matching consecutive data bytes.
FIG. 3 shows a prior art implementation of a match count table 18 that is used to determine an encoding of a match count value as a number of bits, which indicates the number of matching consecutive bytes. The encoding of the match count value specified in the second column is included in the match count field 14 of the copy pointer symbol 10 being generated to represent the consecutive data bytes.
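By way of illustration only, the copy pointer layout and the table-driven match count encoding described above may be sketched in Python as follows. The contents of the match count table 18 are not reproduced in this text, so the ranges in `MATCH_COUNT_RANGES` are an assumption modeled on the match count encoding of the ECMA-321 standard and should be checked against that standard. The 10-bit displacement width (`DISPLACEMENT_BITS`, corresponding to a 1024-byte history buffer) and all function names are likewise assumptions.

```python
# Assumed table: (lowest count, highest count, prefix bits, offset bit width).
MATCH_COUNT_RANGES = [
    (2, 3, "0", 1),
    (4, 7, "10", 2),
    (8, 15, "110", 3),
    (16, 31, "1110", 4),
    (32, 271, "1111", 8),
]

# Assumed displacement width for a 1024-entry history buffer.
DISPLACEMENT_BITS = 10


def encode_match_count(count):
    """Look up the variable length encoding of a match count."""
    for low, high, prefix, width in MATCH_COUNT_RANGES:
        if low <= count <= high:
            # The prefix selects the range; the offset selects the value.
            return prefix + format(count - low, "0{}b".format(width))
    raise ValueError("match count outside the assumed range 2..271")


def encode_copy_pointer(count, displacement):
    """Return flag bit 1, then the match count field, then the displacement."""
    if not 0 <= displacement < (1 << DISPLACEMENT_BITS):
        raise ValueError("displacement does not fit the history buffer")
    return (
        "1"
        + encode_match_count(count)
        + format(displacement, "0{}b".format(DISPLACEMENT_BITS))
    )


if __name__ == "__main__":
    # Five matching bytes starting at history buffer address 3.
    print(encode_copy_pointer(5, 3))
```

Under these assumptions, a five-byte match is represented in 15 bits instead of the 45 bits that five literal data symbols would require.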
There is a need in the art to provide techniques to continue to improve the compression realized using compression algorithms, such as SLDC and others.