Contemporary data processing activities often produce, manipulate, or consume large quantities of data. Storing and transferring this data can be a challenging undertaking. One approach that is frequently productive is to compress the data so that it consumes less space (and can be transmitted over a given communication channel more quickly). Data compression algorithms identify redundant or inefficiently coded information in an input data stream and re-encode it to be smaller (i.e., to be represented by fewer bits). Different types of input data have different characteristics, so a compression algorithm that works well for one type of data may not achieve a comparable compression ratio when processing another type of data.
No known compression algorithm achieves the best results for every data type. For any given lossless algorithm, there is always some input data stream that the algorithm cannot make any smaller, though there is often a different algorithm that could re-encode the same data stream in a smaller number of bits. Sometimes, an algorithm operates in a way that both compresses a data stream and exposes additional redundancy or inefficient coding, so that a second compression stage could shrink the information even further. The design of an effective, general-purpose data compressor often involves trade-offs between the compression ratio and the number of stages (more stages typically increase compression and decompression processing time).
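The dependence of compression ratio on the input can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the standard-library zlib module stands in for "a compression algorithm," and the two 4096-byte inputs (a repeated four-byte pattern and seeded pseudo-random bytes) are hypothetical test data chosen for this example.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Return the number of bytes zlib produces for `data` at its default level."""
    return len(zlib.compress(data))


# Hypothetical inputs of equal length (4096 bytes each).
rng = random.Random(0)  # fixed seed so the sketch is repeatable
repetitive = b"ABCD" * 1024  # highly redundant
noise = bytes(rng.getrandbits(8) for _ in range(4096))  # no obvious redundancy

sizes = {
    "repetitive": compressed_size(repetitive),
    "noise": compressed_size(noise),
}

for name, size in sizes.items():
    print(f"{name}: 4096 bytes in, {size} bytes out")
```

Under these assumptions, the repetitive input should shrink to a small fraction of its original size, while the pseudo-random input should come out no smaller than it went in, because the compressor finds nothing to remove and still has to add its own framing bytes.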
Many compression algorithms produce integers as part of a compressed data stream. For example, in run-length encoding (“RLE”) compression, consecutive repeated instances of a symbol may be replaced by a single copy of the symbol and an integer indicating how many times the symbol was repeated. From a theoretical perspective, the compression achieved by an RLE compressor should improve in proportion to the lengths of the repeated strings of symbols in the input. However, in a practical implementation on a digital computer, real-world considerations degrade the performance. For example, if each length is represented as an eight-bit unsigned integer, then repeated sequences longer than 255 symbols must be broken into two or more runs of 255 or fewer symbols. The length limit can be increased by using 16-bit integers, but this “solution” has its own problems: now, every run shorter than 256 symbols wastes at least eight bits of the “length” integer. (Furthermore, 16-bit integers have an upper limit of 65,535, so runs longer than that must also be broken up.)
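The trade-off between eight-bit and 16-bit length fields can be sketched in Python. This is a minimal illustration rather than any particular compressor's format: the function names rle_encode and encoded_bits, the (symbol, count) pair layout, and the assumption of eight-bit symbols are all hypothetical choices made for this example.

```python
from typing import List, Tuple


def rle_encode(data: bytes, count_bits: int = 8) -> List[Tuple[int, int]]:
    """Encode `data` as (symbol, count) pairs.

    Each count must fit in an unsigned integer of `count_bits` bits, so a run
    longer than 2**count_bits - 1 symbols is split into several pairs.
    """
    max_run = (1 << count_bits) - 1
    runs = []
    i = 0
    while i < len(data):
        symbol = data[i]
        j = i
        # Extend the run until the symbol changes or the count field is full.
        while j < len(data) and data[j] == symbol and j - i < max_run:
            j += 1
        runs.append((symbol, j - i))
        i = j
    return runs


def encoded_bits(runs: List[Tuple[int, int]], count_bits: int = 8,
                 symbol_bits: int = 8) -> int:
    """Size of the encoded output, assuming fixed-width symbol and count fields."""
    return len(runs) * (symbol_bits + count_bits)


long_run = b"A" * 600      # one run longer than 255 symbols
short_runs = b"AAABBB"     # two runs far shorter than 256 symbols

for label, data in (("long run", long_run), ("short runs", short_runs)):
    for bits in (8, 16):
        runs = rle_encode(data, count_bits=bits)
        print(f"{label}, {bits}-bit counts: {len(runs)} pair(s), "
              f"{encoded_bits(runs, count_bits=bits)} bits")
```

With these assumptions, the 600-symbol run needs three pairs (255, 255, and 90 symbols; 48 bits) when counts are eight bits wide but only one pair (24 bits) when counts are 16 bits wide. The two three-symbol runs show the opposite effect: 32 bits with eight-bit counts and 48 bits with 16-bit counts, because the upper eight bits of each count are always zero.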
Techniques for improving the handling of integers produced during a data compression process can increase the performance of that process.