Many computing systems carry out lossless data compression to exploit the abundant data compressibility for reducing data storage and transmission overhead. In general, to maximize the compression ratio, lossless data compression includes two steps: (1) a dictionary-type compression scheme, such as various known LZ methods, is first used to compress the original data; and (2) then an entropy coding such as Huffman coding is applied to the output of the first step to further reduce the data volume. In addition, lossless compression is applied either on the file/object level or on the data chunk level. When applying compression on the file/object level, each individual file/object is compressed as a unit and accessing any portion in a compressed file/object demands the decompression from the very beginning of the compressed file/object. When applying compression on the data chunk level, each individual file/object is partitioned into consecutive fix-sized chunks (e.g., each chunk can be 64 kB or 256 kB), and each chunk is compressed (hence can be decompressed) independently from other chunks. Therefore, in the case of chunk-level compression, accessing any portion in a compressed file/object demands the decompression of one compressed chunk.
In current practice, data compression is carried out solely by computing chips such as CPUs and/or GPUs in a host. In many practical systems, particularly very performance-demanding and/or data-intensive systems, implementing both steps of compression (i.e., first dictionary-type compression and then entropy coding) on host processing chips is not a viable option, due to the overall compression implementation overhead in terms of CPU/GPU cycles, cache and DRAM resource, and data transfer. As a result, many systems carry out data compression by eliminating the second-step entropy coding in order to reduce implementation complexity and achieve high-speed compression/decompression.
For example, the high-speed Snappy compression library, which is widely used in many systems such as Google's BigTable and Cassandra, and Hadoop, only implements the first-step dictionary-type compression. Although they can achieve very high compression/decompression speed, such entropy-coding-less compression schemes have worse compression ratio, hence the systems cannot fully exploit the abundant run-time data compressibility to facilitate the data storage and data transfer.