The present invention relates generally to computer systems, and more specifically, to system level testing of entropy encoding.
In signal processing, data compression involves reducing the size of a data file by encoding information so that it uses fewer bits than the original representation of the information. Compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy, so no information is lost. In contrast, lossy compression reduces bits by removing unnecessary, or less important, information, and the removed information cannot be recovered. Data compression is useful because it reduces the resources required to store and transmit data. Computational resources are consumed in the compression process and, usually, in the reversal of the compression process (expansion). The design of a data compression scheme involves trade-offs among various factors, such as the degree of compression, the amount of distortion introduced (when using lossy data compression), and the computational resources required to compress and expand the data.
Huffman and Lempel-Ziv are two types of lossless compression algorithms. Huffman encoding is a type of entropy encoding that creates and assigns a unique prefix-free code to each unique symbol that occurs in the input data. The term “entropy encoding” is used to refer to lossless data compression schemes that are independent of the specific characteristics of the medium storing the data. Huffman encoding compresses data by replacing each fixed-length input symbol with its corresponding prefix-free code. The prefix-free codes are of different lengths: symbols with a high probability of occurring are assigned short codewords, and symbols with a low probability of occurring are assigned long codewords. The design of the Huffman code is optimal for a fixed block length, assuming that the source statistics are known a priori.
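By way of illustration only, the assignment of prefix-free codes described above can be sketched in Python as follows. This is a minimal sketch and not a definitive implementation: the function names `huffman_code`, `encode`, and `decode` are hypothetical, the symbol frequencies are assumed to be counted directly from the input data, and codes are represented as strings of "0" and "1" characters for readability rather than as packed bits.

```python
import heapq
from collections import Counter


def huffman_code(data):
    """Return a dict mapping each symbol in data to a prefix-free bit string."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # Only one distinct symbol: give it a one-bit code (illustrative choice).
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: partial_code}).
    # The unique integer tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prepending one bit to each side.
        w1, _, codes1 = heapq.heappop(heap)
        w2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(data, code):
    """Replace each input symbol with its codeword."""
    return "".join(code[s] for s in data)


def decode(bits, code):
    """Recover the symbols; prefix-freeness makes the first match unambiguous."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)
```

For example, calling `huffman_code("abracadabra")` assigns the most frequent symbol, "a", the shortest codeword, and `decode(encode(data, code), code)` returns the original input, consistent with the lossless property described above.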
Lempel-Ziv compression algorithms are used to implement variable-to-fixed length codes. The basic idea of Lempel-Ziv is to parse an input sequence of data into non-overlapping blocks of different lengths while constructing a dictionary of the blocks seen thus far. In contrast to a Huffman code, which relies on estimates of the frequencies of symbols in the input data, a Lempel-Ziv code is not designed for input data having any particular content but rather for a large class of sources.
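By way of illustration only, the parsing of an input sequence into non-overlapping blocks while building a dictionary can be sketched in Python as follows. The sketch assumes an LZ78-style variant, which is only one member of the Lempel-Ziv family and is not stated above to be the variant used; the function names `lz78_parse` and `lz78_rebuild` are hypothetical, and each block is emitted as a pair consisting of a dictionary index and one following symbol.

```python
def lz78_parse(data):
    """Parse data into (dictionary_index, next_symbol) pairs, LZ78-style."""
    dictionary = {"": 0}  # index 0 is reserved for the empty block
    output = []
    phrase = ""
    for ch in data:
        candidate = phrase + ch
        if candidate in dictionary:
            # Keep extending the current block while it is already known.
            phrase = candidate
        else:
            # Emit the longest known block plus the symbol that made it new,
            # then record the new block in the dictionary.
            output.append((dictionary[phrase], ch))
            dictionary[candidate] = len(dictionary)
            phrase = ""
    if phrase:
        # Input ended inside a known block: emit it with an empty symbol.
        output.append((dictionary[phrase], ""))
    return output


def lz78_rebuild(pairs):
    """Rebuild the original sequence from the emitted pairs."""
    phrases = [""]
    out = []
    for index, ch in pairs:
        block = phrases[index] + ch
        phrases.append(block)
        out.append(block)
    return "".join(out)
```

The dictionary is built solely from the blocks encountered in the input, so no prior estimate of symbol frequencies is needed, which reflects the contrast with Huffman coding drawn above.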