Data compression, or source coding, is the process of encoding information using fewer bits than an unencoded representation would use, by means of a specific encoding scheme. As with any communication, compressed data communication works only when both the sender and the receiver of the information understand the encoding scheme.
One data compression algorithm, known as the deflate algorithm, is based on two compression algorithms: LZSS (a variant of LZ77) and Huffman coding. LZSS achieves compression by replacing repeated strings with pointers, each a (length, backward distance) pair, to earlier occurrences of the same string. Huffman coding replaces frequent byte values with shorter bit sequences and infrequent byte values with longer ones.
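The pointer-replacement idea can be illustrated with a toy LZ77/LZSS-style parser. This is a minimal Python sketch, assuming deflate's limits (a 32 KiB window and matches of 3 to 258 bytes) but using a brute-force greedy search; real encoders such as zlib locate matches with hash chains instead. The names `lz77_tokenize` and `lz77_detokenize` are hypothetical. Each output token is either a literal byte or a (distance, length) back-reference.

```python
from typing import List, Tuple, Union

# Hypothetical helpers for illustration; a toy search, not zlib's algorithm.
# A token is a literal byte (int) or a back-reference (distance, length).
Token = Union[int, Tuple[int, int]]

WINDOW_SIZE = 32 * 1024  # deflate's sliding window is 32 KiB
MIN_MATCH = 3            # shorter matches are emitted as literals
MAX_MATCH = 258          # longest match deflate can encode


def lz77_tokenize(data: bytes) -> List[Token]:
    """Greedy parse: at each position, take the longest match in the window."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(data):
        best_len, best_dist = 0, 0
        for cand in range(max(0, pos - WINDOW_SIZE), pos):
            length = 0
            # A match may run past pos and overlap itself (e.g. "aaaa...").
            while (length < MAX_MATCH and pos + length < len(data)
                   and data[cand + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, pos - cand
        if best_len >= MIN_MATCH:
            tokens.append((best_dist, best_len))
            pos += best_len
        else:
            tokens.append(data[pos])
            pos += 1
    return tokens


def lz77_detokenize(tokens: List[Token]) -> bytes:
    """Rebuild the original bytes by copying from already-decoded output."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # byte-by-byte copy handles overlaps
        else:
            out.append(tok)
    return bytes(out)


tokens = lz77_tokenize(b"abcabcabcabc")
print(tokens)  # [97, 98, 99, (3, 9)]
print(lz77_detokenize(tokens))  # b'abcabcabcabc'
```

Three literals are followed by a single pointer meaning "go back 3 bytes and copy 9", and the decoder needs nothing but the token stream to reverse it.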
During compression, the compressor can encode a particular block in one of two ways. First, it can use a fixed Huffman coding tree, which is defined in the RFC that specifies the deflate algorithm (RFC 1951). Second, it can examine the block being compressed, generate an optimal Huffman coding tree for that block, and transmit the tree definition along with the compressed block.
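The two options can be compared on the literal bytes of a block. The following minimal Python sketch builds an optimal Huffman code (as code lengths) for a block and counts the payload bits it would need, next to the bits needed under deflate's fixed code, where literals 0 to 143 take 8 bits and 144 to 255 take 9 bits (RFC 1951, section 3.2.6). It is a simplification under stated assumptions: it ignores back-references, the end-of-block symbol, deflate's 15-bit limit on code lengths, and the bits needed to transmit the dynamic tree itself. All function names are hypothetical.

```python
import heapq
from collections import Counter
from typing import Dict


def huffman_code_lengths(block: bytes) -> Dict[int, int]:
    """Optimal (Huffman) code length in bits for each byte value in block."""
    freq = Counter(block)
    if not freq:
        return {}
    if len(freq) == 1:
        # A lone symbol still needs a 1-bit code to be decodable.
        return {next(iter(freq)): 1}
    # Heap entries: (weight, tie-breaker, {symbol: depth}); the tie-breaker
    # keeps heapq from ever comparing the dicts.
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Joining the two lightest subtrees pushes their symbols one level deeper.
        merged = {s: d + 1 for s, d in left.items()}
        merged.update({s: d + 1 for s, d in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def fixed_literal_length(byte: int) -> int:
    """Literal length in deflate's fixed code: 8 bits for 0-143, 9 for 144-255."""
    return 8 if byte <= 143 else 9


def fixed_payload_bits(block: bytes) -> int:
    return sum(fixed_literal_length(b) for b in block)


def dynamic_payload_bits(block: bytes) -> int:
    # Excludes the cost of sending the tree definition with the block.
    lengths = huffman_code_lengths(block)
    return sum(lengths[b] for b in block)


block = b"aaaabbc"
print(sorted(huffman_code_lengths(block).items()))  # [(97, 1), (98, 2), (99, 2)]
print(fixed_payload_bits(block), dynamic_payload_bits(block))  # 56 10
```

The block-specific tree shrinks the payload from 56 to 10 bits here, but only because the cost of describing that tree to the decoder has been left out of the count.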
However, each coding scheme involves a trade-off. In the first case, the fixed coding tree may not accurately reflect the symbol probabilities actually encountered, so the compression ratio may be lower than it would be with a more accurate tree. (In this context, the compression ratio is the number of bytes saved, that is, the original length minus the compressed length, divided by the original length and expressed as a percentage; for example, a file that starts out as 1000 bytes and is compressed to 720 bytes has a compression ratio of 28%.) In the second case, the space required to send the tree definition reduces the compression ratio. In both cases, the Huffman coding tree is static: it expresses only the global statistics of the entire block being compressed.
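The compression ratio defined above can be computed directly. This minimal Python sketch reproduces the 1000-byte example and then measures a real deflate stream produced by Python's standard `zlib` module; note that `zlib.compress` wraps the deflate data in a small header and checksum, so it slightly overstates the deflate output size. The function name `compression_ratio` is hypothetical.

```python
import zlib


def compression_ratio(original_len: int, compressed_len: int) -> float:
    """Bytes saved, as a percentage of the original length (hypothetical helper)."""
    if original_len <= 0:
        raise ValueError("original length must be positive")
    # Multiply before dividing so the worked example yields exactly 28.0.
    return 100.0 * (original_len - compressed_len) / original_len


# The worked example: 1000 bytes compressed to 720 bytes.
print(compression_ratio(1000, 720))  # 28.0

# Measuring an actual zlib (deflate) stream on repetitive input.
data = b"the quick brown fox jumps over the lazy dog " * 50
packed = zlib.compress(data, 9)
print(f"{compression_ratio(len(data), len(packed)):.1f}%")
```

With this definition, a negative result means the "compressed" output is larger than the input, which can happen on data that is already compressed or random.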