In telecommunications, data de-duplication is a data compression technique that reduces redundancy in data streams by replacing redundant instances of data blocks with compression symbols associated with, or otherwise identifying, earlier instances of those data blocks. One type of data de-duplication scheme uses history table compression symbols, each of which identifies a history table entry associated with the data block. More specifically, an original instance of a data block is identified and stored in a history table, and a hash algorithm is applied to two or more data chunks within the data block to obtain hash values for those chunks. The hash values are then stored in a hash table, which associates each hash value with the location of the stored data block in the history table. The same hash algorithm is then applied to data chunks of an incoming data stream, and the resulting hash values are looked up in the hash table. If a resulting hash value matches an entry in the hash table, the incoming data block is compared with the stored data block to determine whether they are the same. This comparison may be performed by aligning the corresponding data chunks (e.g., the chunks producing the matching hash values) and checking whether the surrounding data blocks match. If the data blocks match, the entire incoming data block is replaced with a compression symbol before the data stream is forwarded to the next hop, thereby achieving compression. A downstream node performs identical processing (e.g., hashing, storing, etc.), and therefore maintains a hash table identical to that of the upstream node performing the compression. As a result, the downstream node receiving the compressed data stream simply replaces the compression symbol with the corresponding stored data block to decompress the data stream.
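The history table scheme described above can be illustrated with a minimal Python sketch. It is not drawn from any particular implementation: the names `Deduplicator`, `chunk_hashes`, `BLOCK_SIZE`, `CHUNK_SIZE`, and `CHUNK_OFFSETS` are hypothetical, blocks are assumed to be fixed-size and aligned (so chunk alignment is trivial), and SHA-1 merely stands in for whatever hash algorithm a real system would use.

```python
import hashlib

# All sizes and offsets below are illustrative assumptions.
BLOCK_SIZE = 16          # bytes per data block
CHUNK_SIZE = 4           # bytes per hashed chunk
CHUNK_OFFSETS = (0, 8)   # two chunks are hashed within each block


def chunk_hashes(block):
    """Return the hash of each selected chunk within a block."""
    return [
        hashlib.sha1(block[off:off + CHUNK_SIZE]).digest()
        for off in CHUNK_OFFSETS
    ]


class Deduplicator:
    """Holds the history table and hash table kept by one node."""

    def __init__(self):
        self.history = []      # history table: index -> stored block
        self.hash_table = {}   # chunk hash -> history table index

    def _learn(self, block):
        # Store the original instance and index its chunk hashes.
        index = len(self.history)
        self.history.append(block)
        for h in chunk_hashes(block):
            self.hash_table.setdefault(h, index)
        return index

    def _lookup(self, block):
        # A hash match is only a hint; the full blocks are compared.
        for h in chunk_hashes(block):
            index = self.hash_table.get(h)
            if index is not None and self.history[index] == block:
                return index
        return None

    def compress(self, data):
        """Upstream node: replace repeated blocks with symbols."""
        symbols = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            index = self._lookup(block)
            if index is None:
                self._learn(block)
                symbols.append(("raw", block))
            else:
                symbols.append(("ref", index))
        return symbols

    def decompress(self, symbols):
        """Downstream node: mirror the learning, expand symbols."""
        out = bytearray()
        for kind, value in symbols:
            if kind == "raw":
                self._learn(value)
                out += value
            else:
                out += self.history[value]
        return bytes(out)
```

In this sketch the upstream and downstream nodes each hold their own `Deduplicator`. Because both learn a block only when it travels uncompressed, their tables stay in step without being exchanged, which is the property the downstream node relies on to expand each symbol.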
Another type of data de-duplication scheme uses backward reference compression symbols that identify a position of the earlier data block instance within the data stream. In both schemes, the compression symbols are much smaller than the redundant data blocks, and therefore can greatly reduce the volume of information transported over the network, particularly when the same data block is transported many times over.
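For comparison, the backward reference scheme can be sketched in the same way. This is again a hedged illustration rather than a reference implementation: `compress_backref`, `decompress_backref`, and `BLOCK_SIZE` are hypothetical names, blocks are assumed to be fixed-size and aligned, and each symbol is assumed to carry the absolute stream position and length of the earlier instance.

```python
BLOCK_SIZE = 16  # illustrative block length in bytes


def compress_backref(data):
    """Replace repeated blocks with (position, length) references."""
    seen = {}      # block -> stream position of its first instance
    symbols = []
    for pos in range(0, len(data), BLOCK_SIZE):
        block = data[pos:pos + BLOCK_SIZE]
        if block in seen:
            symbols.append(("back", seen[block], len(block)))
        else:
            seen[block] = pos
            symbols.append(("raw", block))
    return symbols


def decompress_backref(symbols):
    """Rebuild the stream by copying from already-decoded output."""
    out = bytearray()
    for sym in symbols:
        if sym[0] == "raw":
            out += sym[1]
        else:
            _, pos, length = sym
            out += out[pos:pos + length]
    return bytes(out)
```

Here the receiver needs no history table: a reference points into the stream it has already reconstructed. With 16-byte blocks, each repeated block collapses to a symbol holding two small integers, which is where the reduction in transported volume comes from.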