Field of the Invention
The present invention relates to data deduplication and more particularly to data deduplication through byte caching.
Description of the Related Art
Data deduplication refers to the reduction of a data flow through the elimination of redundancies in the data. Data deduplication can reduce the quantity of traffic transmitted across a communications channel, thereby increasing the responsiveness of communications between network entities exchanging data over the communications channel. Data deduplication can be performed in several different ways, including data compression, delta encoding, proxy caching and data redundancy elimination.
Data compression removes redundant content on a per-object basis by representing duplicate bytes with hash values. In delta encoding, a technique applicable only to Web-based objects, similar portions of a Web object can be represented with a hash value. Proxy caching, like data compression and delta encoding, is an object-based method that performs object-level deduplication by storing an object that may potentially be referenced later. Data redundancy elimination, also referred to as byte caching, differs from data compression, delta encoding and proxy caching in that byte caching operates at the byte level and is not limited to Web objects and the hypertext transfer protocol (HTTP).
In byte caching, an encoder and a decoder act in concert over a sideband channel. The encoder identifies regions of repeated bytes within byte streams and replaces these regions with hash values, and the decoder reconstitutes the full byte stream from those hash values. The precision afforded by byte caching can provide the most effective form of deduplication, but not without substantial computational and resource cost.
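By way of illustration only, the cooperation of the encoder and decoder can be sketched as follows. The sketch is a simplified assumption and not a definitive implementation: it divides the byte stream into fixed-size chunks, uses SHA-1 digests as the hash values, and keeps the two caches consistent implicitly, since the decoder stores every raw chunk it receives, rather than over a sideband channel. The names Encoder, Decoder and CHUNK_SIZE are hypothetical and do not appear in the description above. Practical byte caches commonly select chunk boundaries from the content itself rather than at fixed offsets.

```python
import hashlib

# Hypothetical fixed chunk size, chosen only for illustration.
CHUNK_SIZE = 64


def _digest(chunk: bytes) -> bytes:
    """Return the hash value (a 20-byte SHA-1 digest) identifying a chunk."""
    return hashlib.sha1(chunk).digest()


class Encoder:
    """Replaces chunks already sent with their hash values."""

    def __init__(self) -> None:
        self.seen = set()  # digests of chunks the decoder is assumed to hold

    def encode(self, data: bytes) -> list:
        tokens = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            h = _digest(chunk)
            if h in self.seen:
                tokens.append(("ref", h))      # repeated region: send hash only
            else:
                self.seen.add(h)
                tokens.append(("raw", chunk))  # first occurrence: send bytes
        return tokens


class Decoder:
    """Reconstitutes the full byte stream from raw chunks and hash values."""

    def __init__(self) -> None:
        self.cache = {}  # digest -> chunk bytes

    def decode(self, tokens: list) -> bytes:
        out = bytearray()
        for kind, payload in tokens:
            if kind == "raw":
                self.cache[_digest(payload)] = payload  # remember for later refs
                out += payload
            else:
                out += self.cache[payload]  # look up the chunk by its hash value
        return bytes(out)
```

In this sketch, encoding a stream that repeats an earlier 64-byte region yields a "ref" token carrying a 20-byte digest in place of the 64 raw bytes, and passing the resulting tokens to the decoder returns the original stream. Both sides must retain every chunk or digest seen and must hash every chunk processed, which reflects the memory and processing cost noted above.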
Specifically, the use of byte caching as middleware in a data processing system can result in excessive memory utilization and overutilization of processing cycles, creating a throughput bottleneck at the point of byte caching. Further, because byte caching relies upon proper fingerprint size selection, the effectiveness of which can vary for data from different application sources, byte caching applied to data flows of different applications can be effective for some applications and not others. Finally, much of the effectiveness of byte caching is lost when byte caching is applied to byte streams lacking redundancy, such as encrypted byte streams.