Data deduplication is a technique for eliminating duplicate copies of repeating data in datasets held on storage systems, thereby improving storage utilization. In data network environments, data deduplication can also be applied to network data transfers in order to reduce the amount of data that must be transmitted over the network. In a data deduplication process, unique chunks of data (i.e., byte patterns) are identified and stored as the data is analyzed. As the analysis continues, each further chunk is compared to the stored chunks, and whenever a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk. Given that the same byte pattern may occur dozens, hundreds, or even thousands of times (the match frequency depends on the chunk size), the amount of data that must be stored or transferred can be greatly reduced.
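The process described above can be illustrated with a minimal sketch. It rests on assumptions that the description does not specify: fixed-size chunks, and a SHA-256 fingerprint used as the reference to a stored chunk, so that chunks are matched by fingerprint rather than by direct byte comparison. The names `deduplicate`, `reconstruct`, `store`, and `refs` are hypothetical and chosen only for illustration.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4):
    """Split data into fixed-size chunks and store each unique chunk once.

    Assumption: fixed-size chunking and SHA-256 fingerprints; real systems
    may use variable-size chunking or other matching schemes.
    """
    store = {}  # fingerprint -> the single stored copy of that chunk
    refs = []   # one fingerprint per original chunk, in original order
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).digest()
        if fingerprint not in store:
            # First occurrence of this byte pattern: keep the chunk itself.
            store[fingerprint] = chunk
        # Every occurrence, first or repeated, is recorded as a reference.
        refs.append(fingerprint)
    return store, refs


def reconstruct(store, refs) -> bytes:
    """Rebuild the original data by following each reference."""
    return b"".join(store[fingerprint] for fingerprint in refs)


data = b"ABCDABCDABCDXYZ!ABCD"
store, refs = deduplicate(data, chunk_size=4)
print(len(refs), "chunks,", len(store), "unique")
print(reconstruct(store, refs) == data)
```

In this sketch the input contains five 4-byte chunks but only two distinct byte patterns, so only two chunks are stored and the remaining occurrences are represented by references. The fingerprints here are 32 bytes each, which is larger than the 4-byte chunks used for readability; a practical saving requires chunks that are much larger than the references that replace them.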
The potential savings from deduplication can be substantial. For example, in workloads with inherent repetition (e.g., backup scenarios), deduplication can reduce the required storage by ratios ranging from 1:2 to 1:50.
The description above is presented as a general overview of related art in this field and should not be construed as an admission that any of the information it contains constitutes prior art against the present patent application.