The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
De-duplication of data generally involves eliminating redundant copies of data items. According to one approach, storage de-duplication provides an encoding for a file system. The encoding is such that identical data blocks, which appear in multiple files in the file system, are stored only once physically in a disk repository and are pointed to by the various files' metadata structures. For example, instead of storing data blocks, a file in the encoded file system may store references to data blocks, some of which may be shared with other files, where each data block is associated with a reference count.
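The storage de-duplication approach described above may be illustrated by the following minimal sketch. It is an illustration under stated assumptions, not an implementation taken from any particular file system: the class name `BlockStore`, the fixed 4-byte block size, and the use of SHA-256 digests as block references are all hypothetical choices made for brevity.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical; unrealistically small so the example is easy to follow


class BlockStore:
    """Toy repository that stores each distinct data block only once."""

    def __init__(self):
        self.blocks = {}    # digest -> block bytes (the single physical copy)
        self.refcount = {}  # digest -> number of file references to the block
        self.files = {}     # file name -> ordered list of digests (file metadata)

    def write_file(self, name, data):
        if name in self.files:
            self.delete_file(name)  # release the references held by the old version
        refs = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                # First time this content is seen: store it physically.
                self.blocks[digest] = block
                self.refcount[digest] = 0
            self.refcount[digest] += 1
            refs.append(digest)
        self.files[name] = refs

    def read_file(self, name):
        # Reassemble the file by following its block references.
        return b"".join(self.blocks[d] for d in self.files[name])

    def delete_file(self, name):
        for digest in self.files.pop(name):
            self.refcount[digest] -= 1
            if self.refcount[digest] == 0:
                # No file references the block any longer, so reclaim it.
                del self.blocks[digest]
                del self.refcount[digest]


store = BlockStore()
store.write_file("a", b"AAAABBBB")
store.write_file("b", b"BBBBCCCC")  # the block b"BBBB" is shared with file "a"
```

In this sketch the two files contain four blocks in total, but only three are stored, and the shared block carries a reference count of two. Deleting one file decrements the count rather than removing the shared block.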
According to another approach for de-duplication of data, network de-duplication reduces the amount of traffic being sent over a network by eliminating the transfer of data blocks that have already been sent in the past. By doing so, the network de-duplication approach may achieve two objectives: bandwidth savings and faster transfer (assuming that the processing time to de-duplicate data blocks does not outweigh the time savings due to transfer of less data).
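The network de-duplication approach described above may be sketched as follows. This is a simplified illustration under stated assumptions rather than a description of any particular protocol: the `Sender` and `Receiver` classes, the `("data", ...)` and `("ref", ...)` message tuples, and the 4-byte block size are hypothetical, and the sketch assumes a reliable, ordered channel so that both sides stay in sync.

```python
import hashlib


class Sender:
    """Remembers which blocks were already sent and replaces repeats with references."""

    def __init__(self):
        self.sent = set()  # digests of blocks already transferred to the receiver

    def encode(self, data, block_size=4):
        messages = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).digest()
            if digest in self.sent:
                # Block was sent before: transfer only a reference to it.
                messages.append(("ref", digest))
            else:
                self.sent.add(digest)
                messages.append(("data", block))
        return messages


class Receiver:
    """Caches every received block so later references can be resolved locally."""

    def __init__(self):
        self.cache = {}  # digest -> block bytes

    def receive(self, messages):
        out = []
        for kind, payload in messages:
            if kind == "data":
                self.cache[hashlib.sha256(payload).digest()] = payload
                out.append(payload)
            else:
                # kind == "ref": look the block up instead of receiving it again.
                out.append(self.cache[payload])
        return b"".join(out)


sender, receiver = Sender(), Receiver()
first = receiver.receive(sender.encode(b"AAAABBBB"))
second_messages = sender.encode(b"BBBBCCCC")  # b"BBBB" was already sent
second = receiver.receive(second_messages)
```

With blocks this small, a 32-byte digest is larger than the block it replaces, so the sketch saves no bandwidth; the savings arise only when blocks are considerably larger than their references. The receiver's cache also shows the storage cost that this approach incurs in exchange for reduced traffic.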
While network de-duplication and storage de-duplication bear some similarities, they have different and conflicting motivations. Storage de-duplication aims at optimizing storage utilization at the cost of overhead in performance. In contrast, network de-duplication aims at optimizing network performance at the cost of overhead in storage. Thus, the objectives and trade-offs of the network de-duplication approach and the storage de-duplication approach are directly opposed to each other.