Data deduplication generally involves eliminating duplicate storage of identical data. In contrast to some data compression techniques that identify small quantities of data that are repeated within a file, data deduplication identifies large sections of data that are duplicative and stores only one copy of that data. Each file having the duplicated data then holds a reference to that one copy.
In some data deduplication techniques, a particular hash function is computed on individual blocks of a file, and the resulting hash value is compared to hash values that have been previously computed for different blocks and/or different files. If the hash value matches a previously computed hash value, the block is compared to the previously stored block to rule out a hash collision. If the data matches, a reference to the previously stored data is stored instead of storing the data block again. If the hash value does not match any previously computed hash value, the block is stored as new data.
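The block-level technique described above can be sketched roughly as follows. This is a minimal illustration only, not any particular system's implementation: the class name `DedupStore`, the 4096-byte block size, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, chosen for illustration


class DedupStore:
    """Hypothetical in-memory block store that keeps one copy per unique block."""

    def __init__(self):
        self.blocks = {}  # hash value -> stored block bytes
        self.files = {}   # file name -> ordered list of hash values (references)

    def put_file(self, name, data):
        refs = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            stored = self.blocks.get(digest)
            if stored is None:
                # No previously computed hash matches: store the block as new data.
                self.blocks[digest] = block
            elif stored != block:
                # Hash matched but the data differs: a collision, not a duplicate.
                raise ValueError("hash collision detected")
            # In either remaining case the file records only a reference.
            refs.append(digest)
        self.files[name] = refs

    def get_file(self, name):
        # Rebuild the file by following its references to the stored blocks.
        return b"".join(self.blocks[d] for d in self.files[name])
```

Storing the same content under two file names in this sketch leaves the number of stored blocks unchanged after the second call; only the list of references grows.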
Data encryption may create problems for data deduplication techniques. Different encryption keys for different users and different files are often used to protect the file data. Since different encryption keys are used, identical underlying plaintext data is encrypted to different ciphertext, so the hash values of the stored, encrypted data will not match for files containing that identical plaintext. Thus, systems that rely on data deduplication may be unable to satisfactorily protect data from unauthorized access, and systems that encrypt data are generally unable to benefit from data deduplication techniques.
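The mismatch can be illustrated with a short sketch. The function `toy_encrypt` below is a hypothetical, deliberately simplified stream cipher built from SHA-256 purely so the example runs with the standard library; it is not secure and stands in for whatever real cipher a system would use. The key values and user names are likewise assumptions.

```python
import hashlib


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy XOR stream cipher for illustration only. NOT secure."""
    out = bytearray()
    for offset in range(0, len(plaintext), 32):
        counter = (offset // 32).to_bytes(8, "big")
        # Keystream chunk derived from the key and a block counter.
        pad = hashlib.sha256(key + counter).digest()
        chunk = plaintext[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


# Two users store the same plaintext block under different keys.
block = b"same plaintext block" * 100
c_alice = toy_encrypt(b"alice-key", block)
c_bob = toy_encrypt(b"bob-key", block)

# Hash values of the stored ciphertexts, as a deduplicating store would compute them.
h_alice = hashlib.sha256(c_alice).digest()
h_bob = hashlib.sha256(c_bob).digest()

# False here: the store sees two unrelated blocks, although the plaintext is identical.
cipher_hashes_match = h_alice == h_bob
```

Because the store only ever sees the ciphertext, it has no way in this sketch to recognize that the two blocks carry the same underlying data.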