Organizations typically employ a number of technologies to meet data storage demands, including local storage devices, enterprise storage networks and cloud-based storage services. As an organization grows, reducing the total storage space it consumes becomes a substantial concern. Data deduplication generally refers to detecting, uniquely identifying and eliminating redundant data blocks, thereby reducing the number of bytes that need to be stored on disk or transmitted across a network. Implementing data deduplication can therefore yield considerable savings in the number of bytes stored and/or transferred between storage devices.
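By way of illustration only, block-level deduplication of unencrypted data may be sketched as a content-addressed store in which each block is identified by a hash of its contents and stored at most once. The class name `DedupStore`, the fixed block size and the choice of SHA-256 are assumptions made for this sketch and are not drawn from any particular storage product.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


class DedupStore:
    """Hypothetical in-memory block store that keeps each unique block once."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # block digest -> block bytes (one copy per unique block)
        self.files = {}   # file name -> ordered list of block digests

    def put(self, name, data):
        """Split data into blocks and store only blocks not already present."""
        digests = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()  # unique block identifier
            self.blocks.setdefault(digest, block)       # duplicate blocks are skipped
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        """Reassemble a file from its list of block digests."""
        return b"".join(self.blocks[d] for d in self.files[name])

    def physical_bytes(self):
        """Number of bytes actually held after deduplication."""
        return sum(len(block) for block in self.blocks.values())
```

In such a sketch, storing two files that share a block increases `physical_bytes()` only by the size of the blocks not already present, which is the saving described above.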
At the same time, users want their data to be inaccessible to others, and thus storage of encrypted data is desirable, especially on cloud-based storage services where the users cannot otherwise prevent access by others. Because conventional encryption schemes randomize file data such that identical data blocks generally do not produce identical encrypted output, it is difficult to determine whether a data block within an encrypted file is a duplicate of another encrypted data block. Implementing such an encryption scheme, therefore, hinders effective data deduplication, and vice versa. Deduplicating encrypted data is not practical without implementing cumbersome access control mechanisms for each encrypted file that shares duplicate data. Even though convergent encryption technologies provide a workable deduplication system that also encrypts data, each user, regardless of permission, has an encryption key to each file, which makes it impractical to prevent unauthorized access through encryption. Hence, a storage technology's data deduplication capabilities are restricted by security concerns.
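For illustration, the general idea behind convergent encryption may be sketched as follows: the key for a block is derived from the block's own contents, so identical plaintext blocks yield identical ciphertext and remain deduplicable. The function names are hypothetical, and the SHA-256 based keystream is a toy stand-in for a real cipher, chosen only so that the sketch is self-contained; it is an assumption of this example and is not suitable for protecting real data.

```python
import hashlib


def _keystream(key, length):
    """Toy deterministic keystream: SHA-256 over key plus a block counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])


def convergent_encrypt(block):
    """Derive the key from the block contents, then encrypt deterministically."""
    key = hashlib.sha256(block).digest()  # key depends only on the plaintext
    stream = _keystream(key, len(block))
    ciphertext = bytes(p ^ s for p, s in zip(block, stream))
    return key, ciphertext


def convergent_decrypt(key, ciphertext):
    """Recover the plaintext by regenerating the keystream from the key."""
    stream = _keystream(key, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))
```

Because the key in this sketch is a function of the plaintext alone, any party able to compute or obtain the hash of a block also holds the key to that block, which reflects the access control weakness noted above.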