Data storage utilization is continually increasing, causing a proliferation of storage systems in data centers. In particular, the size of applications and of the data generated therefrom is increasing. Moreover, systems/users are backing up multiple copies of a given set of data in order to maintain multiple versions. For example, snapshots of a given database stored in a server are copied and stored over time, thereby allowing a given version/snapshot of a set of data to be restored. Typically, much of the data remains the same across different snapshots. For example, if the data for a given user is backed up on a daily basis and the user updates only one of a number of files on a given day, the data in that file is the only data that has been modified since the previous backup. Accordingly, conventional backup operations include the deduplication of backup data.
Conventionally, data received at a backup storage system (such as a deduplicating storage system available from EMC® Corporation of Hopkinton, Mass.) is deduplicated by eliminating duplicate chunks across multiple backups. In some instances, however, the data arrives at the backup storage system encrypted. Data is encrypted by the customer for various reasons, including, for example, security and compliance. Conventional deduplicating backup storage systems cannot find redundant chunks across multiple encrypted backups, even for the same backup file, because different instances of these backups might be encrypted differently (e.g., using different encryption algorithms, keys, seeds, etc.), so that identical underlying data yields different encrypted chunks. Even if the encrypted data is only a small fraction of all the data being backed up, the storage usage on a backup appliance can grow very quickly, and the result is inefficient use of storage.
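By way of illustration only, the effect described above can be sketched in a few lines of Python. The sketch assumes fixed-size chunking with SHA-256 fingerprints, which is one common way to deduplicate but is not stated to be the method of any particular storage system. The names `CHUNK_SIZE`, `store_snapshot`, `toy_encrypt`, and `demo` are hypothetical, and `toy_encrypt` is a hash-based keystream used purely as a stand-in for real encryption; it is not a secure cipher.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen for illustration


def store_snapshot(store, data):
    """Add a snapshot to a fingerprint-indexed store; return how many chunks were new."""
    new_chunks = 0
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fingerprint = hashlib.sha256(chunk).digest()
        if fingerprint not in store:  # duplicate chunks are stored only once
            store[fingerprint] = chunk
            new_chunks += 1
    return new_chunks


def toy_encrypt(data, key, nonce):
    """Stand-in for encryption: XOR with a SHA-256 counter keystream. NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        keystream = hashlib.sha256(key + nonce + counter).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], keystream))
    return bytes(out)


def demo():
    # Snapshot 1: eight distinct chunks. Snapshot 2: the same data with one chunk changed.
    base = b"".join(hashlib.sha256(str(i).encode()).digest() * 128 for i in range(8))
    edited = base[:3 * CHUNK_SIZE] + b"\x00" * CHUNK_SIZE + base[4 * CHUNK_SIZE:]

    plain_store = {}
    plain = (store_snapshot(plain_store, base), store_snapshot(plain_store, edited))

    # Each backup is encrypted with a different key and nonce, as a customer might do.
    enc_store = {}
    enc = (
        store_snapshot(enc_store, toy_encrypt(base, b"key-1", b"nonce-1")),
        store_snapshot(enc_store, toy_encrypt(edited, b"key-2", b"nonce-2")),
    )
    return plain, enc


if __name__ == "__main__":
    print(demo())
```

In this sketch, the second plaintext snapshot adds only the single modified chunk to the store, whereas the second encrypted snapshot adds all eight chunks again, because none of its encrypted chunks matches a chunk from the first encrypted snapshot.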
In order to avoid inefficient use of backup storage capacity, users/customers are currently required to decrypt the data before it is sent to a backup storage appliance. Such a solution, however, exposes the customer to the potential for a security breach, e.g., sensitive data may be exposed to users who are not privileged to access the data.