The present invention relates generally to the field of computer backup systems, and, more particularly, to systems and techniques for deduplication.
Deduplication is a process for removing redundant data during data backup operations. In particular, if two saved objects are duplicates of each other, then only one of the objects needs to be stored, which reduces the amount of data to be stored. Deduplication has become ubiquitous in capacity-optimized storage systems. Traditionally, data security for deduplication is achieved by establishing a secure connection between a data source (e.g., a company) and a data storage target (e.g., a cloud storage provider) for transporting unique data chunks. The unique data chunks are then encrypted at the storage target. A chunk is a unit of data that results from dividing content into multiple pieces. Encryption is a form of security that turns information, images, programs, or other data into unreadable ciphertext by applying an encryption key. A key is a variable value that is applied using an algorithm to a string or block of unencrypted text to produce encrypted text, or to decrypt encrypted text.
In the above deduplication scenario, the original, unencrypted data chunk is needed to compute a signature, which is used to determine whether a copy of the data chunk already exists in the backup storage. A problem with this approach is that the data source does not control the encryption. The data source cannot guarantee data security, since any data read out of the backup storage will be the original unencrypted data. For security purposes, many companies desire that they, and they alone, control the encryption keys. Many companies also implement internal protocols under which encryption keys are routinely changed.
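By way of illustration only, the signature-based duplicate check described above may be sketched as follows. The sketch assumes fixed-size chunking and SHA-256 as the signature function; neither is specified in this background, and the names `BackupStore`, `split_into_chunks`, and `signature` are hypothetical.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4  # hypothetical, tiny chunk size chosen to keep the example readable


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Divide content into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def signature(chunk: bytes) -> str:
    """Signature of the original, unencrypted chunk (SHA-256 is an assumption)."""
    return hashlib.sha256(chunk).hexdigest()


class BackupStore:
    """Hypothetical storage target that keeps one copy per unique signature."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def backup(self, data: bytes) -> List[str]:
        """Store only chunks not seen before; return the list of signatures
        (the "recipe") needed to rebuild the data."""
        recipe = []
        for chunk in split_into_chunks(data):
            sig = signature(chunk)
            if sig not in self.chunks:  # duplicate check against backup storage
                self.chunks[sig] = chunk
            recipe.append(sig)
        return recipe

    def restore(self, recipe: List[str]) -> bytes:
        """Rebuild the original data from the stored unique chunks."""
        return b"".join(self.chunks[sig] for sig in recipe)


store = BackupStore()
recipe = store.backup(b"AAAABBBBAAAACCCC")  # the chunk "AAAA" occurs twice
unique_chunks = len(store.chunks)
restored = store.restore(recipe)
```

In this sketch the repeated chunk is stored once, yet the data can still be rebuilt in full. Note that the storage target sees and returns unencrypted chunks, which is the security concern raised above.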
Traditional approaches to deduplication cannot be used when the data chunks are encrypted, because two identical data chunks encrypted using two different encryption key versions will appear to be different from each other. As a result, the storage target may hold many redundant copies of the same data, encrypted using older and newer encryption key versions.
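The effect may be demonstrated with a short sketch. The cipher below is a toy XOR stream cipher built from SHA-256, used purely as a stand-in so that the example needs no external library; it is not secure and is not the encryption scheme of any particular system. The key values and the names `toy_encrypt` and `signature` are likewise hypothetical.

```python
import hashlib


def toy_encrypt(chunk: bytes, key: bytes) -> bytes:
    """Toy XOR stream cipher, for illustration only; NOT secure.

    The keystream is SHA-256(key || counter), an assumption standing in for
    whatever real cipher a data source would use. Because XOR is its own
    inverse, calling this again with the same key decrypts.
    """
    stream = b""
    counter = 0
    while len(stream) < len(chunk):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(c ^ s for c, s in zip(chunk, stream))


def signature(data: bytes) -> str:
    """Signature computed over whatever bytes the storage target receives."""
    return hashlib.sha256(data).hexdigest()


chunk = b"identical payload backed up twice"
key_v1 = b"key-version-1"  # hypothetical older key version
key_v2 = b"key-version-2"  # hypothetical newer key version

cipher_v1 = toy_encrypt(chunk, key_v1)
cipher_v2 = toy_encrypt(chunk, key_v2)

# The storage target sees two different signatures for the same underlying
# chunk, so a signature-based duplicate check stores both copies.
stored_signatures = {signature(cipher_v1), signature(cipher_v2)}
```

Both ciphertexts decrypt to the same chunk, yet a storage target that compares signatures of the encrypted bytes treats them as two distinct objects.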
Therefore, there is a need for improved systems and techniques that can be used to deduplicate encrypted data objects while also ensuring the security of the data.