Single instance storage (SIS) is a technique for increasing data storage utilization by detecting duplicate data blocks in a set of data blocks, storing only one instance of each duplicate block, and using a reference to point to that single stored instance wherever the block recurs. Duplicate blocks, which are fairly common, are identified by computing a checksum (e.g., a signature or hash) of the data and storing the (checksum, data) pair in some form of lookup table. As can be readily appreciated, single instance storage not only reduces storage space but, in networking scenarios, may also reduce overall network traffic by avoiding the need to transfer another copy of already stored data over the network.
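By way of illustration only, the checksum-based lookup described above may be sketched as follows. The class name `SingleInstanceStore`, its methods, and the choice of SHA-256 as the checksum are assumptions made for this example and are not taken from any particular system.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical in-memory store that keeps one copy of each distinct block."""

    def __init__(self):
        # Lookup table mapping checksum -> the single stored instance of the data.
        self._table = {}

    def put(self, block: bytes) -> str:
        """Store a block if it is new and return its checksum as the reference."""
        checksum = hashlib.sha256(block).hexdigest()  # SHA-256 is an assumed choice
        if checksum not in self._table:
            self._table[checksum] = block  # first occurrence: keep the data
        return checksum  # duplicates only yield a reference, no new copy

    def get(self, reference: str) -> bytes:
        """Resolve a reference back to the stored data."""
        return self._table[reference]

    def unique_block_count(self) -> int:
        """Number of distinct blocks actually held in storage."""
        return len(self._table)


# Three logical blocks, two of which are duplicates of one another.
store = SingleInstanceStore()
blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096]
references = [store.put(block) for block in blocks]
```

In this sketch the first and third references are equal and only two blocks are held in the table. A practical system would also need to persist the table and handle checksum collisions, which the sketch omits.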
In the context of data backup as a service, or data archival as a service (or any other service that stores a customer's data), the owner of the backed-up data may often want to keep the data confidential from the service provider, such as for business secrecy or for regulatory compliance reasons. In such scenarios, the data to be stored first needs to be encrypted at the customer's site so that it is secure both in transit over the network and when stored on the service provider's storage systems.
At the same time, the principle of layered security requires that data not be encrypted with a single key, so that a breach of one key does not compromise all of the data. Further, methods of encryption, key lengths, and so forth often change over time. One result is that the same block of data, encrypted and transferred to storage at different times and/or from different sources, will often have a different encrypted form.
As a consequence, single instance storage and data encryption do not work well together. More particularly, because the encryption process “randomizes” the data bits to an extent, two identical data blocks are very unlikely to remain identical to one another once processed into their encrypted forms under different keys or methods. Thus, a service provider obtaining a set of encrypted data blocks generally cannot perform single instancing without decrypting the data, which is undesirable and often not allowed by clients.
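The conflict described above may be illustrated with a short sketch. The function `toy_encrypt` is a hypothetical stand-in for a real cipher, built from a SHA-256 keystream only so that the example runs with the standard library; it is not secure and is not the encryption method of any actual system. The helper `count_unique` is likewise an assumption, mimicking checksum-based duplicate detection.

```python
import hashlib
import os


def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """XOR the block with a key-derived keystream (illustrative only, NOT secure)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(block):
        # Each keystream chunk depends on the key and a running counter.
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    # zip() stops at the end of the block, discarding surplus keystream bytes.
    return bytes(b ^ s for b, s in zip(block, stream))


def count_unique(blocks) -> int:
    """Count distinct blocks by checksum, as a single instance store would."""
    return len({hashlib.sha256(block).digest() for block in blocks})


# Two of the three plaintext blocks are identical.
plaintext_blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096]

# Assume the blocks were encrypted at different times under different keys.
key_1, key_2 = os.urandom(32), os.urandom(32)
ciphertext_blocks = [
    toy_encrypt(key_1, plaintext_blocks[0]),
    toy_encrypt(key_1, plaintext_blocks[1]),
    toy_encrypt(key_2, plaintext_blocks[2]),  # same data as block 0, other key
]

unique_plain = count_unique(plaintext_blocks)
unique_cipher = count_unique(ciphertext_blocks)
```

Here duplicate detection finds one redundant block among the plaintext blocks but none among the encrypted blocks, because the two copies of the same data were encrypted under different keys. Real ciphers that add a random initialization vector would generally produce differing ciphertexts even under a single key.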