Organizations and consumers increasingly use third-party services to store data. Third-party storage services may provide a number of benefits to customers, including flexibility, low capitalization requirements, off-site backups, and centralized access to data.
In order to maximize the efficiency of their storage systems, third-party storage vendors may attempt to deduplicate the data received from their customers. For example, if two customers each wish to store a copy of the same block of data, a third-party storage vendor may, instead of storing two copies of the data, store a single copy of the data and reference the copy twice.
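By way of illustration only, the single-copy, multiple-reference arrangement described above may be sketched as a content-addressed store with reference counts. The class name `DedupStore`, its methods, and the choice of SHA-256 as the block fingerprint are assumptions made for this sketch and are not drawn from any particular vendor's system.

```python
import hashlib


class DedupStore:
    """Toy block store that keeps one copy per distinct block (illustrative only)."""

    def __init__(self):
        self.blocks = {}    # fingerprint -> block bytes, stored once
        self.refcount = {}  # fingerprint -> number of references to the block

    def put(self, block: bytes) -> str:
        # Assumption: SHA-256 of the block serves as its fingerprint.
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint not in self.blocks:
            self.blocks[fingerprint] = block  # first copy is stored
        # Every submission, duplicate or not, adds one reference.
        self.refcount[fingerprint] = self.refcount.get(fingerprint, 0) + 1
        return fingerprint

    def get(self, fingerprint: str) -> bytes:
        return self.blocks[fingerprint]


store = DedupStore()
ref_a = store.put(b"same block of data")  # first customer
ref_b = store.put(b"same block of data")  # second customer, identical block
```

In this sketch both customers receive the same reference, and the store holds a single copy of the block with a reference count of two.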
Many third-party storage customers want or need to encrypt their data before submitting the same to a third-party storage vendor. For example, individual consumers may wish to encrypt data sent to third-party storage vendors due to privacy concerns. Similarly, organizations may wish to encrypt data sent to third-party storage vendors in order to ensure compliance with internal or external data-protection requirements, such as governmental laws and regulations or partnership agreements with other organizations. Unfortunately, by encrypting data before submitting the same to a third-party storage system, customers may interfere with a third-party storage vendor's attempt to deduplicate the data. For example, if two customers encrypt identical blocks of data using different encryption schemes (e.g., different keys), the resulting encrypted blocks of data will differ, potentially preventing the third-party storage vendor from deduplicating the two blocks of data into a single block that is referenced twice.
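The interference described above may be illustrated with a minimal sketch in which two customers encrypt the same block under different keys. The function `keystream_xor` is a hypothetical toy cipher built from SHA-256 purely so that the example runs with the standard library; it is an assumption of this sketch, is not a secure cipher, and does not represent any scheme named in this disclosure.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (NOT secure): XOR data with SHA-256(key || offset) pads."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


block = b"identical block of data"

# Hypothetical customer keys; each customer encrypts the same plaintext block.
ciphertext_a = keystream_xor(b"customer A key", block)
ciphertext_b = keystream_xor(b"customer B key", block)

# The vendor sees only ciphertexts, so it fingerprints those.
fingerprint_a = hashlib.sha256(ciphertext_a).hexdigest()
fingerprint_b = hashlib.sha256(ciphertext_b).hexdigest()
```

Because the two ciphertexts differ, their fingerprints differ as well, and a vendor comparing fingerprints would store both encrypted blocks rather than one.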
In some cases, a third-party storage vendor may require that its customers use convergent encryption techniques (also known as content hash keying) when encrypting data in order to allow the storage vendor to subsequently deduplicate the encrypted data. In convergent encryption, an encryption key for encrypting a block of data may be derived from the block of data itself, such that identical blocks of data may result in identical encrypted blocks of data. Unfortunately, convergent encryption techniques may expose encrypted data to certain brute-force attacks, such as learn-partial-information attacks. For example, if a customer encrypts (using convergent encryption) a document that contains both publicly available information (such as a government form) and sensitive data (such as a Social Security number populated in the government form), then an attacker who obtains the encrypted document may progressively populate and convergently encrypt the publicly available government form with each possible Social Security number until the encrypted version of the document created by the attacker matches the version encrypted by the customer, thus revealing the customer's Social Security number.
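Both the convergent encryption technique and the learn-partial-information attack described above may be sketched as follows. This is an illustration under stated assumptions: `keystream_xor` is a hypothetical toy cipher (not secure), the key is taken to be the SHA-256 hash of the block, the form template `FORM` is invented for the example, and the search is limited to the last four digits of the number so that it completes quickly.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (NOT secure): XOR data with SHA-256(key || offset) pads."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def convergent_encrypt(block: bytes) -> bytes:
    # Assumption: the key is derived from the block itself via SHA-256,
    # so identical blocks always yield identical ciphertexts.
    key = hashlib.sha256(block).digest()
    return keystream_xor(key, block)


# Hypothetical public form; only the last four digits are unknown to the attacker.
FORM = "PUBLIC GOVERNMENT FORM\nSSN: 123-45-{:04d}\n"


def recover_last_four(target_ciphertext: bytes):
    """Populate the form with each candidate value and compare ciphertexts."""
    for guess in range(10_000):
        candidate = FORM.format(guess).encode()
        if convergent_encrypt(candidate) == target_ciphertext:
            return guess
    return None


# The customer encrypts a populated form; the attacker sees only the ciphertext.
victim_ciphertext = convergent_encrypt(FORM.format(6789).encode())
recovered = recover_last_four(victim_ciphertext)
```

The attacker in this sketch never learns the customer's key directly. The match succeeds because the key is a deterministic function of the plaintext, so a correct guess of the plaintext reproduces the ciphertext exactly.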
In view of the above limitations, the instant disclosure identifies a need for securely encrypting and deduplicating data owned by multiple entities.