A cloud data center may provide cloud computing services to various computing systems such as desktops, laptops, tablets, smartphones, embedded computers, point-of-sale terminals, and so on. A cloud data center may have many thousands of servers and storage devices and provide various software products such as operating systems, databases, and applications. Rather than maintaining their own data centers, many enterprises subscribe to a database service of a cloud data center to store and process their data. For example, a retail company may subscribe to a database service to store records of the sales transactions at its stores and use an interface provided by the database service to run queries that help in analyzing the sales data. As another example, a utility company may subscribe to a database service to store meter readings collected from its customers' meters. As yet another example, a governmental entity may subscribe to a database service to store and analyze tax return data of millions of taxpayers.
Enterprises that subscribe to such cloud-based database services want to ensure the privacy of their data. Although cloud data centers employ many sophisticated techniques to help preserve the privacy of customer data, parties seeking to steal such customer data are continually devising new counter-techniques to access the data. To help ensure the privacy of their data, many customers may encrypt their data locally before sending their data for storage by a database service. For example, each point-of-sale terminal of a retail company may encrypt the sale amount of each transaction and send the sale amount only in an encrypted form to the database service as a record of the transaction. If the retail company wants to determine the total sale amount for each store, the encrypted sale amounts for each store would need to be downloaded to a company computer and then decrypted. The decrypted sale amounts for each store could then be added together to generate the total sale amount for each store.
If a customer were to use homomorphic encryption, then the downloading and decrypting of all the sales data could be avoided. Homomorphic encryption has the characteristic that a computation performed on the encrypted data generates an encrypted result that, when decrypted, equals the result that would have been obtained had the same computation been performed on the unencrypted data. For example, if the retail company encrypts its sale amounts with an additively homomorphic scheme, then the database service could add the encrypted sale amounts for each store to generate an encrypted total sale amount for each store without ever decrypting the individual amounts. The retail company need only download the encrypted total sale amount for each store and decrypt those totals.
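The additive property can be illustrated with the Paillier cryptosystem, one well-known additively homomorphic scheme. The text does not name a particular scheme, so Paillier is an assumption here. The minimal Python sketch below uses two small Mersenne primes as a toy key (far too small to be secure) and hypothetical sale amounts in cents. The service-side step needs only the public modulus `n`: multiplying two ciphertexts yields an encryption of the sum of their plaintexts.

```python
import math
import secrets

# Toy primes: two Mersenne primes, far too small for real security.
P = 2**31 - 1
Q = 2**61 - 1


def keygen(p=P, q=Q):
    """Return the public modulus n and the private key (lam, mu)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # with generator g = n + 1, mu is simply lam^-1 mod n
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n using only the public modulus."""
    n2 = n * n
    while True:  # a random r coprime to n makes encryption probabilistic
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover m from ciphertext c using the private key."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


# Customer side: encrypt each sale amount (hypothetical values, in cents).
n, priv = keygen()
sales_cents = [1999, 4550, 1225]
ciphertexts = [encrypt(n, s) for s in sales_cents]

# Service side: combine ciphertexts using only n; nothing is decrypted here.
total_ct = ciphertexts[0]
for c in ciphertexts[1:]:
    total_ct = add_encrypted(n, total_ct, c)

# Customer side: download and decrypt the single encrypted total.
print(decrypt(n, priv, total_ct))  # 7774
```

Even in this toy, each encryption and decryption requires modular exponentiations on numbers about twice the size of `n`; with realistic key sizes of thousands of bits, this is far slower than ordinary addition.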
A problem occurs, however, when an aggregation is to be performed over a subset of the sale amounts. For example, if the retail company has stores in multiple countries, then to aggregate the sale amounts for the stores in a certain country, the database service would need to know which stores are located in the same country. To allow such aggregation, the retail company could “deterministically” encrypt the country of each store. A deterministic encryption always generates the same encrypted value for a given value. So a database table with a row for each store and columns for country and sale amount will have the same encrypted value in the country column of every row whose store is in the same country. With the country deterministically encrypted, the database service can group the rows by encrypted country, generate an encrypted total sale amount for each group, and return each encrypted total along with its encrypted country to the customer. The customer can then decrypt each encrypted total and its corresponding encrypted country to determine the total sale amount for each country. In addition, the database service can generate a count of the number of stores in each country, from which the retail company could calculate the average sale amount per store for each country.
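The following is a minimal sketch of grouping rows by a deterministically encrypted country. The text does not specify a deterministic cipher, so this assumes a toy SIV-style construction built from HMAC-SHA256 in Python's standard library: a synthetic IV is derived from the plaintext, and an HMAC-based keystream encrypts it. It is for illustration only, not a vetted implementation. The keys, store data, and function names are hypothetical. The sale amounts are left unencrypted purely to keep the example short; in the arrangement described above, they would also be homomorphically encrypted and combined on the service side.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical keys held only by the customer (never sent to the service).
MAC_KEY = b"k1-demo-key-not-for-production"
ENC_KEY = b"k2-demo-key-not-for-production"


def _keystream(iv: bytes, length: int) -> bytes:
    """Derive `length` keystream bytes from the IV with HMAC-SHA256 in counter mode."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(ENC_KEY, iv + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def det_encrypt(plaintext: str) -> bytes:
    data = plaintext.encode("utf-8")
    # Synthetic IV computed from the plaintext: same input -> same IV -> same ciphertext.
    iv = hmac.new(MAC_KEY, data, hashlib.sha256).digest()[:16]
    body = bytes(a ^ b for a, b in zip(data, _keystream(iv, len(data))))
    return iv + body


def det_decrypt(ciphertext: bytes) -> str:
    iv, body = ciphertext[:16], ciphertext[16:]
    data = bytes(a ^ b for a, b in zip(body, _keystream(iv, len(body))))
    # Recompute the synthetic IV to detect tampering.
    expected = hmac.new(MAC_KEY, data, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(iv, expected):
        raise ValueError("ciphertext failed integrity check")
    return data.decode("utf-8")


# Customer side: one row per store (hypothetical data); country is encrypted.
stores = [("Canada", 120), ("France", 95), ("Canada", 80), ("Japan", 200), ("France", 115)]
table = [(det_encrypt(country), amount) for country, amount in stores]

# Service side: groups rows by ciphertext only; it never sees a country name.
totals = defaultdict(int)
counts = defaultdict(int)
for enc_country, amount in table:
    totals[enc_country] += amount
    counts[enc_country] += 1

# Customer side: decrypt each group key and compute the per-store average.
report = {det_decrypt(k): (totals[k], counts[k], totals[k] / counts[k]) for k in totals}
print(report)
```

The service sees three distinct ciphertexts and the number of rows carrying each one, but no country names. That visible grouping is exactly what makes the aggregation possible, and it is also what a frequency analysis exploits.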
Although homomorphic encryption allows the database service to aggregate encrypted data and thus avoids the downloading of the unaggregated encrypted data, homomorphic encryption can be very computationally expensive. Homomorphic encryption schemes typically rely on costly mathematical operations such as multiplications and exponentiations on very large numbers, matrix operations, and so on. As a result, many organizations either choose not to use homomorphic encryption or must spend significant amounts of money on the additional computational power needed to support it.
Although deterministic encryption allows aggregations on subsets of data, it is susceptible to frequency attacks. A frequency attack allows an attacker to gain knowledge of the unencrypted data by analyzing how often each deterministically encrypted value occurs. For example, an attacker with access to the country column of the retail company's table could determine how the stores are distributed across countries, although the attacker would not know which country each encrypted value represents. If, however, the attacker knew that a certain country had the largest number of stores, then the attacker could identify the most frequent encrypted country value and infer that it is the encryption of that country. Knowing exactly how many stores are in that country may be useful information in itself. Moreover, knowing the encrypted value for a certain country can help the attacker break the encryption scheme.
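The frequency attack can be sketched in a few lines of Python. Because the attack depends only on equal plaintexts producing equal ciphertexts, a keyed HMAC-SHA256 tag stands in for a deterministic encryption here. The key, the country column, and the attacker's prior knowledge that the USA has the most stores are all hypothetical. Note that the attacker never uses the key.

```python
import hashlib
import hmac
from collections import Counter

# Stand-in for deterministic encryption: only equality of outputs matters to
# the attack, so a keyed HMAC tag is used for brevity (hypothetical key).
SECRET_KEY = b"customer-only-key"


def det_encrypt(value: str) -> str:
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Customer's plaintext country column (hypothetical data); only the
# encrypted column is stored by the database service.
countries = ["USA"] * 6 + ["Mexico"] * 3 + ["Canada"] * 2 + ["Brazil"]
encrypted_column = [det_encrypt(c) for c in countries]

# Attacker: sees only encrypted_column, plus the outside fact that the
# USA has more stores than any other country.
freq = Counter(encrypted_column)
most_common_ct, store_count = freq.most_common(1)[0]
guess = {most_common_ct: "USA"}  # a recovered plaintext/ciphertext pair

# The attacker now knows how many rows belong to the USA, and which ones.
usa_rows = [i for i, ct in enumerate(encrypted_column) if ct == most_common_ct]
print(store_count, usa_rows)
```

The store distribution (6, 3, 2, 1) leaks even before the outside fact is applied; the outside fact turns that distribution into a known plaintext/ciphertext pair.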