Various entities such as individuals, companies and governments continue to gravitate toward storing sensitive data in the “cloud” (e.g., the internet), which is composed of one or more networked data servers. The sensitive data stored in the cloud may range from personal health records (PHRs) to private financial information, among other sensitive data. The cloud provides flexible data storage and accessibility options that can be dynamically modified to meet the storage needs of the various entities.
While the cloud environment provides flexibility, the security of the sensitive data in the cloud environment continues to be an issue. Data encryption is commonly used to ensure that access to the sensitive data is only possible when the correct decryption key is provided. In some systems, trust is given to the cloud service provider to manage the encryption keys on the entity's behalf. This implies that although the data is encrypted, the cloud service provider has the ability to see the unencrypted sensitive data. For example, even though personal health records may be encrypted in the cloud, the encryption keys are managed by the personal health record system provider. Therefore, it is possible for anyone with access to the cloud provider's infrastructure to gain access to all the records.
In one system, Attribute Based Encryption (ABE) is used to encrypt PHRs and store them on semi-trusted servers under access control policies chosen by patients. Although the use of ABE PHR systems preserves the privacy of patients, these systems disadvantageously prevent health organizations from querying the PHRs on the system. To produce statistical information about PHRs, patients would have to give health organizations access to all PHRs using ABE. However, health organizations often fall short in protecting the privacy and security of patient information, and some of these health organizations end up having at least one issue with information security and privacy. For example, the most frequently observed issue is the improper use of protected health information by an employee of the organization.
In other types of systems, it is possible to build a privacy-preserving system in which data is encrypted and users keep the decryption keys, ensuring that access is not given to the cloud service providers. However, this approach is not popular because features such as data sharing and data querying, which the cloud service provider could otherwise offer, are limited when the decryption keys are held by the users.
Querying of sensitive data is an important feature that is often relied upon by PHR system providers in order to perform general data querying to generate statistical information. For example, general data querying may include querying PHRs for the number of people over the age of forty-five who have a particular disease. However, as described above, if the cloud service provider is allowed to decrypt the sensitive data stored in the cloud in order to run the requested query, the cloud service provider or anyone having access to the cloud service provider's infrastructure may be able to access the sensitive data without authorization.
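The example query above can be sketched in a few lines to show what the party running it must be able to read. This is only an illustration on plaintext records; the `HealthRecord` fields, the `count_matching` helper, and the sample data are hypothetical and do not come from any particular PHR system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthRecord:
    # Hypothetical record layout, used for illustration only.
    patient_id: str
    age: int
    diagnoses: frozenset


def count_matching(records, min_age, disease):
    """Count patients older than min_age who have the given disease."""
    return sum(1 for r in records if r.age > min_age and disease in r.diagnoses)


# Made-up sample data.
records = [
    HealthRecord("p1", 52, frozenset({"diabetes"})),
    HealthRecord("p2", 38, frozenset({"diabetes"})),
    HealthRecord("p3", 61, frozenset({"asthma"})),
    HealthRecord("p4", 47, frozenset({"diabetes", "asthma"})),
]

# Only the aggregate is wanted, yet evaluating it this way reads every
# patient's age and diagnoses in the clear.
result = count_matching(records, 45, "diabetes")
```

The requester only needs the single number in `result`, but whoever evaluates the query in this form sees every record in full, which is the exposure described above.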
Therefore, the querying of sensitive data to generate statistical information becomes a problem of making private comparisons, as described in Yao's classic millionaires' problem. This problem involves two millionaires who wish to know which of them is richer without inadvertently revealing any additional information about each other's wealth. More formally, given two input values x and y, which are held as private inputs by two parties, respectively, the problem is to securely evaluate the Greater Than (GT) condition x>y without exposing the inputs.
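One way to read the GT condition is as an ideal functionality: a hypothetical trusted party receives x and y and releases only the single bit x>y. The sketch below states that target behavior; it is not a secure protocol, and the function name and sample values are assumptions made for illustration.

```python
def gt_ideal_functionality(x: int, y: int) -> bool:
    """Hypothetical trusted party: sees both inputs, releases only x > y."""
    return x > y


# Made-up private inputs held by the two parties.
alice_wealth = 7_000_000
bob_wealth = 5_500_000

# Each party should learn this one bit and nothing else about the other's input.
alice_is_richer = gt_ideal_functionality(alice_wealth, bob_wealth)
```

A secure solution to the millionaires' problem aims to give both parties the same output as this function without any party, trusted or otherwise, seeing both inputs.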
One solution is to assume a trusted server, and another is to assume that the server is semi-trusted. One proposed semi-trusted solution uses a trapdoor encryption method in which two layers of encryption are used. The first layer uses symmetric-key encryption with a secret key, while the second layer uses a pseudo-random number generator and two pseudo-random functions. However, this trapdoor encryption method only allows querying for equality. Another proposed semi-trusted solution builds on the trapdoor encryption method by adding secure indices. A further solution modifies this add-on such that the data is classified by collusion hash functions, thus increasing security by preventing the data from being classified sequentially. Another proposed solution, directed to range queries, is an encryption scheme for numerical data that allows comparisons to be executed directly on the encrypted data.
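The equality-only limitation can be illustrated with a much simpler stand-in. The sketch below is not the trapdoor construction described above; it assumes a keyed hash (HMAC-SHA256) as the searchable token, and the key, record identifiers, and helper names are hypothetical.

```python
import hashlib
import hmac


def search_token(key: bytes, value: str) -> bytes:
    """Deterministic keyed token: equal values give equal tokens."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()


# Hypothetical key known to the data owner and the querier, not to the server.
owner_key = b"hypothetical-owner-key"

# The server stores tokens of patient ages, never the ages themselves.
server_index = {
    "rec1": search_token(owner_key, "38"),
    "rec2": search_token(owner_key, "47"),
    "rec3": search_token(owner_key, "52"),
}


def equality_search(index, token):
    """Server-side matching: it can only test tokens for equality."""
    return sorted(
        rid for rid, stored in index.items() if hmac.compare_digest(stored, token)
    )


matches = equality_search(server_index, search_token(owner_key, "47"))
no_matches = equality_search(server_index, search_token(owner_key, "46"))
```

Because the tokens carry no ordering information, a query such as "age over forty-five" cannot be answered by comparing them; the server can only report exact matches.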
Although the above-described solutions have been proposed for secure databases hosted on a cloud server, they cannot be adapted to the above-described problem for several reasons. First, to evaluate the query on the encrypted sensitive data, the organization, such as the health organization, must encrypt the query using the same scheme and the same key that are used by the data owners, such as the patients, and send the query to the cloud server. The cloud server may then forward the encrypted query to the data owners, where the query can be decrypted with that key. However, this technique for secure database outsourcing does not protect either query privacy or database privacy. Second, a common approach in the existing proposed systems is to send a set of encrypted records to the data owner for filtration and further processing. This technique for secure database outsourcing likewise does not protect either query privacy or database privacy.
Another solution that has been proposed is to execute Structured Query Language (SQL) queries over encrypted data. This proposed solution depends on a fully trusted component that maintains all the secret and public keys and transforms the requester's SQL queries into queries that can be executed over encrypted records. While this solution adds little overhead to query execution time, it requires a fully trusted component, which disadvantageously provides a single point of attack. Another solution involves encrypting records such as Electronic Health Records (EHRs) using symmetric key cryptography and storing them in an untrusted cloud environment. This solution allows patients to choose which terms in the sensitive data can be searched, and who may be able to access the search terms. This solution also relies on a trusted authorized entity that generates keys for users of the system. However, this solution is limited in that only specific keywords chosen by the patient can be searched. Further, this and the other described solutions are vulnerable to known plaintext attacks because the keywords are encrypted using a symmetric key.
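The known plaintext concern can be illustrated with a small sketch. It assumes the keyword encryption is deterministic, so that equal keywords always produce equal ciphertexts; that assumption, the HMAC stand-in for the cipher, and all names and sample data are hypothetical rather than taken from the solutions described above.

```python
import hashlib
import hmac


def encrypt_keyword(key: bytes, keyword: str) -> bytes:
    """Stand-in for deterministic symmetric keyword encryption (assumption)."""
    return hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).digest()


# The attacker never sees this key.
secret_key = b"key-the-attacker-never-sees"

# What an observer of the cloud storage sees: record id -> encrypted keyword.
stored = {
    "rec1": encrypt_keyword(secret_key, "diabetes"),
    "rec2": encrypt_keyword(secret_key, "asthma"),
    "rec3": encrypt_keyword(secret_key, "diabetes"),
}

# Known plaintext: the attacker learns out of band that rec1 is tagged "diabetes".
known_plaintext = {"rec1": "diabetes"}


def infer_keywords(stored_tokens, known):
    """Label every record whose ciphertext equals a known (keyword, ciphertext) pair."""
    codebook = {stored_tokens[rid]: kw for rid, kw in known.items()}
    return {rid: codebook[ct] for rid, ct in stored_tokens.items() if ct in codebook}


inferred = infer_keywords(stored, known_plaintext)
```

Under this assumption, a single known keyword and ciphertext pair is enough to label every other record carrying the same keyword, without recovering the key.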