Patient data privacy is one of the primary concerns in the healthcare domain, and it is even more significant in a cloud scenario, where Protected Health Information (PHI) data may be continuously uploaded to a public cloud environment. Because the data from various tenants, or hospital environments, is stored centrally in the cloud, the volume of data is extremely large and may easily run into petabytes. Typically, uploading data from a hospital environment to the cloud depends on the availability of an internet connection with suitable bandwidth.
In addition to being kept private, PHI data in a multi-tenant environment must be isolated. When several hospital environments store their data with a single cloud-based service provider, each tenant's data must be maintained in isolated form without compromising security. Such isolation is an important tenet of the cloud paradigm.
A standard approach is to encrypt the data with the private keys of the tenants, or of the owner of the data, before the data is stored in the cloud, so that the data cannot be decrypted on the cloud. This addresses the security concerns. However, it introduces a new problem: searching the encrypted data without decrypting it on the cloud. Another commonly faced problem arises when a network connection is unavailable. In such scenarios, uploading is interrupted, because the data is cached and the upload is deferred until the network connection is restored. This causes delays and disrupts the normal functioning of the system.
In typical medical applications in a cloud scenario, the client may be a web browser. One approach to searching such encrypted data would be to download the complete encrypted data to the client, decrypt it using the tenant's private key, and search the decrypted data. Because the volume of data in a cloud scenario is large, this approach is not practical: downloading the data to the client would take a long time, and because the cryptography is asymmetric, decryption is also very slow. Another approach is to perform the search on the cloud itself. However, since the data in this context is encrypted and may not be decrypted on the cloud, this option is also not feasible. Other solutions are based on searchable encryption; however, these tend to be more vulnerable, because they compromise the security of the encrypted data.
The published paper titled “Secure Search for Massive, Public Cloud Hosted Medical Data Volumes” is relevant to this disclosure. The article focuses on using a multi-level index to search very large data sets in which the data being searched comes from various entities. The index is divided by the individual field being indexed. This approach requires multiple round trips for a single search operation and incurs additional decryption overhead. Also, multiple read operations must be performed to index a single record. Because the solution uses hierarchical index structures, each individual data upload must read and update multiple buckets, index lists, or both. This is relatively complex and degrades performance, because the index is updated slowly.
In the above prior-art disclosure, the index is also divided by alphabetical range. Such a division makes the index complex and leads to performance and concurrency issues during index updates, because multiple indexes must be merged when new or incremental indexes arrive from multiple clients.
Another relevant article in this field, by Ming Li, Shucheng Yu, Ning Cao and Wenjing Lou, is titled “Authorized Private Keyword Search over Encrypted Personal Health Records in Cloud Computing.” This article addresses the problem of authorized private keyword searches on encrypted patient health records in cloud computing environments. It discloses a scalable, fine-grained authorization framework for searching encrypted patient health records, in which users obtain query capabilities from localized trusted authorities according to their attributes, so that the framework scales well as the number of users in the system grows. The article mainly describes authorization and privacy control mechanisms for query search using searchable encryption.
Another relevant publication in the field of searchable encryption is “Practical Techniques for Searches on Encrypted Data,” by Dawn Xiaodong Song, David Wagner and Adrian Perrig. The publication addresses the problem of searching on encrypted data and provides proofs of security for the resulting cryptosystems. It discloses a technique for remote searching on encrypted data using an untrusted server.
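To illustrate the kind of technique the Song, Wagner and Perrig publication describes, the following is a minimal sketch of its basic (sequential-scan) scheme, under stated assumptions. Each fixed-length word W_i is encrypted as C_i = W_i XOR T_i, where T_i = <S_i, F_k(S_i)> and S_i is pseudorandom per position. To search, the client gives the server the word W and the key k, and the server checks whether C_i XOR W has the form <s, F_k(s)>. Assumptions not taken from this disclosure: HMAC-SHA256 stands in for both the stream generator and F, a single key k is used for every position, words are padded to 16 bytes, and all names (`encrypt_words`, `server_search`, and so on) are hypothetical. Decryption is omitted. In this basic form the searched word itself is revealed to the server, which the publication's refined schemes are designed to avoid.

```python
from __future__ import annotations

import hashlib
import hmac

WORD_LEN = 16   # n: every word is padded to this many bytes (assumed)
CHECK_LEN = 4   # m: length of the check tag F_k(S_i) in bytes (assumed)


def prf(key: bytes, data: bytes, length: int) -> bytes:
    """HMAC-SHA256 truncated to `length` bytes; a stand-in for the scheme's primitives."""
    return hmac.new(key, data, hashlib.sha256).digest()[:length]


def pad_word(word: str) -> bytes:
    """Encode a word as a fixed-length block W_i (zero-padded)."""
    raw = word.encode("utf-8")
    if len(raw) > WORD_LEN:
        raise ValueError(f"word exceeds {WORD_LEN} bytes: {word!r}")
    return raw.ljust(WORD_LEN, b"\x00")


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_words(stream_key: bytes, check_key: bytes, words: list[str]) -> list[bytes]:
    """Owner side: C_i = W_i XOR <S_i, F_k(S_i)> for each word position i."""
    ciphertexts = []
    for i, word in enumerate(words):
        # Pseudorandom S_i derived from the position index (hypothetical stream stand-in).
        s_i = prf(stream_key, i.to_bytes(8, "big"), WORD_LEN - CHECK_LEN)
        t_i = s_i + prf(check_key, s_i, CHECK_LEN)  # T_i = <S_i, F_k(S_i)>
        ciphertexts.append(xor(pad_word(word), t_i))
    return ciphertexts


def server_search(ciphertexts: list[bytes], word: str, check_key: bytes) -> list[int]:
    """Server side: report positions i where C_i XOR W has the form <s, F_k(s)>."""
    target = pad_word(word)
    hits = []
    for i, c_i in enumerate(ciphertexts):
        t = xor(c_i, target)
        s, tag = t[:WORD_LEN - CHECK_LEN], t[WORD_LEN - CHECK_LEN:]
        if hmac.compare_digest(tag, prf(check_key, s, CHECK_LEN)):
            hits.append(i)
    return hits


if __name__ == "__main__":
    stream_key, check_key = b"s" * 32, b"k" * 32  # hypothetical demo keys
    ct = encrypt_words(stream_key, check_key, ["flu", "asthma", "flu"])
    print(server_search(ct, "flu", check_key))
```

Because the server scans every ciphertext word, search cost grows linearly with the data volume, and a non-matching position passes the check only with probability about 2^-32 for a 4-byte tag.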
Prior art US 2009/0300351 A1 discloses a method, apparatus and system for fast searchable encryption. The data owner encrypts the data and stores the ciphertext on the server. The data owner also generates an encrypted index according to each keyword of the files and stores the encrypted index on the server. This disclosure provides keyword-specific or file-specific indexes, and related keys, for distribution to different roles.
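The general idea of an encrypted per-keyword index can be illustrated with a minimal sketch. It is a generic keyed-token index and is not claimed to be the method of US 2009/0300351 A1. The owner replaces each keyword with an HMAC-SHA256 token under a secret key and uploads a token-to-record map; to search, the client sends only the token, and the server performs an exact-match lookup without holding the key. Assumptions: the function names, the record identifiers and the keyword normalization (trim and lowercase) are hypothetical, and encryption of the record bodies themselves is omitted. Deterministic tokens still reveal to the server which records match a repeated query.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets
from collections import defaultdict


def keyword_token(key: bytes, keyword: str) -> str:
    """Deterministic keyed token for a keyword; the server never sees the plaintext."""
    normalized = keyword.strip().lower().encode("utf-8")  # hypothetical normalization
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def build_index(key: bytes, documents: dict[str, list[str]]) -> dict[str, set[str]]:
    """Owner side: map each keyword token to the IDs of the records containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, keywords in documents.items():
        for kw in keywords:
            index[keyword_token(key, kw)].add(doc_id)
    return dict(index)


def server_search(index: dict[str, set[str]], token: str) -> set[str]:
    """Server side: exact-match lookup on an opaque token; no key is required."""
    return index.get(token, set())


if __name__ == "__main__":
    owner_key = secrets.token_bytes(32)  # hypothetical tenant secret key
    # Hypothetical records; the record bodies would be encrypted separately.
    docs = {
        "rec-001": ["diabetes", "insulin"],
        "rec-002": ["asthma"],
        "rec-003": ["Diabetes", "retinopathy"],
    }
    encrypted_index = build_index(owner_key, docs)
    query = keyword_token(owner_key, "diabetes")
    print(sorted(server_search(encrypted_index, query)))  # ['rec-001', 'rec-003']
```

A single flat token map keeps each search to one lookup, but every new record still requires the owner to compute and upload tokens for all of its keywords.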
Prior art US 2013/0254539 A1 also provides searchable encryption techniques for obscuring data stored at a remote site or in a cloud service, and distributes trust across multiple entities to avoid a single point of data compromise.