Conventional data storage and retrieval methodologies maintain data, such as documents or email, in a repository for efficient storage and shared retrieval. Keyword fields can be defined over the data to facilitate searches through queries that specify target keywords for one or more keyword fields. The keyword fields identify specific documents through headers or other metadata associated with the data.
Generally, these methodologies assume that adequate bandwidth and processing are available between the repository and user systems seeking to search for data matching queried keywords. Recently, advances in mobile technologies and wireless networks have greatly enhanced accessibility to remotely maintained data repositories. However, mobile devices often trade processing and storage capabilities for portability, while wireless networks sacrifice bandwidth for increased availability. As a result, users increasingly resort to storing their data on a server that provides a central data repository readily accessible by mobile devices and via wireless networks.
Storing sensitive data on a server providing a remotely accessible central data repository requires a level of trust in the server relative to the stored data. Alternatively, to ensure confidentiality against an untrusted server, a user can encrypt the data, which will also protect against data compromise while the data is in transport. Encryption ensures that the server or other non-authorized users derive no knowledge from the contents of the stored data. By the same token, encryption makes selective data retrieval by the server impossible, since the server cannot determine or select specific data based on search criteria. Yet the ability to retrieve data selectively is important to preserve the bandwidth resources of the user.
One approach to enable a server to identify specific data containing a certain keyword is provided through capabilities, such as described in Song et al., “Practical Techniques for Searches on Encrypted Data,” Proc. of IEEE Security and Priv. Symp. (2000), the disclosure of which is incorporated by reference. Each capability reveals only the data that contains a given keyword in a given keyword field and discloses no other information. The data and keywords are encrypted by the user in a way that later lets the user generate capabilities that enable the server to identify data matching a given keyword in a given keyword field without compromising the confidentiality of either the data or keyword. A capability reveals only the keyword field that it applies to, and the data that matches the queried keyword in that field. The server learns no information from the encrypted data without the capability.
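The capability idea can be illustrated with a minimal sketch. This is not the construction of Song et al.; it swaps in a plain keyed hash (HMAC-SHA256) as a stand-in, and the names `tag`, `index_document`, `make_capability`, and `server_search` are hypothetical. Unlike a true capability scheme, the deterministic tags used here let the server see which documents share a keyword even before any capability is issued, so the sketch only shows the data flow: the user tags keywords under a secret key, later hands the server a field-specific token, and the server matches that token without learning the keyword.

```python
import hashlib
import hmac

def tag(key: bytes, field: str, keyword: str) -> bytes:
    # Keyed, field-specific tag; the server never sees the key or the keyword.
    msg = field.encode() + b"\x00" + keyword.encode()
    return hmac.new(key, msg, hashlib.sha256).digest()

def index_document(key: bytes, fields: dict) -> dict:
    # User side: one tag per keyword field, stored next to the encrypted document.
    return {f: tag(key, f, kw) for f, kw in fields.items()}

def make_capability(key: bytes, field: str, keyword: str) -> tuple:
    # User side: the capability names the field and carries the matching tag.
    return field, tag(key, field, keyword)

def server_search(index: dict, capability: tuple) -> list:
    # Server side: compare the capability against the stored tag for that field.
    field, wanted = capability
    return [doc_id for doc_id, tags in index.items()
            if hmac.compare_digest(tags.get(field, b""), wanted)]

# Illustrative data only (assumed, not from the source).
key = b"user-secret-key-for-illustration"
index = {
    1: index_document(key, {"from": "alice", "subject": "budget"}),
    2: index_document(key, {"from": "bob", "subject": "budget"}),
    3: index_document(key, {"from": "alice", "subject": "travel"}),
}
hits = server_search(index, make_capability(key, "from", "alice"))
```

As in the description above, the capability discloses the keyword field it applies to and the set of matching documents, here documents 1 and 3.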
In existing work, each capability only allows the server to identify the subset of the data that matches a specific keyword in a specific keyword field. Capabilities do not generally allow a server to directly search data through Boolean combinations of keywords, such as conjunctive searches. Individual single-keyword capabilities can be combined by the server, which intersects the individual subsets of search results to derive conjunctive search results. This methodology, however, allows the server to associate specific encrypted data with each individual keyword, and further information could eventually be derived by combining this knowledge with knowledge of statistically likely searches. This first approach is unsatisfactory because the privacy of the data is compromised to some extent. Alternatively, a user can store additional information on the server in the form of meta-keywords to facilitate conjunctive searches. A meta-keyword is defined for every possible conjunction of keywords and is associated with the encrypted data across the various keyword fields. This methodology, however, requires an exponential amount of data storage for the 2^m meta-keywords generated for each document that contains m keyword fields. This second approach is unsatisfactory due to the excessive storage costs incurred on the server.
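Both drawbacks can be made concrete with a short sketch. The function names and sample data are hypothetical, and the code works on plaintext document identifiers and keywords purely for illustration. The first function shows that intersecting single-keyword results requires the server to see every individual result set. The second enumerates one meta-keyword per subset of a document's keyword fields, giving 2^m entries if the empty subset is counted.

```python
from itertools import combinations

def conjunctive_by_intersection(per_keyword_matches: list) -> set:
    # The server holds each single-keyword result set in the clear before
    # intersecting, so it learns more than the conjunctive result alone.
    return set.intersection(*per_keyword_matches)

def meta_keywords(fields: dict) -> list:
    # One meta-keyword per subset of the m keyword fields (empty subset included).
    names = sorted(fields)
    metas = []
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            metas.append(tuple((f, fields[f]) for f in subset))
    return metas

# Illustrative data only (assumed, not from the source).
from_alice = {1, 2, 3}        # documents matching one capability
subject_budget = {2, 3, 4}    # documents matching another capability
both = conjunctive_by_intersection([from_alice, subject_budget])

doc = {"from": "alice", "to": "bob", "subject": "budget"}
count_for_three_fields = len(meta_keywords(doc))
count_for_ten_fields = len(meta_keywords({f"field{i}": "kw" for i in range(10)}))
```

With three keyword fields the document carries 8 meta-keywords, and with ten fields it carries 1024, which illustrates why the storage cost grows exponentially in m.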
Therefore, there is a need for an approach to conjunctive searches of encrypted data using communication- and storage-efficient queries that increase data privacy against an untrusted server.