Many users store various types of documents in a remote repository (commonly known as “cloud storage”) administered by an external entity. As the term is generally used herein, a document can correspond to any unit of information, such as a text-bearing document (e.g., an email), a music file, a picture, a financial record, and so on. A user may opt to store documents in the remote repository for various reasons, e.g., based on factors pertaining to convenience, accessibility, storage capacity, reliability, etc.
Contractual obligations may require the entity that administers the remote repository to minimize the risk of unauthorized access to a user's documents. However, from a technical perspective, there may be little that prevents the entity itself from accessing and examining personal documents of a user. This may understandably unsettle a user. For instance, the user's documents may contain sensitive information that the user does not wish to divulge to any person, including the entity that administers the remote repository.
A user may address this concern by encrypting the documents and storing the documents in encrypted form at the remote repository. This approach effectively prevents the entity which administers the remote repository (or anyone else) from examining the documents. However, this approach also prevents the user from performing any meaningful operations on the documents that are stored in the remote repository. For instance, the encryption of the documents precludes the user from performing an on-line search of the documents. The user may address this situation by downloading all the documents and decrypting them. But this solution runs counter to the user's initial motivation for storing the documents in the remote repository.
The cryptographic community has developed technology that is referred to herein as Symmetric Searchable Encryption (SSE), which utilizes symmetric key encryption to generate an encrypted index that can be employed in connection with keyword searches. That is, a user can set forth a keyword, and the encrypted index can be analyzed to locate documents that include that keyword, while the entity that administers the remote repository, which retains the encrypted files and the index, remains unaware of which files include which keywords. At least some SSE schemes require linear-time search, in which every indexed document is analyzed to ascertain whether it includes the keyword. This approach may be prohibitively inefficient, particularly for relatively large document collections. Other existing SSE schemes allow for sublinear search; however, these schemes are inefficient with respect to updating the index that is employed when a document collection is searched. This can be problematic for document collections in which documents frequently change, such as collections of emails.
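By way of illustration only, the general idea of a keyed index that supports keyword lookup without a scan over every document can be sketched as follows. This is a simplified toy under stated assumptions, not any particular SSE scheme and not a secure construction: the function names (`keyword_token`, `build_index`, `search`), the use of HMAC-SHA256 for tokens, and the whitespace tokenization are all hypothetical choices made for the example. The sketch hides keywords behind keyed tokens but makes no attempt to hide the index structure, result sizes, or access patterns, and it does not address the index-update costs noted above.

```python
import hashlib
import hmac
import secrets
from typing import Dict, List


def keyword_token(key: bytes, keyword: str) -> bytes:
    """Derive a deterministic keyed token for a keyword (HMAC-SHA256).

    Without the key, a token cannot be recomputed from a guessed keyword.
    """
    return hmac.new(key, keyword.lower().encode("utf-8"), hashlib.sha256).digest()


def build_index(key: bytes, documents: Dict[str, str]) -> Dict[bytes, List[str]]:
    """Client side: map each keyword token to the IDs of documents containing it.

    Document IDs are assumed to be opaque handles to separately encrypted files.
    """
    index: Dict[bytes, List[str]] = {}
    for doc_id, text in documents.items():
        # set() ensures each document is listed once per distinct keyword.
        for word in set(text.lower().split()):
            index.setdefault(keyword_token(key, word), []).append(doc_id)
    return index


def search(index: Dict[bytes, List[str]], token: bytes) -> List[str]:
    """Server side: a single dictionary lookup rather than a scan of all documents."""
    return index.get(token, [])


# Hypothetical usage: the client keeps `key`; the server holds only `index`.
key = secrets.token_bytes(32)
docs = {
    "doc1": "quarterly budget report",
    "doc2": "budget meeting notes",
    "doc3": "holiday photos",
}
index = build_index(key, docs)
print(sorted(search(index, keyword_token(key, "budget"))))  # ['doc1', 'doc2']
```

In this sketch, the party holding the index sees only 32-byte tokens and document handles; a query issued under a different key yields a different token and therefore no match.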