The present invention relates to techniques for searching files and, more particularly, to techniques for searching encrypted files.
Searching is an important and extensively used operation in computer applications. For example, a plurality of files stored on a file server may be searched to determine a set of files that contain a particular user-specified word; a list of uniform resource identifiers (URIs) may be searched to determine if a user-specified URI is in the list; a list of available resources may be searched by an access control application to locate a resource and to determine access rights associated with the resource; and a particular file's contents may be searched to determine if a particular keyword is included in the file contents. Searching is used in several other applications as well.
There are a number of different approaches to searching. According to one approach, searching can be modeled as follows: given a searchable space S comprising elements from some domain Σ, and given a target or query element k from domain Σ (i.e., k∈Σ), searching is a process that determines if target element k is included in searchable space S (i.e., if k∈S). In addition to determining whether or not searchable space S includes query element k, the search process may also identify one or more locations within the searchable space where the query element is found. Domain Σ can be any arbitrary domain, e.g., the set of integers, the set of real numbers, a set of strings of alphanumeric characters, etc. Searchable space S might manifest itself in various forms; for example, searchable space S might be a file, a plurality of files from one or more file systems, a list of URIs, a list of resources, etc. Search techniques typically attempt to minimize the time and processing resources needed to determine if k∈S.
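The search model described above can be illustrated with a minimal sketch. It assumes that searchable space S is held as an in-memory sequence and that elements of domain Σ can be compared for equality; the function names `contains` and `locations` and the sample URI list are hypothetical and are used only for illustration.

```python
from typing import Hashable, List, Sequence


def contains(space: Sequence[Hashable], k: Hashable) -> bool:
    """Return True if query element k is included in searchable space S."""
    return any(element == k for element in space)


def locations(space: Sequence[Hashable], k: Hashable) -> List[int]:
    """Return every position within S at which query element k is found."""
    return [i for i, element in enumerate(space) if element == k]


# Hypothetical searchable space: a list of URIs.
uris = ["http://a.example/", "http://b.example/", "http://a.example/"]

found = contains(uris, "http://a.example/")   # membership test
where = locations(uris, "http://a.example/")  # positions of each match
```

Both functions scan S linearly, so the time required grows with the size of S; this is the cost that practical search techniques attempt to reduce.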
Given a searchable space comprising a plurality of files, a common search request involves determining all files in the plurality of files that contain a particular query element k. Query element k may be a string comprising one or more words from a particular domain Σ. Several search techniques have been developed to service such a search request. According to one technique, each file in the plurality of files is sequentially searched to determine occurrences of the query string k in the file's contents. Information identifying files that contain at least one occurrence of the query element is then output in response to the search request. According to another technique, an inverted index may be generated for the plurality of files to be searched. The inverted index is then used to determine files that contain the query element. According to yet another technique, signature files that employ hashing techniques may be used to process the search request. Several other techniques may also be used to process the search request.
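The three plaintext techniques mentioned above can be contrasted in a short sketch. It assumes a small in-memory corpus, a query element consisting of a single lowercase word, and a simple word tokenizer; the names `FILES`, `tokenize`, `sequential_search`, `build_inverted_index`, `index_search`, `signature`, and `signature_search` are hypothetical, and the 64-bit SHA-256-based signature is only one of many possible signature schemes.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Hypothetical plaintext corpus: file name -> file contents.
FILES: Dict[str, str] = {
    "a.txt": "the quick brown fox",
    "b.txt": "a lazy dog sleeps",
    "c.txt": "the dog chased the fox",
}


def tokenize(text: str) -> List[str]:
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sequential_search(files: Dict[str, str], k: str) -> List[str]:
    """Scan every file's contents for query word k."""
    return sorted(name for name, text in files.items() if k in tokenize(text))


def build_inverted_index(files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each word to the set of files that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, text in files.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def index_search(index: Dict[str, Set[str]], k: str) -> List[str]:
    """Answer the query with a single index lookup."""
    return sorted(index.get(k, set()))


def signature(words: Iterable[str], bits: int = 64) -> int:
    """Superimpose one hashed bit per word into a fixed-width bit mask."""
    sig = 0
    for word in words:
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        sig |= 1 << (int.from_bytes(digest[:8], "big") % bits)
    return sig


def signature_search(files: Dict[str, str], k: str, bits: int = 64) -> List[str]:
    """Filter files by signature, then confirm to discard false positives."""
    probe = signature([k], bits)
    candidates = [
        name for name, text in files.items()
        if signature(tokenize(text), bits) & probe == probe
    ]
    return sorted(name for name in candidates if k in tokenize(files[name]))
```

In this sketch the sequential search reads every file for each query, the inverted index pays a one-time construction cost in exchange for fast lookups, and the signature approach stores a compact bit mask per file but must confirm each candidate because distinct words can hash to the same bit. All three operate directly on readable file contents.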
The various search techniques described above for processing the search request all presume that the searchable space (e.g., the plurality of files) is not encrypted (i.e., the text files and documents are in readable, known formats). As used herein and in the literature, the term “plaintext” refers to data that is not encrypted, as opposed to “ciphertext,” which refers to data that is encrypted. There are several instances where the searchable space is ciphertext and the presumption does not hold true. For example, data of a sensitive or confidential nature (e.g., credit card information, bank account information, etc.) is usually stored in encrypted files. Conventional search techniques, which are tailored for searching plaintext files, generally cannot be used efficiently (in terms of the computation time and resources required for the search operation) for searching encrypted files or ciphertext.
One sector that has seen a heightened demand for efficient search techniques capable of searching encrypted files is electronic commerce. Information of a sensitive and confidential nature, such as credit card information, bank account information, or the like, is generally used for processing online transactions. Due to the “openness” of the Internet, this information is generally stored in encrypted form to preserve the privacy of the users and the confidentiality of the information. As a result, in order to respond to customer requests in a timely manner, merchants and other entities that engage in online commercial transactions need fast and efficient search techniques that are capable of searching encrypted or ciphertext files. In order to be cost effective, online merchants prefer search techniques that require reduced memory and computing resources so as to minimize the costs associated with the searches. For example, online banking institutions and credit card companies that authorize payments for online commerce activities need efficient search techniques that can process consumer requests in a timely manner while minimizing the costs associated with the searches.
In light of the above, there is a need for search techniques that can search encrypted searchable spaces (e.g., encrypted files) in an efficient manner while minimizing the memory and computing resources required to perform the searches.