Storing personally identifiable information or other private data is often a security risk for any entity. Personally identifiable information often has value to fraudsters, making a data store that contains it a target for hacking attempts. Such private data can be encrypted in a database to protect it from access by a hacker. However, encrypted data within a database may not be easily searched. In many implementations, data is stored in a database using a hash function, and any query against the data must be an exact match of the hash value. This typically requires a user to enter the exact value of the string to be searched. This exact value may then be hashed using the hash function and an exact match found. This may conflict with business needs that require personally identifiable information to be searchable using wildcards or other partial data searches.
Encryption of personally identifiable information, for example, an address line, phone number, or fax number, is important for protecting the privacy of those identified. A problem is that, once encrypted, these fields are hard to access via search. Many applications that use personally identifiable information are written such that they need full-text searches on these fields. For example, a customer service representative (CSR) application relies on being able to search for customer data when assisting a customer. As an example, a CSR may enter a phone number (e.g., 123-456-7890) into a search field and submit a search query. If the data being searched is not encrypted, then the application need only make a direct comparison to identify the relevant database records.
However, if the data being searched is stored in encrypted format, it may not be searchable. A typical solution for this scenario is to use a hashing function. For example, one solution may be to use a one-way cryptographic hash function. A general hash function is a function that maps digital data of an arbitrary size to other digital data of a fixed size. A cryptographic hash function allows for verification that input data matches a stored hash value, while making it difficult to construct any data that would hash to the same value or find any two unique data pieces that hash to the same value.
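The fixed-size property of a hash function described above can be illustrated with a minimal sketch. SHA-256 from the Python standard library is used here only as an assumed example of a cryptographic hash function; the helper name digest is hypothetical and does not come from any particular system.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string.

    SHA-256 is an assumption made for illustration; any cryptographic
    hash function with a fixed output size would show the same property.
    """
    return hashlib.sha256(data).hexdigest()


# Inputs of very different sizes map to digests of the same fixed size.
short_digest = digest(b"123-456-7890")
long_digest = digest(b"x" * 10_000)

print(len(short_digest), len(long_digest))  # both are 64 hex characters
print(short_digest == digest(b"123-456-7890"))  # same input, same digest
```

Because the same input always yields the same digest, a stored digest can be used to verify a later input without storing the input itself.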
A system could perform a hashing operation on the plaintext personally identifiable information and store those hashes. When a user submits a search query for a particular piece of personally identifiable information, for example the phone number 123-456-7890, the same hashing operation is performed on the phone number, and the resulting hash is compared to the stored hash values to determine whether it matches any records. This solution is often feasible when exact search terms are used (e.g., where a user types in the full, exact data that they are searching for).
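The exact-match lookup described above might be sketched as follows. This is an illustrative sketch only: SHA-256 is an assumed choice of hash function, and the names hash_value, records, hash_index, and exact_search, as well as the sample records, are hypothetical.

```python
import hashlib
from typing import Dict, List


def hash_value(value: str) -> str:
    """Hash a plaintext value (SHA-256 is assumed here for illustration)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical sample data: record id -> phone number.
records: Dict[int, str] = {1: "123-456-7890", 2: "555-000-1111"}

# Only the hashes are kept in the searchable index, not the plaintext.
hash_index: Dict[str, List[int]] = {}
for record_id, phone in records.items():
    hash_index.setdefault(hash_value(phone), []).append(record_id)


def exact_search(query: str) -> List[int]:
    """Return ids of records whose stored hash equals the hash of `query`."""
    return hash_index.get(hash_value(query), [])


print(exact_search("123-456-7890"))  # [1]
print(exact_search("123-456-0000"))  # []
```

The lookup succeeds only when the query is character-for-character identical to the value that was originally hashed.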
Unfortunately, in a more typical scenario, a user may not enter the entire value that they are searching for, in this case 123-456-7890, and instead may enter 123-456*, or similar, where * represents a wildcard character. The user in this scenario would be submitting a wildcard search, which is intended to return a set of relevant results. However, in a system that requires the hash of the query to exactly match a stored hash value, this use of wildcards would not be possible.
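The failure of wildcard queries against stored hashes can be shown with a short sketch. As before, SHA-256 is an assumed hash function, and the names hash_value, stored_hash, and matches_stored are hypothetical.

```python
import hashlib


def hash_value(value: str) -> str:
    """Hash a plaintext value (SHA-256 is assumed here for illustration)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hash stored for the full phone number.
stored_hash = hash_value("123-456-7890")


def matches_stored(query: str) -> bool:
    """Return True only if the hash of `query` equals the stored hash."""
    return hash_value(query) == stored_hash


# The hash of a partial or wildcard query bears no useful relation to the
# hash of the full value, so only the exact string matches.
for query in ["123-456-7890", "123-456*", "123-456"]:
    print(query, matches_stored(query))
```

Only the first query prints True. Hashing the prefix, with or without the wildcard character, produces an unrelated digest, which is why an exact-match hash comparison cannot serve partial searches.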
Embodiments of the invention address these and other problems, individually and collectively.