1. Field
This disclosure relates generally to database security, and, more particularly, to secure wildcard searching of a database.
2. Background
It is not uncommon for companies to have numerous databases and to use in-house Information Technology (IT) staff or even third parties to maintain them. Moreover, those databases may be located in places where the owner cannot control physical access to the devices or storage making up the databases, or to the communication lines to and from them (hereafter referred to as “remote” storage).
In some cases, these databases may contain sensitive or confidential information that the owner wishes to keep from unauthorized persons, including members of the owner's own IT staff such as database administrators. One way to do so is to encrypt the database contents and keep the decryption key out of reach of those who can access the database but are not authorized to view its sensitive contents, while keeping the key available on the secure computers of those with a legitimate need for access. In this way, only persons with access to the decryption key can meaningfully use the contents, by retrieving the desired encrypted data to a secure computer and decrypting it there. However, this can present a security problem when such authorized users wish to search the database contents.
In general, there are two conventional ways to search a remote encrypted database. One is to decrypt the contents and perform the search on the unencrypted contents, which compromises the security of the remote database because unauthorized persons may then gain access to the unencrypted contents. The other is to allow only “exact string” searches. An exact string search is less likely to compromise security because the search string can be encrypted in the same deterministic manner as the database contents, and the resulting encrypted string can then be matched against the similarly encrypted database contents. Thus, if the term “dollars” encrypts to the string “x%Wz3&7”, which would be meaningless to any observer with access to the remote storage, the remote database can be searched for the exact encrypted string “x%Wz3&7”, and the encrypted records containing that string can be returned to the user computer for decryption, thereby maintaining the security of the database contents in the remote storage.
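The exact-string approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of this disclosure: a keyed HMAC-SHA256 stands in for deterministic encryption of each term (it supports equality matching but, unlike encryption, cannot be reversed, so the record contents themselves would be stored separately under their own encryption, which is not shown), and the names `KEY`, `token`, `build_index`, and `exact_search` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key, held only on the authorized user's secure computer.
KEY = b"hypothetical-client-key"


def token(term: str, key: bytes = KEY) -> str:
    """Deterministic keyed token for one term.

    HMAC-SHA256 is used here as a stand-in for deterministic encryption:
    the same term and key always produce the same token.
    """
    return hmac.new(key, term.encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(records: dict[int, list[str]]) -> dict[str, set[int]]:
    """What the remote store holds: tokens mapped to record ids, no plaintext."""
    index: dict[str, set[int]] = {}
    for rec_id, terms in records.items():
        for term in terms:
            index.setdefault(token(term), set()).add(rec_id)
    return index


def exact_search(index: dict[str, set[int]], term: str) -> set[int]:
    """The client tokenizes the query; the remote store does an exact lookup."""
    return index.get(token(term), set())


records = {1: ["dollars", "paid"], 2: ["dollar", "owed"], 3: ["dollars", "owed"]}
index = build_index(records)
print(sorted(exact_search(index, "dollars")))  # [1, 3]
print(sorted(exact_search(index, "dollar")))   # [2]
```

Because the same term always yields the same token, the remote store can match “dollars” without ever seeing the word, while a query for a fragment such as “doll” finds nothing.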
The same is not generally true, however, for wildcard searches, in which not only the exact search term is sought but also any other terms that match the fixed portions of the search term, with the wildcard character(s) standing in for other characters. In a wildcard search, for example, an asterisk can denote one or more arbitrary characters following the specific string that precedes it. By way of example, a search for “distr*” in a database can be used to obtain all records containing “distr” followed by one or more characters. Thus, in this example, a search for “distr*” might return all records containing any of the following terms: “distress,” “distressed,” “distribute,” “distribution,” “distraction,” and “district.”
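For reference, the plaintext semantics of such a query can be sketched as a translation into a regular expression. The sketch assumes whole-term matching and treats “*” as one or more characters, as described above; `wildcard_to_regex` and `wildcard_search` are hypothetical names, and this is only an illustration of the unencrypted case.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard query into a regex; each "*" matches one or more characters."""
    # Escape the literal pieces so characters such as "." are matched literally.
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".+".join(literal_parts))


def wildcard_search(pattern: str, terms: list[str]) -> list[str]:
    """Return the terms that match the whole pattern (plaintext matching only)."""
    rx = wildcard_to_regex(pattern)
    return [t for t in terms if rx.fullmatch(t)]


terms = ["distress", "distressed", "distribute", "distribution",
         "distraction", "district", "distr", "destroy"]
print(wildcard_search("distr*", terms))
# ['distress', 'distressed', 'distribute', 'distribution', 'distraction', 'district']
```

Replacing `.+` with `.*` would let “*” also match zero characters, so that “distr” itself would be returned.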
This wildcard search problem is compounded if a wildcard can appear within a string (for example, “wom*n” to search for “woman” and “women”), at the beginning of a string (for example, “*posit” to search for “deposit” and “composite”), or multiple times within a single string (for example, “*port*y” to search for “reportedly,” “proportionally,” and “exportability”), because the target words may encrypt to strings that differ greatly from one another and from the encrypted search term, so the search term cannot easily be matched to the contents of the encrypted database.
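The difficulty can be illustrated with the same kind of keyed token used for exact matching. In this sketch, HMAC-SHA256 again stands in for deterministic per-term encryption, `KEY` and `token` are hypothetical, and “*” is assumed to mean one or more characters. The plaintext query “distr*” finds two terms, but neither running the pattern over the stored tokens nor encrypting the whole query for an exact lookup finds anything, because the literal characters “distr” do not survive encryption.

```python
import hashlib
import hmac
import re

# Hypothetical key kept on the authorized user's secure computer.
KEY = b"hypothetical-client-key"


def token(term: str) -> str:
    # HMAC-SHA256 as a stand-in for deterministic per-term encryption.
    return hmac.new(KEY, term.encode("utf-8"), hashlib.sha256).hexdigest()


terms = ["distress", "district", "woman", "women", "deposit"]
remote = [token(t) for t in terms]  # all that the remote store ever sees

pattern = re.compile(r"distr.+")  # "distr*" with "*" as one or more characters

# On plaintext the wildcard query works as expected.
plain_hits = [t for t in terms if pattern.fullmatch(t)]
# Run over the stored tokens, the same pattern matches nothing.
token_pattern_hits = [tok for tok in remote if pattern.fullmatch(tok)]
# Encrypting the whole query and doing an exact lookup fails as well.
exact_hits = [tok for tok in remote if tok == token("distr*")]

print(plain_hits)          # ['distress', 'district']
print(token_pattern_hits)  # []
print(exact_hits)          # []
```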
As a result, there is a need for a way to perform wildcard searches of encrypted remote database content without compromising the security of that content.