1. Field of the Invention
The present invention relates generally to comparing data entries in different lists, and more particularly to comparing anonymized data entries.
2. Background Art
Numerous situations require comparing a list of data entries against one or more other such lists. Examples include: airline security personnel matching a passenger list against a list of persons who are suspected of terrorist activity; a bank matching a mortgage applicant's name against a federal database of persons to whom lending is discouraged; and a health insurer checking applicants against a list of persons who are considered high-risk for health insurance coverage. The comparison process may include, for example, comparing names, dates of birth, nationalities, or any combination of those or other personally identifying information.
In some applications, it may be desired that the list of query names and the database of names against which the query is run are privacy-protected, at least until a set of initially matching entries is received. For example, a third-party private entity may perform the initial screening of numerous passenger lists against a federal terrorist watch-list. Privacy concerns may dictate that the identities of passengers and their personally identifiable data be withheld from the third-party private entity. Likewise, it may be desired for security and privacy reasons to withhold the contents of the federal terrorist watch-list from the third-party private entity. In this situation, the third-party private entity may be required to compare a list of anonymized and/or encrypted names against a terrorist watch-list that also contains anonymized and/or encrypted names. The third-party private entity may, after having identified one or more potential matches, send the matching information to a government entity, such as the Transportation Security Administration (TSA), that may conduct further investigations.
Comparing personal information such as names is made difficult by variations due to usage, cultural effects, transliteration effects, titles, spelling, inadvertent recording errors, and the like. Therefore, many matching systems resort to fuzzy matching techniques, in which entries with minor variations can be matched and each match is assigned a score that corresponds to the difference between the entry and the correct entry. Similar techniques are used to match other personally identifiable information, alone or in combination with names.
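As an illustration only, the scoring idea behind fuzzy matching can be sketched with a character-level edit distance. The source does not name a particular fuzzy matching technique; the Levenshtein distance used here, and the hypothetical helper names edit_distance and similarity_score, are assumptions chosen for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similarity_score(query: str, entry: str) -> float:
    """Hypothetical score in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(query), len(entry))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(query, entry) / longest


if __name__ == "__main__":
    # The two names differ by one character, so the distance is 1.
    print(edit_distance("John Smit", "John Smith"))
    print(similarity_score("John Smit", "John Smith"))
```

A matching system of this kind would typically report an entry as a candidate match when the score exceeds a chosen threshold; the threshold itself is application-specific and is not specified in the source.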
However, when the names in the query list and the names in the search database are anonymized or encrypted, fuzzy or approximate matching is not feasible. Although a query entry of “John Smit” may differ in only one letter from a database entry of “John Smith”, the encrypted representations of these two names may not have any identifiable correspondence to each other. Therefore, when anonymized comparison is desired, an exact match must generally be sought between a query list data entry and a search database data entry.
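The loss of correspondence can be illustrated with a short sketch. The source does not specify how entries are anonymized; this example assumes, purely for illustration, that each name is normalized and replaced by its SHA-256 digest. The helper names anonymize and differing_positions are hypothetical.

```python
import hashlib


def anonymize(name: str) -> str:
    """Assumed anonymization: SHA-256 digest of the trimmed, lowercased name."""
    normalized = name.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def differing_positions(a: str, b: str) -> int:
    """Count the character positions at which two equal-length digests differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


if __name__ == "__main__":
    query = anonymize("John Smit")
    entry = anonymize("John Smith")

    # The plaintext names differ by one letter, yet the digests are unrelated,
    # so a similarity score computed on the digests says nothing about the names.
    print(query)
    print(entry)
    print(differing_positions(query, entry), "of", len(query), "hex characters differ")

    # Only an exact match on the digest survives anonymization.
    print(anonymize("John Smith") == entry)
```

Under this assumption, the only comparison that remains meaningful is equality of digests, which is why an exact match is generally required.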
What is needed, therefore, are systems and methods for accurate comparison of anonymized lists of data.