Various parties (e.g., corporations, governmental agencies and natural persons) face a common dilemma: how can parties share specific information (e.g., health care data, customer prospect lists, a terrorist watch list, black list or a list of actual or potential problematic entities) that can assist the parties via business optimization, improved analysis, or detecting the presence of potential terrorists or other problematic parties, while maintaining the security and confidentiality of such information?
Hesitation to contribute or otherwise disclose certain information, as well as laws governing the use and disclosure of such information, are predicated upon a concern that the information may be subjected to unintended disclosure or used in a manner that may violate privacy policies or otherwise cause damage to the party. Such damage may include identity theft, unauthorized direct marketing activities, unauthorized or intrusive governmental activities, protected class (e.g., racial, religious, gender, ethnic) profiling and discrimination, anti-competitive practices, defamation, credit damage, or economic damage.
Conventional systems use various means to transfer data in a relatively confidential manner within or between parties. Although this technology has proven to be useful, it would be desirable to present additional improvements. For example, some conventional systems use a reversible encryption method, which modifies the data to engender some level of confidentiality. The encrypted data is transmitted to a recipient, who uses a comparable decryption method to return the encrypted data to its original format. However, once the data is decrypted, such data is subject to potential loss or use in an unapproved or illegal manner that may cause the very damage that the encryption process was intended to prevent.
Other conventional systems use irreversible cryptographic algorithms, or one-way functions, such as MD5 (also referred to as message digest 5), to obfuscate sensitive or confidential data. Existing irreversible cryptographic algorithms render data undecipherable and irreversible to protect the confidentiality and security of the data. The irreversible one-way function, when applied to data, results in an identical unique value for the same data regardless of the data source. Therefore, irreversible cryptographic algorithms are often used to produce a document signature, making unauthorized document alteration detectable when the document is shared across parties. For example, suppose a phone number in an original document is altered (for example, by changing the formatting) and then irreversibly encrypted. If the original, unaltered data is also irreversibly encrypted, the two encrypted values are different, indicating that one of the electronic documents has been altered.
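The phone number example can be illustrated with a minimal sketch. It assumes Python's standard hashlib module and uses SHA-256 in place of MD5 purely for illustration; the helper name one_way and the sample phone numbers are hypothetical and do not come from any particular conventional system.

```python
import hashlib


def one_way(value: str) -> str:
    """Return the hex SHA-256 digest of a string (an illustrative one-way function)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical sample data: the same phone number, and a reformatted copy.
original = "555-123-4567"
reformatted = "(555) 123-4567"

# The same input yields the same digest regardless of who computes it.
digest_party_a = one_way(original)
digest_party_b = one_way(original)

# A formatting change alone yields a different digest.
digest_altered = one_way(reformatted)

print(digest_party_a == digest_party_b)  # True: identical data, identical value
print(digest_party_a == digest_altered)  # False: alteration is detectable
```

The comparison reveals only whether the two inputs were identical; it gives no indication of how the altered value differs from the original.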
However, conventional approaches are merely able to determine that information in an irreversibly encrypted format either is an exact match with other irreversibly encrypted information, or is not an exact match with other irreversibly encrypted information. For example, if two numbers, 1000 and 1001, are irreversibly encrypted, conventional approaches can determine that the two encrypted numbers are not an exact match. Conventional approaches are unable to determine, from the encrypted numbers, that a majority of digits of the two original numbers match. In general, when numbers are obfuscated through one-way hashing functions such as, for example, MD5, SHA-1 (Secure Hash Algorithm 1), SHA-256 (Secure Hash Algorithm 256), etc., the ability to perform any similarity measures on the hashed number set is removed.
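The loss of similarity information for 1000 and 1001 can be sketched as follows. This is an illustration only, assuming Python's standard hashlib module and SHA-256 as the one-way function; the helper matching_positions is a hypothetical, deliberately naive similarity measure introduced for this example.

```python
import hashlib


def matching_positions(a: str, b: str) -> int:
    """Count the positions at which two strings hold the same character."""
    return sum(x == y for x, y in zip(a, b))


first, second = "1000", "1001"

# In the clear, the two numbers agree in three of their four digits.
plain_overlap = matching_positions(first, second)

# After one-way hashing, the digests are simply different values.
hash_first = hashlib.sha256(first.encode("utf-8")).hexdigest()
hash_second = hashlib.sha256(second.encode("utf-8")).hexdigest()
hash_overlap = matching_positions(hash_first, hash_second)

print(plain_overlap)              # 3
print(hash_first == hash_second)  # False: only exact matching is possible
print(hash_overlap, "of", len(hash_first), "hex characters coincide")
```

Any positions at which the two digests happen to coincide reflect chance rather than the closeness of the original numbers, so the exact-match test is the only meaningful comparison available on the hashed values.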
Therefore, there is a need for a method to compare irreversibly encrypted values to determine whether one encrypted value is similar to another encrypted value and to determine a measure of similarity between the original values, i.e., “fuzzy” matching. There are no known solutions for “fuzzy” matching in the one-way hashed, encrypted or otherwise anonymized data space. Thus, the need for such a solution has heretofore remained unsatisfied.