This invention relates to hashing of data. A hash function is a reproducible method of turning data into a relatively small number that can serve as a “digital fingerprint” of the data. The hash function typically substitutes and transposes the original data to create the resulting hash. Hashes have a range of different uses. For example, they are often used as indices into hash tables or hash files, or for various purposes in information security applications. Two particularly useful areas in which hashes play an important role are identity resolution applications and relationship resolution applications. Identity resolution applications attempt to answer the question “Who is who?”, that is, they attempt to determine whether multiple records that appear to describe different identities actually describe a single resolved identity. Relationship resolution applications attempt to answer the question “Who knows who?”, in order to determine the potential value or danger of relationships among entities, such as customers, employees, vendors, or other entities, for example, by cross-referencing data in multiple formats from various sources.
Many of these applications use preset, hardcoded hashing algorithms. A drawback of hardcoded hashing algorithms is that they lack flexibility and rely on the data to be hashed being presented in a certain predefined format. Data that does not adhere to the predefined format can yield bad or unusable hashes. One example of this is hashing of addresses. Relying on the street name as an important part of the address hash might work well for U.S. addresses, but some foreign countries, for example, Japan and Brazil, do not use street names to identify addresses. Another example is hashing of names that relies on first and last names. Not all countries and cultures follow the first and last name convention, which may cause problems when attempting to generate such a hash.
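The address problem described above can be illustrated with a minimal sketch. The function `hardcoded_address_hash`, the record field names (`street`, `city`, `postal_code`, `district`, `block`), and the sample records are all hypothetical and chosen only for illustration; they are not taken from any particular application. The sketch assumes a hash that is built from the street name, city, and postal code, and shows that two different addresses lacking a street name collapse to the same hash.

```python
import hashlib


def hardcoded_address_hash(record: dict) -> str:
    """Hypothetical hardcoded hash: street name, city, and postal code only."""
    # Any field the hardcoded format does not know about (for example,
    # a district or block number) is silently ignored.
    parts = [
        record.get("street", "").strip().upper(),
        record.get("city", "").strip().upper(),
        record.get("postal_code", "").strip(),
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


# Two distinct U.S.-style addresses: the street name tells them apart.
us_a = {"street": "Main St", "city": "Springfield", "postal_code": "62701"}
us_b = {"street": "Oak Ave", "city": "Springfield", "postal_code": "62701"}

# Two distinct addresses identified by district and block, not by street.
jp_a = {"district": "Marunouchi", "block": "1-1", "city": "Tokyo", "postal_code": "100-0005"}
jp_b = {"district": "Marunouchi", "block": "2-4", "city": "Tokyo", "postal_code": "100-0005"}

us_distinct = hardcoded_address_hash(us_a) != hardcoded_address_hash(us_b)
jp_collide = hardcoded_address_hash(jp_a) == hardcoded_address_hash(jp_b)
print(us_distinct, jp_collide)
```

In this sketch the two street-based addresses receive different hashes, while the two block-based addresses receive identical hashes, because the only fields that distinguish them fall outside the predefined format.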
Another problem with hardcoded hashes is that when the hashes are used for finding matching entries in a data store, for example, a database, they always return the same set of matches. That is, a user cannot obtain a wider or narrower range of matches based on the hashes, even if the user so desires. Thus, there is a need for more flexible hashing techniques.
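The fixed match set can likewise be sketched in a few lines. The function `name_hash`, the in-memory `index` standing in for a data store, and the sample names are hypothetical and serve only as an illustration. The sketch assumes a hardcoded hash over normalized first and last names, and shows that a lookup returns exactly the records sharing that one hash, with no way to widen the search to near matches.

```python
import hashlib
from collections import defaultdict


def name_hash(first: str, last: str) -> str:
    """Hypothetical hardcoded hash over a normalized first and last name."""
    key = f"{first.strip().upper()}|{last.strip().upper()}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


# A small in-memory index standing in for a database keyed by hash.
records = [("Jon", "Smith"), ("John", "Smith"), ("JON", "smith ")]
index = defaultdict(list)
for first, last in records:
    index[name_hash(first, last)].append((first, last))

# The lookup returns only the records whose hash is identical to the query's.
matches = index[name_hash("Jon", "Smith")]
print(matches)
```

Here the query finds the two records that normalize to the same name, but the near match "John Smith" is never returned, and the hardcoded hash offers no parameter by which the user could broaden or restrict the result.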