1. Field of the Invention
The present invention relates to a method, system, and computer program product for grouping identity records to generate candidate lists to use in an entity and relationship resolution process.
2. Description of the Related Art
Identity resolution, also known as entity resolution, is an operational intelligence process, typically powered by an identity resolution engine or middleware stack, that allows organizations to connect disparate data sources in order to understand possible identity matches and non-obvious relationships across multiple data silos. The identity resolution process analyzes the information relating to individuals and/or entities from multiple data sources and then applies likelihood and probability scoring to determine which identities match and what non-obvious relationships, if any, exist between those identities. This allows organizations to solve business problems related to recognizing the true identity of someone or something (“who is who”) and determining the potential value or danger of relationships (“who knows who”) among customers, employees, vendors, and other external parties. It also provides timely, actionable information that helps prevent threats, fraud, abuse, and collusion across industries.
When a record corresponding to a particular person is fed as input to the entity resolution engine, the engine generates a candidate list of entities that may be connected to the person described in the incoming record. Each candidate on the list is then checked for any relationships that may exist between it and the incoming record. Subsequently, entity resolution (who is who) and relationship resolution (who knows who) are performed.
Candidate lists are lists of entities that could potentially match the incoming identity record. A candidate list is built by retrieving the entities that share attributes with the incoming identity, where the attributes considered are those specified in the candidate builder configuration. The current scheme of candidate list generation processes input records one by one: for every record, the database must be queried separately, based on the attributes of that record, to generate its candidate list. This repeated querying degrades system performance. Thus, if ‘N’ input records are fed into the engine, the database must be queried ‘N’ times.
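The one-record-at-a-time scheme described above can be illustrated with a minimal sketch. It assumes a toy in-memory SQLite store with a hypothetical `entity_attr` table and a hypothetical `CANDIDATE_ATTRIBUTES` tuple standing in for the candidate builder configuration; none of these names come from a real product. It counts database round trips to show that N input records cost N queries.

```python
import sqlite3

# Hypothetical candidate builder configuration: the attributes that candidate
# entities must share with an incoming record. Not taken from any real product.
CANDIDATE_ATTRIBUTES = ("name", "phone", "address")


def make_db() -> sqlite3.Connection:
    """Create a toy in-memory entity store (illustrative schema only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entity_attr (entity_id INTEGER, attr TEXT, value TEXT)")
    conn.executemany(
        "INSERT INTO entity_attr VALUES (?, ?, ?)",
        [
            (1, "name", "Alice Smith"),
            (1, "phone", "555-0100"),
            (2, "name", "Bob Jones"),
            (2, "address", "1 Main St"),
        ],
    )
    return conn


def build_candidates_one_by_one(conn, records):
    """Build one candidate list per record, querying the database once per record."""
    query_count = 0
    candidate_lists = []
    for record in records:
        # Keep only the attributes named in the candidate builder configuration.
        pairs = [(a, record[a]) for a in CANDIDATE_ATTRIBUTES if a in record]
        if not pairs:
            # Nothing to match on; no query is issued for this record.
            candidate_lists.append(set())
            continue
        clause = " OR ".join("(attr = ? AND value = ?)" for _ in pairs)
        params = [item for pair in pairs for item in pair]
        rows = conn.execute(
            f"SELECT DISTINCT entity_id FROM entity_attr WHERE {clause}", params
        ).fetchall()
        query_count += 1  # one database round trip for this record
        candidate_lists.append({row[0] for row in rows})
    return candidate_lists, query_count


if __name__ == "__main__":
    conn = make_db()
    records = [
        {"name": "Alice Smith"},
        {"phone": "555-0100", "address": "1 Main St"},
        {"name": "Carol White"},
    ]
    lists, n_queries = build_candidates_one_by_one(conn, records)
    print(lists)      # [{1}, {1, 2}, set()]
    print(n_queries)  # 3
```

Because the loop issues its query inside the per-record iteration, the number of round trips grows linearly with the number of input records, which is the cost the passage attributes to the current scheme.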
After the candidate list is generated, the entity resolution process compares the incoming identity to the first candidate on the list using the configured resolution rules. The system applies the resolution rules in order, computing for each a resolution score that represents how closely the incoming identity's attributes match those of the candidate entity. If the computed score meets or exceeds the score set for that rule, the incoming identity record is resolved into the candidate entity.
If the computed score falls below the score set for that rule, the system proceeds to the next resolution rule, until either the incoming identity record has been resolved into a candidate entity or all resolution rules have been exhausted. If the incoming identity record is not resolved into an existing entity, the system resolves the record into a new entity and stores that entity in the entity database. After entity resolution has been performed, the results must likewise be logged to the database one record at a time.
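The rule-ordered resolution and new-entity fallback described above can be sketched as follows. This is an illustration under stated assumptions, not the engine's actual logic: the `Rule` type, the `rule_score` function (fraction of a rule's attributes that match exactly), the choice to move on to the next candidate once all rules fail for the current one, and the in-memory `log` list standing in for per-record database logging are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]


@dataclass
class Rule:
    """A hypothetical resolution rule: attributes to compare and a score threshold."""
    name: str
    attributes: List[str]
    threshold: float


def rule_score(rule: Rule, incoming: Record, candidate: Record) -> float:
    """Illustrative score: fraction of the rule's attributes that match exactly."""
    if not rule.attributes:
        return 0.0
    hits = sum(
        1 for a in rule.attributes
        if a in incoming and a in candidate and incoming[a] == candidate[a]
    )
    return hits / len(rule.attributes)


def resolve(
    incoming: Record,
    candidates: List[int],
    entities: Dict[int, Record],
    rules: List[Rule],
    log: List[Tuple[int, Optional[str]]],
) -> int:
    """Resolve one record into an existing or new entity, then log the result."""
    for cand_id in candidates:            # candidates examined in list order (assumed)
        for rule in rules:                # rules applied in configured order
            if rule_score(rule, incoming, entities[cand_id]) >= rule.threshold:
                log.append((cand_id, rule.name))  # one log write per record
                return cand_id
    # No rule resolved the record against any candidate: create a new entity.
    new_id = max(entities, default=0) + 1
    entities[new_id] = dict(incoming)
    log.append((new_id, None))
    return new_id


if __name__ == "__main__":
    entities = {1: {"name": "Alice Smith", "phone": "555-0100", "dob": "1980-01-01"}}
    rules = [
        Rule("name+dob", ["name", "dob"], 1.0),
        Rule("name+phone", ["name", "phone"], 0.5),
    ]
    log: List[Tuple[int, Optional[str]]] = []
    print(resolve({"name": "Alice Smith", "phone": "555-0199"}, [1], entities, rules, log))  # 1
    print(resolve({"name": "Dan Brown"}, [1], entities, rules, log))  # 2
    print(log)  # [(1, 'name+phone'), (2, None)]
```

In the first call, the stricter "name+dob" rule fails (score 0.5 against a threshold of 1.0) and the next rule succeeds; in the second, every rule fails, so a new entity is created. Each call appends exactly one log entry, mirroring the one-by-one logging the passage describes.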