1. Field of the Invention
Embodiments of the invention generally relate to processing identity records in an entity resolution system, and more particularly, to grouping similar values of an entity attribute type and determining the best value of an entity attribute type in an entity resolution system.
2. Description of the Related Art
In an entity resolution system, identity records are loaded and resolved against known identities to derive a network of entities and relationships between entities. An “entity” generally refers to an organizational unit used to store identity records that are resolved at a “zero-degree relationship.” That is, each identity record associated with a given entity is believed to describe the same person, place, or thing (e.g., the identity of an employee represented as an employee record from an employee database entity-resolved with the identity of a property owner from the county assessor's public records). Thus, one entity may reference multiple individual identities with potentially different values for various attributes. This is frequently benign, e.g., in a case where an entity includes two identities with different names, a first being an identity record identifying a woman based on a familial surname and a second identity record identifying the same woman based on a married surname. Of course, in other cases, differing attribute values between identities in the same entity may be an indication of mischief or a problem, e.g., in a case where one individual is impersonating another, using a fictitious identity, or engaging in some form of identity theft. The entity resolution system may link entities to one another by relationships. For example, a first entity may have a first-degree relationship with a second entity based on identity records (in one entity, the other, or both) that indicate the individuals represented by these two entities are married to one another, reside at the same address, or share some other common information.
One task performed by an entity resolution system is to generate alerts when the existence of a particular identity record (typically the inbound record being processed) causes some condition to be satisfied that is relevant in some way and that may require additional scrutiny by an analyst. For example, the entity resolution system may generate a list of alerts about identities or entities that should be examined by an analyst. Relevance detection may be used to identify potential threats and fraud as well as potential opportunity. For example, if a person has more than three distinct first names or more than one social security number, then a fraud alert may be generated.
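The example rule above (more than three distinct first names, or more than one social security number) may be sketched as follows. This is a minimal illustration only: the `Entity` class, the `fraud_alerts` function, and the threshold parameter names are hypothetical and are not drawn from any particular entity resolution system.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Entity:
    """Hypothetical container for the attribute values resolved into one entity."""
    entity_id: str
    first_names: Set[str] = field(default_factory=set)
    ssns: Set[str] = field(default_factory=set)


def fraud_alerts(entity: Entity, max_first_names: int = 3, max_ssns: int = 1) -> List[str]:
    """Return one alert message per threshold that the entity exceeds."""
    alerts = []
    # Rule from the text: more than three distinct first names.
    if len(entity.first_names) > max_first_names:
        alerts.append(f"{entity.entity_id}: {len(entity.first_names)} distinct first names")
    # Rule from the text: more than one social security number.
    if len(entity.ssns) > max_ssns:
        alerts.append(f"{entity.entity_id}: {len(entity.ssns)} social security numbers")
    return alerts


# Illustrative (made-up) entity with four first names and one social security number.
suspect = Entity("E-1", {"Liz", "Beth", "Eliza", "Betty"}, {"000-00-0001"})
print(fraud_alerts(suspect))
```

In this sketch the thresholds are parameters, so an analyst could tune them per deployment; a real system would presumably evaluate such rules as each inbound record is resolved.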
In entity resolution systems, a single entity may have multiple attribute values for the same attribute type. Frequently, this may result from multiple records being provided that include a value for a given attribute. For example, an entity may have multiple addresses, phone numbers, driver's license numbers, names, etc. In some cases, different values for an attribute may be appropriate (e.g., when a person changes telephone numbers or moves from one place to another). Multiple attribute values may also exist due to the variety of systems from which identity records are drawn. Moreover, different record systems may introduce typos, transpose characters, or make system-specific alterations, such as truncating an address.
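As a rough illustration of how multiple values accumulate, the following sketch merges the attribute values of several identity records into a single entity using exact string comparison. The record layout, the sample data, and the `collect_attribute_values` function are assumptions made for this example only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def collect_attribute_values(records: Iterable[Dict[str, str]]) -> Dict[str, List[str]]:
    """Gather the distinct values seen for each attribute type, in arrival order."""
    values = defaultdict(list)
    for record in records:
        for attr_type, value in record.items():
            # Exact comparison: a typo or truncation counts as a new value.
            if value not in values[attr_type]:
                values[attr_type].append(value)
    return dict(values)


# Made-up identity records for one entity, drawn from three hypothetical source systems.
records = [
    {"name": "Jane Smith", "address": "123 Main Street Apt 4", "phone": "555-0134"},
    {"name": "Jane Smith", "address": "123 Main Street Ap", "phone": "555-0143"},  # truncated address, transposed digits
    {"name": "Jane Smyth", "address": "123 Main Street Apt 4"},  # misspelled name
]
entity_values = collect_attribute_values(records)
for attr_type, attr_values in entity_values.items():
    print(attr_type, attr_values)
```

With exact comparison, the truncated address, the transposed phone number, and the misspelled name each appear as a separate value, even though they likely describe the same underlying address, phone number, and name.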