1. Field
Embodiments of the invention relate to using space and time for entity resolution.
2. Description of the Related Art
Entity resolution techniques may be used to determine when two or more entities (e.g., people, buildings, cars, things, other objects, etc.) represent the same physical entity despite having been described differently. These techniques are sometimes called deduplication, match/merge, identity resolution, semantic reconciliation, or other names. For example, a first record containing CustID #1 [Bob Jones at 123 Main Street with a Date of Birth (DOB) of 6/21/45] is likely to represent the same entity as a second record containing CustID #2 [Bob K Jones at 123 S. Main Street with a DOB of 6/21/1945]. Entity resolution can be used within a single data source to find duplicates, across data sources to determine how disparate transactions relate to one entity, or both within and across a plurality of data sources at the same time.
Entities have features (values that are collected or observed and that can be more or less discriminating). For example, in the area of human entities, features may include one or more of: name, address, phone, DOB, Social Security Number (SSN), Driver's License (D/L), biometric features, gender, hair color, and so on. By way of example, SSNs are generally very discriminating, dates of birth are less discriminating, and gender is not particularly discriminating at all. As another example, entity resolution on objects, such as a car, may use one or more of the following features: license plate number, Vehicle Identification Number (VIN), make, model, year, color, owner, and so on.
Features may be used to establish confidence (a degree of certainty that two discretely described entities are the same). For the example of CustID #1 and CustID #2, the confirming features of name, address, and DOB and the lack of conflicting features (e.g., features in disagreement, such as opposing D/L numbers) probably result in a high enough confidence to assert that the first record and the second record represent the same entity (e.g., person), without human review.
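By way of illustration only, the notion of confirming and conflicting features contributing to a confidence may be sketched as follows. This is a minimal sketch and not a description of any particular entity resolution system: the integer weights, the threshold, the token-subset comparison rule, and the names `WEIGHTS`, `THRESHOLD`, `tokens`, `confidence`, and `same_entity` are all assumptions introduced for the example. The DOB values are assumed to have already been normalized to a common format.

```python
# Hypothetical integer weights: a larger weight means the feature is assumed
# to be more discriminating (SSN high, DOB lower, gender very low).
WEIGHTS = {"ssn": 9, "dl": 8, "dob": 4, "name": 3, "address": 3, "gender": 1}

# Hypothetical threshold above which two records are asserted to be the same.
THRESHOLD = 8


def tokens(value):
    """Lower-case a value, split on whitespace, and strip punctuation."""
    cleaned = ("".join(c for c in t if c.isalnum()) for t in value.lower().split())
    return {t for t in cleaned if t}


def confidence(a, b):
    """Add a feature's weight when it confirms, subtract it when it conflicts."""
    score = 0
    for feature, weight in WEIGHTS.items():
        ta = tokens(a.get(feature) or "")
        tb = tokens(b.get(feature) or "")
        if not ta or not tb:
            continue  # feature missing on one side: neither confirms nor conflicts
        # Crude assumption: values agree when one token set contains the other,
        # so "Bob Jones" agrees with "Bob K Jones".
        if ta <= tb or tb <= ta:
            score += weight
        else:
            score -= weight
    return score


def same_entity(a, b):
    return confidence(a, b) >= THRESHOLD


cust1 = {"name": "Bob Jones", "address": "123 Main Street", "dob": "1945-06-21"}
cust2 = {"name": "Bob K Jones", "address": "123 S. Main Street", "dob": "1945-06-21"}

print(confidence(cust1, cust2), same_entity(cust1, cust2))
```

Under these assumed weights, name, address, and DOB all confirm and nothing conflicts, so the two records clear the threshold. Adding opposing D/L numbers to the two records would subtract the D/L weight and drop the pair below the threshold. The token-subset rule is deliberately simplistic and would wrongly merge some distinct values.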
Entity resolution systems are described further in: “Entity Resolution Systems vs. Match Merge/Merge Purge/List De-duplication Systems” by Jeff Jonas, published Sep. 25, 2007.
Now imagine that the first record and the second record were for identical twins (two separate people). Also imagine that each twin is presenting the exact same passport document (same name, same number, same DOB, etc.). Furthermore, consider the improbable case in which a biometric comparison (iris, fingerprint, etc.) of both twins scores them as the “same” entity, whether that score resulted from fraud, a faulty biometric technique, or some higher miracle. Despite absolute similarity across the traditional feature space (name, DOB, biometrics, etc.), clearly sufficient to cause an entity resolution technique to assert that the first and second records reflect a single entity, the twins are nonetheless two separate entities (i.e., two separate people).
The human process of determining when things are the same or different includes the physics principles that:
1) the same thing cannot be in two different spaces (e.g., places) at the same time; and
2) two different things cannot occupy the same space at the same time.
For example, assume a person, named Bill, is sitting across the table from a person, named Tom, and talking to Tom. Assume also that Bill is suddenly covered with a blanket and then uses a device to change the nature of his voice. Obviously, Tom would not be able to observe any specific features of Bill (i.e., Tom cannot see Bill's face or clothes, or hear Bill's familiar voice, etc.). Nonetheless, Tom would still know with certainty that the person covered by the blanket is, in fact, still Bill. Tom saw Bill cover himself with the blanket, and despite the lack of available features, Tom knows it is Bill under the blanket, an assertion based on the fact that two different things cannot occupy the same space at the same time.
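The two physics principles above may be sketched, purely for illustration, as a check applied to a pair of observations. This sketch assumes that each observation carries a discrete place label and a discrete time value, and that a place is fine-grained enough to hold only one entity. The `Observation` type, its fields, and the `space_time_verdict` function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    record_id: str
    place: str  # hypothetical location label, assumed to hold a single entity
    time: int   # hypothetical discrete time value


def space_time_verdict(a, b):
    """Apply the two principles to a pair of observations.

    Returns "different" when the observations are at different places at the
    same time (principle 1), "same" when they are at the same place at the
    same time (principle 2), and "undetermined" when the times differ, in
    which case the principles alone say nothing.
    """
    if a.time != b.time:
        return "undetermined"
    if a.place == b.place:
        return "same"
    return "different"


# Twins presenting identical documents at two different places at one time.
twin_a = Observation("rec-1", place="counter-1", time=900)
twin_b = Observation("rec-2", place="counter-2", time=900)

# Bill before and under the blanket, treated as one instant for simplicity:
# same chair, same time.
bill_seen = Observation("rec-3", place="chair-across-table", time=1000)
under_blanket = Observation("rec-4", place="chair-across-table", time=1000)

print(space_time_verdict(twin_a, twin_b))
print(space_time_verdict(bill_seen, under_blanket))
```

In this sketch the verdict depends only on place and time, so it holds even when every traditional feature matches (the twins) or when no traditional feature is observable (Bill under the blanket). Real observations would carry measurement error in both space and time, which this sketch does not model.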
Conventional entity resolution systems do not take into account space and time coordinates as a means to improve entity resolution accuracy. The use of space and time features is, in fact, essential to advancing entity resolution systems. Thus, there is a need for using space and time for entity resolution.