The present invention relates generally to the field of information storage and retrieval using databases and, more specifically, to identity resolution for entities which may have more than one identifier.
Database systems are widely used to organize information about real-world objects (as opposed to virtual objects, e.g., information in a computer), such as individual persons, groups of people, organizations, and businesses, so that information about each object is readily accessible. Fundamental to this function is the ability to identify, in a well-defined way, each object about which information is to be kept and accessed, so that each object has an identity.
The word “identity” may be defined as an alias for a real-world object (referred to as a “warm body”) that is typically specified by the warm body. For example, an identity may be a name, social security number, driver's license number, Medicaid number, and so forth.
The word “account” may be defined as a collection of identities and other information about a single warm body. By definition, a single account represents one, only one, and always the same, warm body. For example, a single account might contain identities such as names, a social security number, and a driver's license number, as well as other information such as hair color, age, and height. Accounts are typically specified by the user of the database system. Some examples of accounts are credit card accounts, bank accounts, and airline passenger accounts.
The word “entity” may be defined as a collection of one or more accounts. Entities are typically specified by the database system, and may be loosely characterized as the system's attempt at representing a warm body. A “warm body” may be defined as a physical thing in the real world (often a human being, for example, but possibly any object about which information may be kept such as an aircraft, a vehicle, or a corporation) that typically has multiple identities, multiple accounts, and multiple entities.
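The relationship among identities, accounts, and entities defined above can be pictured as nested records. The following is a minimal Python sketch, not drawn from any particular system; the class names, field names, and sample values are illustrative assumptions only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Account:
    """Identities and other information about a single warm body."""
    account_id: str                                            # hypothetical key
    identities: Dict[str, str] = field(default_factory=dict)   # aliases, e.g. name
    attributes: Dict[str, str] = field(default_factory=dict)   # e.g. hair color


@dataclass
class Entity:
    """The system's attempt at representing a warm body: one or more accounts."""
    entity_id: str
    accounts: List[Account] = field(default_factory=list)


# Placeholder data, invented for illustration.
bank = Account("bank-001", {"name": "J. Smith", "ssn": "000-00-0000"}, {"age": "42"})
card = Account("card-777", {"name": "John Smith", "drivers_license": "D0000000"})
entity = Entity("E1", [bank, card])
```

In this sketch the warm body itself never appears as a record: the system holds only accounts and the entities it has grouped them into.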
One basic function of an identity resolution system is to represent each distinct warm body as a unique entity. For example, if an identity resolution system has information that a first entity and a second entity represent the same warm body, the identity resolution system may “resolve” the two separate entities into a single entity. Conversely, if a single entity in an identity resolution system has information that appears to belong to two separate warm bodies, the identity resolution system may attempt to “un-resolve” the single entity into two separate entities matched to the two distinct warm bodies.
One of the primary purposes of an identity resolution system is to resolve seemingly disparate accounts together. That is, the system may currently be under the assumption that two accounts represent two distinct warm bodies. As more information comes into the system the identity resolution system may detect “enough” similarities between those two accounts that the system decides that those two accounts actually represent the same warm body—in which case the system “resolves” those two accounts. When such a resolution decision is correct the system is functioning as desired. All the accounts that are currently known by the system to represent the same warm body are then held in a single entity. In the ideal situation, there would be a single entity in the system for each unique warm body that has accounts in the system. Initially, however, the system usually has multiple entities per warm body. The following relationship holds by definition in the system:
number of accounts ≥ number of entities ≥ number of warm bodies.
In general, as time progresses and more information enters the system for accounts, the number of entities in the system moves away from the number of accounts in the system and converges down towards the number of warm bodies attached to those accounts.
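The convergence described above can be illustrated with a toy resolver in which every account starts in its own entity and each resolve merges two entities. This is a sketch under stated assumptions: the class `ToyResolver`, its methods, and the account names are hypothetical, and the count of warm bodies is supplied by hand because a real system cannot observe it directly.

```python
class ToyResolver:
    """Toy model: maps each account to the entity that currently holds it."""

    def __init__(self, account_ids):
        # Initially one entity per account, so entities == accounts.
        self.entity_of = {a: f"E{i}" for i, a in enumerate(account_ids, start=1)}

    def resolve(self, account_a, account_b):
        """Merge the entity holding account_b into the entity holding account_a."""
        keep, drop = self.entity_of[account_a], self.entity_of[account_b]
        if keep == drop:
            return  # already in the same entity
        for acct, ent in list(self.entity_of.items()):
            if ent == drop:
                self.entity_of[acct] = keep

    def entity_count(self):
        return len(set(self.entity_of.values()))


accounts = ["bank", "card", "airline", "other"]
warm_bodies = 2  # assumed ground truth, unknown to the system itself

resolver = ToyResolver(accounts)
counts = [resolver.entity_count()]
resolver.resolve("bank", "card")
counts.append(resolver.entity_count())
resolver.resolve("card", "airline")
counts.append(resolver.entity_count())
```

With these inputs the entity count falls from four to three to two, staying between the number of accounts and the assumed number of warm bodies as long as every resolve is correct.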
Certain types of problems, however, are generally encountered. For example, the system may incorrectly resolve two entities, meaning the system incorrectly thinks (e.g., maintains information) that two distinct warm bodies are the same warm body—referred to as the “incorrect resolve problem”. There is also, for example, an “incorrect unresolve problem”, in which two accounts that have been correctly resolved as referring to the same warm body are at some point incorrectly unresolved, i.e., the system incorrectly enters a state in which the system thinks the two accounts represent two distinct warm bodies.
There are several known problems related to keeping track of entities and their identities as the entities are resolved and unresolved in an identity resolution system. One problem may be referred to as the “lost entity identifier problem” in which, after two entities are resolved into a single entity, the single entity may not be identifiable by one or another of its previous identifiers so that a user of the system, when searching using the previous identifiers, does not find the (new) single entity, which now appears lost to the user.
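The lost entity identifier problem can be reproduced in a few lines. The sketch below models a hypothetical naive store that assigns a fresh identifier to every merged entity and discards the old ones; `NaiveStore` and its methods are assumptions made for illustration and do not describe any specific product.

```python
class NaiveStore:
    """Hypothetical store that forgets old identifiers when entities merge."""

    def __init__(self):
        self.entities = {}  # entity id -> set of account ids
        self._next = 1

    def new_entity(self, accounts):
        entity_id = f"E{self._next}"
        self._next += 1
        self.entities[entity_id] = set(accounts)
        return entity_id

    def resolve(self, id_a, id_b):
        # Both previous identifiers are dropped; the merged entity gets a new one.
        merged = self.entities.pop(id_a) | self.entities.pop(id_b)
        return self.new_entity(merged)

    def find(self, entity_id):
        return self.entities.get(entity_id)


store = NaiveStore()
e1 = store.new_entity({"bank"})
e2 = store.new_entity({"card"})
e3 = store.resolve(e1, e2)

lost = store.find(e1)    # None: the user's old identifier no longer works
found = store.find(e3)   # the merged entity, reachable only by its new id
```

A user who bookmarked `e1` before the resolve finds nothing afterward, even though all of the underlying accounts are still in the system.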
Another problem may be referred to as the “lost entity version problem”. For example, the entity of interest may still be identifiable after a resolution, yet that entity may have changed enough, e.g., through addition or deletion of accounts, that, although the previous version fit the context in which a user wanted to look at the entity, the structure of the present version no longer makes sense in that context.
Another problem may be referred to as the “entity switched warm bodies/accounts problem”. For example, the system initially associates one entity identity with one warm body and then, after a series of resolves and unresolves, uses that same entity identity to refer to a completely different warm body. A user of the system who expects entity identities to be the same as warm body identities may become confused.
Another problem may be referred to as the “account drift problem”. For example, during a long sequence of resolves and unresolves, a single account may show up, by itself, in many different entities, each of which has a different identity from all of the others. The key feature of this problem is that the system appears to be superficially re-inventing entity identities for the same account/warm body over and over again. The account thus appears to “drift”, by itself, from one entity to another, so that a user of the system may become confused.
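Account drift can likewise be shown with a hypothetical design that issues a new identifier on every resolve and every unresolve. The helper names below (`fresh_id`, `resolve`, `unresolve`) and the account names are assumptions for illustration only.

```python
import itertools

_ids = itertools.count(1)


def fresh_id():
    """Issue the next entity identifier: E1, E2, E3, ..."""
    return f"E{next(_ids)}"


def resolve(entities, id_a, id_b):
    """Merge two entities under a newly issued identifier."""
    merged = entities.pop(id_a) | entities.pop(id_b)
    new_id = fresh_id()
    entities[new_id] = merged
    return new_id


def unresolve(entities, entity_id, account):
    """Split one account out of an entity; both parts get new identifiers."""
    rest = entities.pop(entity_id) - {account}
    solo_id, rest_id = fresh_id(), fresh_id()
    entities[solo_id] = {account}
    entities[rest_id] = rest
    return solo_id, rest_id


entities = {}
solo = fresh_id()
entities[solo] = {"bank"}
other = fresh_id()
entities[other] = {"card"}

history = [solo]  # identifiers of the entity holding "bank" alone
for _ in range(2):
    merged = resolve(entities, solo, other)
    solo, other = unresolve(entities, merged, "bank")
    history.append(solo)
```

After two resolve/unresolve cycles the same unchanged account has sat alone in three differently identified entities, which is the drift a user would observe.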
Notwithstanding the current techniques, there is a need in the art for entity tracking and identity resolution, for entities that may have more than one identifier, that provide solutions to a number of problems encountered in the art, such as the “lost entity identifier problem”, the “lost entity version problem”, the “entity switched warm bodies/accounts problem”, and the “account drift problem”.