I. Technical Field
The present invention generally relates to the field of data linking using multi-entity ontology weighting. More particularly, the invention relates to computerized systems and methods for matching a prospective set of related entities in view of other entities that are known to be related.
II. Background Information
In recent years, more and more information has been stored in electronic form. With the increase in the sheer volume of data, searching for desired information has become increasingly difficult. For example, when searching for desired information, some traditional techniques examine data for specific alphanumeric characters. In particular, “record linkage” is a traditional searching technique that determines whether two or more data records include the same sequence of alphanumeric characters. When data records include the same sequence of alphanumeric characters, the data records are considered related and are matched or “linked” together. By linked, it is meant that the data records are treated as a single record concerning the subject of the search.
Such a technique searches for a specific sequence of alphanumeric characters (e.g., a person's name) in data records. However, a name is often insufficient to uniquely identify a person because many people may share the same first and/or last names. Locating the desired name in one or more data records does not guarantee that the search has identified data records that pertain to the actual subject of the search. Consequently, such a technique often links together a large number of data records that do not actually refer to the intended subject.
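By way of illustration only, the exact-match record linkage described above may be sketched as follows. The record layout, the field names ("id", "name", "city"), and the function name link_by_exact_name are hypothetical and are chosen solely for this example; they are not taken from any particular prior system.

```python
from collections import defaultdict

def link_by_exact_name(records):
    """Group record ids whose "name" field is the identical character sequence.

    Hypothetical helper: records sharing a name are treated as one linked set,
    regardless of whether they refer to the same person.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record["name"]].append(record["id"])
    return dict(groups)

# Illustrative data (assumed): records 1 and 2 describe different people.
records = [
    {"id": 1, "name": "John Smith", "city": "Boston"},
    {"id": 2, "name": "John Smith", "city": "Denver"},
    {"id": 3, "name": "Augustus Smith", "city": "Boston"},
]

linked = link_by_exact_name(records)
```

In this sketch, records 1 and 2 are linked solely because their name fields are identical, even though the city fields suggest two different individuals, which is the over-linking drawback noted above.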
Other traditional record linkage techniques evaluate data records in order to decide whether or not to link together two data records. One traditional technique considers how frequently the matched data occurs in the data records. For example, more significance may attach to a match between two data records that include the name “Augustus” than to a match between two data records that include the name “John,” because the former name is less common. Such a technique is referred to as frequency-based matching. However, the use of frequency-based matching, while generally increasing accuracy, often does not adequately match data records, particularly when searching a large volume of data. For example, frequency-based matching does not adequately resolve searches that involve more common names or terms.
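For illustration, one possible form of frequency-based matching is sketched below. The logarithmic inverse-frequency weight used here is an assumption made for this example, as are the function name name_match_weights and the sample data; the techniques referred to above may compute significance differently.

```python
import math
from collections import Counter

def name_match_weights(names):
    """Assign each name a weight that grows as the name becomes rarer.

    Assumed weighting: log2(total names / occurrences of the name).
    """
    counts = Counter(names)
    total = len(names)
    return {name: math.log2(total / count) for name, count in counts.items()}

# Illustrative population (assumed): "John" is common, "Augustus" is rare.
names = ["John"] * 6 + ["Augustus", "Mary"]

weights = name_match_weights(names)
# A match on "Augustus" carries more weight than a match on "John".
```

Under this assumed weighting, a match on a common name such as “John” receives a weight near zero, so two records sharing only a common name remain difficult to link or to distinguish with confidence.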
Accordingly, traditional searching techniques suffer from drawbacks that limit their accuracy. In large-scale searching endeavors in which millions of data records are searched, simple record linkage and frequency-based matching are insufficient to accurately identify specific entities, such as individuals. There is therefore a need for improved systems and methods for data matching that are more accurate and efficient.