Currently, many different types of software applications provide record matching for records stored in a relational database or other type of storage medium. Record matching has many uses. For example, through record matching, matching system objects (e.g., records) can be aggregated into enterprise objects, each of which includes all system objects that describe the same entity. For each enterprise object, a single best record (SBR) may be generated that is the best representation of an entity's information. The SBR is populated with information from all of the matching system objects that describe the entity. Each enterprise object, which corresponds to a single entity, is assigned an enterprise unique identifier (EUID). An EUID is a linked identifier that links all system objects that describe a given entity (e.g., an SBR). EUIDs are stored in a master index database that stores SBRs, which include the EUID linked identifier. In this manner, the master index database can provide a single view of data from multiple applications.
Matching applications typically use probabilistic matching algorithms to match and link records that describe a single entity. Using such probabilistic algorithms, determining whether two records match requires computing a match weight for each designated match field in the records and summing the weights for all such fields. The summed weight is then compared with a designated match threshold weight to determine whether the records belong to the same overall profile. For example, suppose the user-defined match threshold weight is 40, and the following record pair is matched:
Record 1: John Smith Jan. 1, 2007 Los Angles Calif.
Record 2: Joe Smit Jan. 1, 2007 Los Angeles Calif.
If the match weight computed by a match engine is greater than or equal to the match threshold weight of 40, then the above two records are classified as a match pair. Alternatively, if the match weight computed for the above two records is below the match threshold weight of 40, then the records are classified as a non-match pair.
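The threshold comparison described above can be illustrated with a minimal sketch. The field names, the per-field agreement and disagreement weights, and the use of a string-similarity ratio to interpolate between them are all assumptions made for illustration; an actual match engine would derive its weights from probabilistic estimates and may score fields quite differently.

```python
from difflib import SequenceMatcher

# Hypothetical per-field weights: (agreement weight, disagreement weight).
FIELD_WEIGHTS = {
    "first_name": (10.0, -5.0),
    "last_name": (15.0, -8.0),
    "dob": (20.0, -10.0),
    "city": (8.0, -4.0),
    "state": (4.0, -2.0),
}
MATCH_THRESHOLD = 40.0  # user-defined match threshold weight from the example


def field_weight(a, b, agree, disagree):
    """Scale between the disagreement and agreement weight by string similarity."""
    similarity = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return disagree + similarity * (agree - disagree)


def match_weight(r1, r2):
    """Sum the weights of all designated match fields for a record pair."""
    return sum(
        field_weight(r1[name], r2[name], agree, disagree)
        for name, (agree, disagree) in FIELD_WEIGHTS.items()
    )


def classify(r1, r2, threshold=MATCH_THRESHOLD):
    """Classify a pair as a match when its summed weight reaches the threshold."""
    return "match" if match_weight(r1, r2) >= threshold else "non-match"


record_1 = {"first_name": "John", "last_name": "Smith", "dob": "Jan. 1, 2007",
            "city": "Los Angles", "state": "Calif."}
record_2 = {"first_name": "Joe", "last_name": "Smit", "dob": "Jan. 1, 2007",
            "city": "Los Angeles", "state": "Calif."}

print(classify(record_1, record_2))
```

Under these assumed weights, near-identical values such as "Smith" and "Smit" still contribute most of their agreement weight, which is why small spelling differences need not prevent a pair from reaching the threshold.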
Another method for matching records in larger relational databases uses block record matching. A block is a set of records that have one or more fields in common, such as “SSN” or “firstName AND lastName.” With block record matching, based on the input record received by the relational database, a list of records with one or more combinations of common field values is fetched. Thus, a block of records is created, which reduces the number of record pairs that need to be matched with one another. Each record from the block of records can then be sent to the matching engine, where the matching engine compares multiple field values from each record pair in the block of records to determine whether a match to the input record exists.
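As a rough sketch of how a block might be fetched for an input record, the following assumes records are held in memory as dictionaries and that block definitions are tuples of field names, mirroring the "SSN" and "firstName AND lastName" examples. The function name `fetch_block` and the data layout are hypothetical; a real system would typically issue queries against the relational database instead.

```python
# Hypothetical block definitions: a candidate joins the block if it agrees
# with the input record on every field of at least one definition.
BLOCK_DEFINITIONS = [
    ("ssn",),
    ("firstName", "lastName"),
]


def fetch_block(input_record, records, block_definitions=BLOCK_DEFINITIONS):
    """Return the records that share a blocking-field combination with the input."""
    block = []
    for candidate in records:
        for fields in block_definitions:
            if all(
                input_record.get(f) is not None
                and candidate.get(f) == input_record.get(f)
                for f in fields
            ):
                block.append(candidate)
                break  # one matching definition is enough
    return block


stored = [
    {"id": 1, "ssn": "111", "firstName": "John", "lastName": "Smith"},
    {"id": 2, "ssn": "222", "firstName": "Joe", "lastName": "Smith"},
    {"id": 3, "ssn": "333", "firstName": "Ann", "lastName": "Lee"},
]
incoming = {"ssn": "222", "firstName": "John", "lastName": "Smith"}

# Only the records in the block are sent on to the matching engine.
block = fetch_block(incoming, stored)
print([r["id"] for r in block])
```

Here the third stored record shares no blocking-field combination with the input, so it is never compared, which is the reduction in record pairs that blocking is meant to achieve.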
To improve the performance of record matching operations, bulk matching and loading systems may distribute performance of matching operations among multiple slave computing devices. The distribution of matching operations is performed on a block basis such that each block of records is processed by a single slave computing device in parallel with other blocks of records being processed by other slave computing devices. Since each slave computing device performs matching on complete blocks of records, the matching can be performed independently of the other slave computing devices.
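The block-based distribution can be sketched as follows. This is an illustration only: a thread pool stands in for the separate slave computing devices, the blocking key and the pairwise rule (equal `dob`) are placeholder assumptions, and all function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def group_into_blocks(records, key_fields):
    """Group records so that each block holds all records sharing the key fields."""
    blocks = defaultdict(list)
    for record in records:
        key = tuple(record[f] for f in key_fields)
        blocks[key].append(record)
    return list(blocks.values())


def match_block(block):
    """Compare every record pair inside one block (placeholder rule: same dob)."""
    return [
        (a["id"], b["id"])
        for a, b in combinations(block, 2)
        if a["dob"] == b["dob"]
    ]


def match_all(records, key_fields, workers=4):
    """Hand each complete block to one worker; workers share no state."""
    blocks = group_into_blocks(records, key_fields)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(match_block, blocks)
    return sorted(pair for block_pairs in results for pair in block_pairs)


records = [
    {"id": 1, "lastName": "Smith", "dob": "2007-01-01"},
    {"id": 2, "lastName": "Smith", "dob": "2007-01-01"},
    {"id": 3, "lastName": "Smith", "dob": "1999-05-05"},
    {"id": 4, "lastName": "Jones", "dob": "2007-01-01"},
]

print(match_all(records, key_fields=("lastName",)))
```

Because each worker receives a complete block, no worker needs to see records held by another, so the per-block results can simply be concatenated at the end.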