Currently, many different types of software applications provide record matching for records stored in a relational database or other types of storage media. Such record matching capabilities are typically useful, for example, for businesses that provide services to customers and need to keep track of information, such as customer personal information, purchase price, quantity purchased, etc. Record matching may have several uses. For example, by performing record matching, records that contain incomplete data can be completed using information from a matching record that includes the missing data. Further, matching records can be aggregated into a single profile to store information more efficiently. To perform record matching, a framework is typically provided by a business or service entity to create and maintain matching and indexing applications, known as master indices.
Matching applications typically use probabilistic matching algorithms to match and link data. Using such probabilistic algorithms, determining whether two records match requires computing a match weight for each designated match field in a record and summing the weights of all such fields. The summed weight is then compared with a designated match threshold weight to determine whether the records belong to the same overall profile. For example, suppose the user-defined match threshold weight is 40, and the following record pair is matched:
Record1: John Smith Jan. 1, 2007 Los Angles Calif.
Record2: Joe Smit Jan. 1, 2007 Los Angeles Calif.
If the match weight computed by a match engine is greater than or equal to the match threshold weight of 40, then the above two records are classified as a match pair. Alternatively, if the match weight computed for the above two records is below the match threshold weight of 40, then the records are classified as a non-match pair.
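By way of illustration only, the weight-and-threshold comparison described above may be sketched as follows. The field names, the per-field agreement and disagreement weights, and the use of a string similarity ratio to scale between them are all assumptions made for this sketch; an actual match engine may compute field weights differently.

```python
from difflib import SequenceMatcher

# Hypothetical (agreement, disagreement) weights per match field.
# These values are illustrative assumptions, not taken from any match engine.
FIELD_WEIGHTS = {
    "first_name": (10.0, -5.0),
    "last_name": (15.0, -8.0),
    "date": (15.0, -10.0),
    "city": (8.0, -4.0),
    "state": (5.0, -3.0),
}


def field_weight(a, b, agree, disagree):
    """Scale linearly from the disagreement to the agreement weight
    according to how similar the two field values are (0.0 to 1.0)."""
    similarity = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return disagree + (agree - disagree) * similarity


def match_weight(rec1, rec2):
    """Sum the weights of all designated match fields."""
    return sum(
        field_weight(rec1[name], rec2[name], agree, disagree)
        for name, (agree, disagree) in FIELD_WEIGHTS.items()
    )


def classify(rec1, rec2, threshold=40.0):
    """Classify a record pair against the match threshold weight."""
    return "match" if match_weight(rec1, rec2) >= threshold else "non-match"


record1 = {"first_name": "John", "last_name": "Smith",
           "date": "Jan. 1, 2007", "city": "Los Angles", "state": "Calif."}
record2 = {"first_name": "Joe", "last_name": "Smit",
           "date": "Jan. 1, 2007", "city": "Los Angeles", "state": "Calif."}

print(classify(record1, record2))
```

Under these assumed weights, a pair that agrees on every field reaches the maximum total weight, and small spelling differences lower the total without necessarily dropping it below the threshold.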
Another method for matching records in larger relational databases uses block record matching. A block is typically defined as a set of records that have one or more fields in common, such as “SSN” or “firstName AND lastName.” With block record matching, based on the input record received by the relational database, a list of records with one or more combinations of common field values is fetched. Thus, a block of records is created, which reduces the number of record pairs that need to be matched with one another. Each record from the block of records can then be sent to the matching engine, where the matching engine compares multiple field values from each record pair in the block of records to determine whether a match to the input record exists.
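By way of illustration only, block formation may be sketched as follows. The in-memory index, the function names, and the two blocking definitions (“SSN” alone, or “firstName AND lastName”) are assumptions made for this sketch; in practice the block would typically be fetched with a database query.

```python
from collections import defaultdict


def blocking_keys(record):
    """Yield one key per blocking definition that the record can satisfy.
    The two definitions here (SSN; firstName AND lastName) are assumed."""
    if record.get("SSN"):
        yield ("SSN", record["SSN"])
    if record.get("firstName") and record.get("lastName"):
        yield ("name", record["firstName"].lower(), record["lastName"].lower())


def build_index(records):
    """Map each blocking key to the positions of the stored records sharing it."""
    index = defaultdict(set)
    for position, record in enumerate(records):
        for key in blocking_keys(record):
            index[key].add(position)
    return index


def candidate_block(input_record, records, index):
    """Return the stored records that share at least one blocking key
    with the input record; only these go on to the matching engine."""
    positions = set()
    for key in blocking_keys(input_record):
        positions |= index.get(key, set())
    return [records[p] for p in sorted(positions)]


stored = [
    {"SSN": "111-22-3333", "firstName": "John", "lastName": "Smith"},
    {"SSN": "444-55-6666", "firstName": "Mary", "lastName": "Jones"},
    {"SSN": "", "firstName": "Joe", "lastName": "Smit"},
]
index = build_index(stored)
incoming = {"SSN": "111-22-3333", "firstName": "Joe", "lastName": "Smit"}
block = candidate_block(incoming, stored, index)
print(len(block), "of", len(stored), "stored records need to be compared")
```

In this sketch the input record is compared only against the two stored records in its block rather than against all three stored records, which is the reduction in record pairs that blocking is intended to provide.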
With either block record matching or individual record matching, typically, each time an input record (i.e., a new record) is received by the database or storage medium, a database query is issued to match the new input record to one or more stored records in the database. As the database grows and input records arrive in greater numbers, issuing a separate query for each input record can cause delays in matching.