1. Technical Field
The present disclosure relates to data processing, and in particular, to record matching.
2. Description of the Related Art
Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Record matching, also referred to as “record linkage” or “special purpose grouping”, generally relates to the task of finding records in a data set that refer to the same entity. These records may come from different data sources (e.g., data files, books, websites, databases) or may be variations within a single data source (e.g., records created under different data entry protocols). For example, does the record “John Smith at 555 Water Street” match “J Smith at 555 Water St.”? Often the answer to that question will vary depending upon the particular use for the records. For example, if a company is sending refund checks, the company will want to see those two records as a possible match in order to avoid sending a double refund. Alternatively, if the company is performing a census, the company will want to see those two records separately (i.e., not as a possible match) in order to verify the census data. Another example is householding, in which a company wants to deliver only one catalog to a particular address even if its records show more than one customer at that address.
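The following Python sketch illustrates how the same pair of records can be a match under one use and not under another, and how householding groups records by address. The `Record` type, the abbreviation table, and the `strict_match`, `lenient_match`, and `households` rules are hypothetical illustrations chosen for this example, not matching criteria taken from this disclosure.

```python
import re
from collections import defaultdict
from typing import Dict, List, NamedTuple


# Hypothetical record type, for illustration only.
class Record(NamedTuple):
    name: str
    address: str


# Hypothetical, deliberately tiny abbreviation table.
STREET_ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd"}


def normalize_address(address: str) -> str:
    """Lowercase, drop punctuation, and abbreviate common street suffixes."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(STREET_ABBREVIATIONS.get(t, t) for t in tokens)


def strict_match(a: Record, b: Record) -> bool:
    """Census-style rule: full names and normalized addresses must agree."""
    return (a.name.lower() == b.name.lower()
            and normalize_address(a.address) == normalize_address(b.address))


def lenient_match(a: Record, b: Record) -> bool:
    """Refund-style rule: same last name, same first initial, same address.

    Assumes every name has at least one non-empty token.
    """
    tokens_a, tokens_b = a.name.lower().split(), b.name.lower().split()
    return (tokens_a[-1] == tokens_b[-1]
            and tokens_a[0][0] == tokens_b[0][0]
            and normalize_address(a.address) == normalize_address(b.address))


def households(records: List[Record]) -> Dict[str, List[str]]:
    """Householding: group customer names by normalized address."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        groups[normalize_address(record.address)].append(record.name)
    return dict(groups)


a = Record("John Smith", "555 Water Street")
b = Record("J Smith", "555 Water St.")
print(strict_match(a, b))   # False
print(lenient_match(a, b))  # True
```

A practical system would use much richer normalization and similarity measures; the sketch only shows that the match decision depends on which rule is chosen for the task at hand.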
Record matching can be computationally intensive. For example, some record matching techniques compare each record to every other record. For n records, this requires n(n - 1)/2 pairwise comparisons, so the computational effort grows quadratically with the number of records (i.e., Θ(n^2) time, where n is the number of records in the database).
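As a rough illustration of the all-pairs approach, the following Python sketch compares each record with every other record exactly once and counts the comparisons. The function name `naive_match` and the pluggable `is_match` predicate are hypothetical; any pairwise matching rule could be supplied.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def naive_match(
    records: Sequence[T], is_match: Callable[[T, T], bool]
) -> Tuple[List[Tuple[int, int]], int]:
    """Compare every unordered pair of records once (hypothetical helper).

    Returns the index pairs judged to match and the number of comparisons,
    which is always n * (n - 1) / 2 for n records.
    """
    matches: List[Tuple[int, int]] = []
    comparisons = 0
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        comparisons += 1
        if is_match(a, b):
            matches.append((i, j))
    return matches, comparisons


print(naive_match(["a", "b", "a", "c"], lambda x, y: x == y))  # ([(0, 2)], 6)
```

The comparison count alone shows the quadratic growth: 1,000 records require 499,500 comparisons, while 10,000 records require 49,995,000, so ten times as many records cost roughly one hundred times as many comparisons.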