It is a common goal for data processors to remove duplicate records from a database (e.g., a database of customers' contact information), because duplicate records provide inaccurate information and can result in wasted mailing costs and customer dissatisfaction.
In the past, duplicate records were uncovered using a “brute force” algorithm, where each record is compared to every other record in a database. A database having n records requires n(n-1)/2 comparisons. For example, a database having ten records would require forty-five comparisons. Adding an additional record to the database would require ten additional comparisons, and adding another record would require eleven additional comparisons, and so forth. Although comparisons can be done very quickly with today's computers, the sheer number of comparisons required even for relatively small databases (e.g., one million records) can easily make the processing time impractical. For example, in the case of one million records, roughly five hundred billion comparisons (about half a trillion) would be required.
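The growth in comparisons described above can be illustrated with a minimal sketch. The function names and the caller-supplied `is_match` predicate are hypothetical and are used only for illustration; they are not taken from any of the processes discussed here.

```python
from itertools import combinations


def brute_force_comparisons(n: int) -> int:
    """Number of unordered record pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def brute_force_duplicates(records, is_match):
    """Compare each record to every other record exactly once.

    `is_match` is a hypothetical predicate that decides whether
    two records are duplicates of one another.
    """
    return [(a, b) for a, b in combinations(records, 2) if is_match(a, b)]


print(brute_force_comparisons(10))  # 45
print(brute_force_comparisons(11))  # 55
```

Because the count grows with the square of the number of records, doubling the size of the database roughly quadruples the work.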
To reduce the amount of processing time required, it is known to first cluster records that share a certain attribute. For example, a database of records could be clustered by the first digit of each record's zip code, creating ten clusters. Each record in the cluster is then compared to every other record in the cluster using a “brute force” algorithm. Although this process reduces processing time, the process is incomplete because records in one cluster are not compared with records in other clusters. Thus, if a record in cluster A were to match another record in cluster B, the match would not be found.
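The clustering approach and its limitation can be sketched as follows. This is an illustrative sketch only: the record layout (`id`, `name`, `zip`), the sample data, and the `same_name` predicate are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def cluster_by_zip_digit(records):
    """Group records by the first digit of their zip code (up to ten clusters)."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[rec["zip"][0]].append(rec)
    return clusters


def clustered_duplicates(records, is_match):
    """Run the brute-force comparison inside each cluster only."""
    matches = []
    for members in cluster_by_zip_digit(records).values():
        for a, b in combinations(members, 2):
            if is_match(a, b):
                matches.append((a["id"], b["id"]))
    return matches


# Hypothetical sample data: records 3 and 4 share a name but fall
# into different clusters ("1" and "0"), so they are never compared.
records = [
    {"id": 1, "name": "J. Smith", "zip": "90210"},
    {"id": 2, "name": "J. Smith", "zip": "90210"},
    {"id": 3, "name": "A. Jones", "zip": "10001"},
    {"id": 4, "name": "A. Jones", "zip": "07302"},
]


def same_name(a, b):
    """Hypothetical match rule: two records match if their names are equal."""
    return a["name"] == b["name"]


print(clustered_duplicates(records, same_name))  # [(1, 2)]
```

Only the pair within cluster "9" is reported. The possible match between records 3 and 4 is missed because the two records sit in different clusters.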
Various other processes of detecting duplicate records are described in the art. See, e.g., U.S. Pat. No. 6,374,241 to Lamburt et al.; U.S. pat. publ. no. 2005/0273452 to Molloy et al. (publ. December 2005); U.S. pat. publ. no. 2011/0191353 (publ. March 2011); U.S. pat. publ. no. 2012/0059853 (publ. March 2012); WIPO publ. no. 00/34897 to Bloodhound Software, Inc. (publ. June 2000); and WIPO publ. no. 2009/132263 to Lexis-Nexis Risk & Information Analytics Group, Inc. (publ. October 2009). However, all the processes known to Applicants are also incomplete and fail to appreciate that geographical proximity between records can be used to determine whether the records are duplicates.
Thus, there is still a need for efficient systems and methods that match records using geographical proximity.