The present invention relates to data deduplication methods and, more particularly, to automatic computation of column weights for increased effectiveness of entry/record matching in data deduplication processes.
Data deduplication is a technique for finding and eliminating duplicate data records in a large data set. “Duplicate data records” may include records that are not fully identical but still represent the same entity (e.g., the same customer, client, etc.). For example, a company may have duplicate data records in a customer database if the same customer registered multiple times using slightly different data (e.g., different email addresses, different phone numbers, different mailing addresses, etc.). Deduplication processes allow for the removal of duplicates or the merging of duplicative records such that each unique entity is represented only once. Effective deduplication processes are highly useful in ensuring that various computations, analyses, and/or representations are not inappropriately skewed by duplicate records (or by the removal of non-duplicative records).
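By way of illustration only, the following minimal Python sketch shows one way that non-identical records representing the same customer might be matched using per-column weights. The column names, the weight values, the 0.75 threshold, and the helper functions `similarity`, `match_score`, and `deduplicate` are hypothetical and are fixed by hand here; the sketch is not the method of the present invention and does not compute the weights automatically.

```python
from difflib import SequenceMatcher

# Hypothetical, hand-picked column weights (assumption, for illustration only).
WEIGHTS = {"name": 0.5, "email": 0.3, "phone": 0.2}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(r1: dict, r2: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of the per-column similarities of two records."""
    total = sum(weights.values())
    return sum(w * similarity(r1[c], r2[c]) for c, w in weights.items()) / total


def deduplicate(records: list, threshold: float = 0.75) -> list:
    """Keep a record only if it does not match any record already kept."""
    unique = []
    for rec in records:
        if not any(match_score(rec, kept) >= threshold for kept in unique):
            unique.append(rec)
    return unique


customers = [
    {"name": "Jane Doe", "email": "jane.doe@example.com", "phone": "555-0100"},
    {"name": "Jane Doe", "email": "jdoe@example.org", "phone": "555-0100"},
    {"name": "Robert Smith", "email": "rsmith@example.com", "phone": "555-0199"},
]

# The second Jane Doe record differs only in its email address and is dropped.
result = deduplicate(customers)
```

In this sketch, the choice of weights determines which differences are tolerated: because the name and phone columns together carry most of the weight, a changed email address alone does not prevent two records from being treated as the same entity.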