With the ever-increasing use of computers and the ever-expanding availability of memory, more and more data is being stored, whether electronically, optically, or otherwise, for automated retrieval, processing, and presentation. Such data often includes information that businesses need in order to operate.
Attaining high data quality is important for successful automated operations. One aspect of data quality is keeping the number of duplicate data objects to a minimum: when two or more data objects describe the same item, the applications that process them become more complicated, more error-prone, and accordingly more expensive. Finding and eliminating duplicate data objects is therefore quite beneficial.
One approach to finding duplicate data objects is to run a matching algorithm on pairs of data objects. Such algorithms often return an indication of the probability that two data objects match. If two data objects probably match, one of them may be deleted. If two data objects only possibly match, they may be inspected to determine whether they actually match.
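As an illustration only, the matching approach described above can be sketched as a pairwise comparison that turns a score into one of three actions: treat the pair as a duplicate, flag it for inspection, or keep both. This is a minimal sketch under stated assumptions, not the method of any particular system: the `Record` type, the `match_score`, `decide`, and `triage` functions, and the two threshold values are all hypothetical, and the "probability" is approximated by the mean string similarity of the fields using Python's standard `difflib.SequenceMatcher`.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical cut-offs; a real system would tune these to its own data.
MATCH_THRESHOLD = 0.9   # at or above: the two objects probably match
REVIEW_THRESHOLD = 0.7  # between the thresholds: the two objects might match


@dataclass
class Record:
    """Hypothetical data object with a few descriptive fields."""
    id: int
    name: str
    address: str


def match_score(a: Record, b: Record) -> float:
    """Approximate a match probability as the mean case-insensitive similarity of the fields."""
    pairs = [(a.name, b.name), (a.address, b.address)]
    ratios = [SequenceMatcher(None, x.lower(), y.lower()).ratio() for x, y in pairs]
    return sum(ratios) / len(ratios)


def decide(score: float) -> str:
    """Map a score to an action: delete one object, inspect the pair, or keep both."""
    if score >= MATCH_THRESHOLD:
        return "duplicate"
    if score >= REVIEW_THRESHOLD:
        return "review"
    return "distinct"


def triage(records: list[Record]) -> tuple[list[tuple[int, int]], list[tuple[int, int]]]:
    """Compare every pair of records and collect probable duplicates and pairs to inspect."""
    duplicates, review = [], []
    for a, b in combinations(records, 2):
        verdict = decide(match_score(a, b))
        if verdict == "duplicate":
            duplicates.append((a.id, b.id))
        elif verdict == "review":
            review.append((a.id, b.id))
    return duplicates, review


records = [
    Record(1, "Acme Corp", "12 Main St"),
    Record(2, "ACME Corp", "12 Main St"),
    Record(3, "Zeta Ltd", "99 Oak Ave"),
]
duplicates, review = triage(records)
print(duplicates)  # [(1, 2)]
print(review)      # []
```

Comparing every pair grows quadratically with the number of records, so in practice a matcher would typically be run only on pairs that share some key; the two-threshold split is what separates the "probably match" case from the "might match" case that needs inspection.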