Information continues to accumulate in electronic data space as data acquisition techniques improve and become more pervasive. Indeed, data acquisition systems are so ubiquitous that similar or substantially similar data may be collected many times over across a particular demographic segment. For example, in retail sales, data acquired from a customer may be entered at any number of retail outlets from which the customer wishes to receive mailings, announcements, or advertising materials. At each of the retail outlets, the customer may enter similar or substantially similar data. Each retail outlet, in turn, may utilize a unique data acquisition system for gathering data. As may be appreciated, combining data from each data acquisition system into a single system may result in duplicative entries, which may ultimately adversely affect data handling performance.
In one example, in the extract, transform, and load (ETL) space, data is extracted from a data source, transformed in accordance with a desired business objective, and loaded into a data warehouse. Most data warehousing projects consolidate data from different sources. Typically, during extraction, data integrity is checked against an expected pattern or structure. If the pattern or structure does not match, the data may be rejected. Unfortunately, beyond routine data integrity checks, many data acquisition systems are not configured for identifying duplicative data upon extraction. It may be appreciated that removing duplicative data before loading into a data warehouse may achieve some processing efficiencies.
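The flow described above can be sketched in a minimal form: records are extracted and checked against an expected pattern, transformed by normalizing fields, and duplicative entries are dropped before loading. The record fields, the email-based duplicate key, and the validation pattern below are illustrative assumptions, not part of the original description.

```python
import re

# Expected pattern for the integrity check (illustrative assumption:
# a simple email-shaped value).
EXPECTED_EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def extract(records):
    """Reject records whose email field does not match the expected pattern."""
    return [r for r in records if EXPECTED_EMAIL.match(r.get("email", ""))]

def transform(records):
    """Normalize fields so near-duplicate entries compare equal."""
    return [
        {"name": r["name"].strip().title(), "email": r["email"].strip().lower()}
        for r in records
    ]

def deduplicate(records):
    """Drop duplicative entries before loading, keyed on the normalized email."""
    seen, unique = set(), []
    for r in records:
        if r["email"] not in seen:
            seen.add(r["email"])
            unique.append(r)
    return unique

# Two hypothetical retail outlets supplying substantially similar data.
source_a = [{"name": "jane doe", "email": "JANE@example.com"}]
source_b = [{"name": "Jane Doe ", "email": "jane@example.com"},
            {"name": "bad record", "email": "not-an-email"}]

# One record fails the integrity check; the remaining two collapse to one.
warehouse = deduplicate(transform(extract(source_a + source_b)))
```

Here the duplicate check runs after transformation, since normalizing case and whitespace is what lets substantially similar entries from different acquisition systems compare equal.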