The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions.
Used mainly in databases, data cleaning is the process of identifying incomplete and/or incorrect parts of the data and then replacing, modifying, or deleting this “dirty” data. The actual process of data cleaning may involve removing typographical errors or validating and correcting values against a known list of entities. The validation may be strict, such as rejecting any address that does not have a valid postal code, or fuzzy, such as correcting records that partially match existing, known records. Some data cleaning solutions clean data by cross-checking it against a validated data set. Data enhancement, in which data is made more complete by adding related information, is also a common data cleaning practice; one example is appending to an address the phone numbers related to that address. In the business world, incorrect data can be costly. Many companies use customer information databases that record data such as contact information, addresses, and preferences. For instance, if customer addresses are inconsistent, the company may incur the cost of resending mail or may even lose customers. A substantial portion of an average company's customer contact data becomes outdated or inaccurate each year. Accordingly, it is desirable to provide techniques that enable a database system to clean data in a customer relationship management system.
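By way of illustration only, the distinction between strict and fuzzy validation described above may be sketched as follows. This is a minimal sketch under stated assumptions and does not represent any particular data cleaning solution: the reference list `KNOWN_CITIES`, the record fields `city` and `postal_code`, the US-style postal code pattern, and the similarity cutoff of 0.8 are all hypothetical choices made for the example.

```python
import re
from difflib import get_close_matches

# Hypothetical validated reference list (illustrative only).
KNOWN_CITIES = ["San Francisco", "San Jose", "Sacramento"]

# Assumption: US ZIP codes, five digits with an optional four-digit extension.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def strict_validate(record):
    """Strict validation: accept only records with a well-formed postal code."""
    return bool(ZIP_PATTERN.fullmatch(record.get("postal_code") or ""))


def fuzzy_correct(city, cutoff=0.8):
    """Fuzzy validation: replace a value with the closest known entry, if any.

    Values with no sufficiently similar known entry are returned unchanged.
    """
    matches = get_close_matches(city, KNOWN_CITIES, n=1, cutoff=cutoff)
    return matches[0] if matches else city


def clean(records):
    """Drop records that fail strict validation; fuzzy-correct the rest."""
    cleaned = []
    for record in records:
        if not strict_validate(record):
            continue  # strict rule: reject the record outright
        cleaned.append({**record, "city": fuzzy_correct(record.get("city", ""))})
    return cleaned


if __name__ == "__main__":
    dirty = [
        {"city": "San Fransisco", "postal_code": "94105"},  # misspelled city
        {"city": "San Jose", "postal_code": "9510"},        # malformed postal code
    ]
    print(clean(dirty))
```

In this sketch the strict rule discards a record entirely, whereas the fuzzy rule keeps the record and corrects a value that partially matches a known entry. Whether a rejected record should instead be flagged for manual review is a design choice that the sketch does not address.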