1. Technical Field
The present invention relates to database systems, and more specifically, to cleansing data within a database system by measuring the quality of data to identify low quality data within a collection of data records and adjusting the cleansing process based on the identified low quality data.
2. Discussion of the Related Art
Data quality refers to characteristics of data that render the data appropriate for a specific use (e.g., the state of completeness, validity, consistency, integrity, timeliness, and accuracy). Data characteristics indicating low or poor data quality include incomplete data, wrong data, and inconsistent data. Data quality is a major issue for large database systems. For example, when a data set includes several million records, even a low percentage (e.g., one percent) of low quality data may result in tens of thousands of erroneous records. The low quality data causes significant economic losses since these data quality issues are costly to correct.