A database is a collection of information arranged in an organized manner. A typical database might include medical, financial or accounting information, demographics and market survey data, bibliographic or archival data, personnel and organizational information, public governmental records, private business or customer data such as addresses and phone numbers, etc.
Such information is usually contained in computer files arranged in a pre-selected database format, and the data is maintained on magnetic media for convenient access, both for storage and for updating the file contents as needed.
Poor data quality can undermine the effectiveness of a business or other organization or entity. In healthcare, for example, incorrect information about patients in an Electronic Health Record (EHR) may lead to wrong treatments and prescriptions, so ensuring the accuracy of database entries is of prime importance.
A large variety of computational procedures for cleaning or repairing erroneous entries in databases have been proposed. Typically, such procedures can automatically or semi-automatically identify errors and, when possible, correct them. For example, one approach for repairing so-called dirty databases is to use data quality rules in the form of database constraints to identify records with errors and inconsistencies and then use these rules to derive updates to these records. Most of the existing data repair approaches focus on providing fully automated solutions using different heuristics to select updates that would introduce minimal changes to the data.
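The rule-based repair approach described above can be illustrated with a minimal Python sketch. It is not taken from any particular repair system: the records, the column names, the single rule "zip determines city", and the helper names `find_violations` and `suggest_updates` are all hypothetical, and majority voting stands in for just one of the many heuristics that could be used to choose updates that change as little data as possible.

```python
from collections import Counter, defaultdict

# Hypothetical records; the column names and values are illustrative only.
records = [
    {"id": 1, "zip": "46360", "city": "Michigan City"},
    {"id": 2, "zip": "46360", "city": "Michigan City"},
    {"id": 3, "zip": "46360", "city": "Westville"},
    {"id": 4, "zip": "46391", "city": "Westville"},
]


def find_violations(rows, lhs, rhs):
    """Return groups of rows that share an lhs value but disagree on rhs.

    This checks one data quality rule of the form "lhs determines rhs".
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[lhs]].append(row)
    return {
        key: group
        for key, group in groups.items()
        if len({row[rhs] for row in group}) > 1
    }


def suggest_updates(rows, lhs, rhs):
    """Propose (id, attribute, old value, new value) updates.

    Assumed heuristic: within each violating group, keep the most common
    rhs value and rewrite the rest, which changes the fewest cells.
    """
    updates = []
    for group in find_violations(rows, lhs, rhs).values():
        majority, _ = Counter(row[rhs] for row in group).most_common(1)[0]
        for row in group:
            if row[rhs] != majority:
                updates.append((row["id"], rhs, row[rhs], majority))
    return updates


updates = suggest_updates(records, "zip", "city")
for record_id, attribute, old, new in updates:
    print(f"record {record_id}: {attribute} {old!r} -> {new!r}")
```

In this sketch the rule both identifies the inconsistent records and drives the proposed update, mirroring the two roles that data quality rules play in the approach described above. A fully automated procedure would apply the suggested updates directly, whereas a semi-automated one might present them to a user for confirmation first.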