A database is a collection of information arranged in an organized manner. A typical database might include medical, financial or accounting information, demographics and market survey data, bibliographic or archival data, personnel and organizational information, public governmental records, private business or customer data such as addresses and phone numbers, etc.
Such information is usually contained in computer files arranged in a pre-selected database format, and the data within those files can be maintained on magnetic media for convenient access, both for storage and for updating the file contents as needed.
Poor data quality can have undesirable implications for the effectiveness of a business or other organization or entity. For example, in healthcare, incorrect information about patients in an Electronic Health Record (EHR) may lead to wrong treatments and prescriptions, so ensuring the accuracy of database entries is of prime importance.
A large variety of computational procedures for cleaning or repairing erroneous or duplicate entries in databases have been proposed. Typically, such procedures can automatically or semi-automatically identify errors and, when possible, correct them. However, these approaches share a limitation: the changes they make can themselves introduce new database errors. For example, a repair made in order to correct a functional dependency violation may lead to duplication errors. Similarly, deduplication can lead to functional dependency violations within a database.
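The interaction between the two kinds of repair can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample records, the functional dependency zip -> city, the helper names (fd_violations, repair_fd, exact_duplicates), and the naive majority-vote repair. It is not taken from any particular cleaning procedure.

```python
from collections import Counter, defaultdict

def fd_violations(rows, lhs, rhs):
    """Return each lhs value that maps to more than one rhs value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[lhs]].add(row[rhs])
    return {key: vals for key, vals in seen.items() if len(vals) > 1}

def repair_fd(rows, lhs, rhs):
    """Naive repair (assumed): overwrite rhs with the most common value per lhs group."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[lhs]][row[rhs]] += 1
    return [{**row, rhs: counts[row[lhs]].most_common(1)[0][0]} for row in rows]

def exact_duplicates(rows):
    """Return the distinct records that occur more than once."""
    tally = Counter(tuple(sorted(row.items())) for row in rows)
    return [dict(key) for key, n in tally.items() if n > 1]

# Hypothetical records; the assumed dependency is zip -> city.
rows = [
    {"name": "Ann Lee", "zip": "10001", "city": "New York"},
    {"name": "Ann Lee", "zip": "10001", "city": "NYC"},
    {"name": "Bob Roy", "zip": "10001", "city": "New York"},
]

before_fd = fd_violations(rows, "zip", "city")   # one violating zip code
before_dups = exact_duplicates(rows)             # no exact duplicates yet

repaired = repair_fd(rows, "zip", "city")
after_fd = fd_violations(repaired, "zip", "city")  # dependency now holds
after_dups = exact_duplicates(repaired)            # the two "Ann Lee" rows now coincide
```

In this sketch the repair removes the functional dependency violation, but the two "Ann Lee" records become identical, so a duplicate appears that was not present in the original data.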