A database is a collection of information arranged in an organized manner. A typical database might include medical, financial or accounting information, demographics and market survey data, bibliographic or archival data, personnel and organizational information, public governmental records, private business or customer data such as addresses and phone numbers, etc.
Such information is usually contained in computer files arranged in a pre-selected database format, and the data in those files can be maintained on magnetic media for convenient access, both for storage and for updating the file contents as needed.
Poor data quality can undermine the effectiveness of a business or other organization or entity. In healthcare, for example, incorrect information about patients in an Electronic Health Record (EHR) may lead to wrong treatments and prescriptions, so ensuring the accuracy of database entries is of prime importance.
A large variety of computational procedures for cleaning or repairing erroneous entries in databases has been proposed. Typically, such procedures can automatically or semi-automatically identify errors and, when possible, correct them. These approaches nevertheless have several limitations, both in the scalability of the method used, especially when repairs or updates to larger databases are desired, and in the accuracy of the values used as replacements for the errors that are found.