Data quality is an assessment of the suitability of data to serve its purpose in a given context. It is generally judged on aspects of the data that indicate this suitability, such as accuracy, completeness, update status, relevance, consistency across data sources, reliability, appropriate presentation, and accessibility to stakeholders. Typical data quality measures include standardizing source data fields; ensuring consistency in the data; validating, certifying, and enriching common data elements; and using trusted data sources.
Other conventional techniques for enhancing data quality include analyzing and identifying improved standardization, validation, and matching processes; comparing data across or within data sources to check consistency; removing duplicate data; and developing relationships among common entities from different sources, for example by creating foreign key relationships.
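As a rough illustration of the standardization, duplicate removal, and cross-source comparison described above, the following minimal sketch uses only the Python standard library. The record layout, the field names (`customer_id`, `email`), the two sources (`crm`, `billing`), and the normalization rule (trim and lowercase) are all hypothetical choices made for the example, not part of any particular system.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]


def standardize(record: Record) -> Record:
    """Trim whitespace and lowercase every value (an assumed, simplistic rule)."""
    return {key: value.strip().lower() for key, value in record.items()}


def deduplicate(records: Iterable[Record], key_fields: Tuple[str, ...]) -> List[Record]:
    """Keep the first standardized record seen for each key."""
    seen = set()
    unique = []
    for record in map(standardize, records):
        key = tuple(record[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def find_inconsistencies(
    source_a: Iterable[Record],
    source_b: Iterable[Record],
    key_field: str,
    compare_field: str,
) -> List[str]:
    """Return keys present in both sources whose compare_field values differ."""
    index_b = {r[key_field]: r for r in map(standardize, source_b)}
    mismatches = []
    for record in map(standardize, source_a):
        other = index_b.get(record[key_field])
        if other is not None and other[compare_field] != record[compare_field]:
            mismatches.append(record[key_field])
    return mismatches


# Hypothetical sample data from two sources that share a customer key.
crm = [
    {"customer_id": "C1", "email": " Ann@Example.com "},
    {"customer_id": "c1", "email": "ann@example.com"},
    {"customer_id": "C2", "email": "bob@example.com"},
]
billing = [
    {"customer_id": "C1", "email": "ann@example.com"},
    {"customer_id": "C2", "email": "robert@example.com"},
]

unique_crm = deduplicate(crm, key_fields=("customer_id",))
mismatched_ids = find_inconsistencies(unique_crm, billing, "customer_id", "email")
print(len(unique_crm), mismatched_ids)
```

Here the two `C1`/`c1` rows collapse into one record after standardization, and the differing email for customer `c2` is flagged as an inconsistency between the sources. In practice the matching rule would usually be more tolerant than exact equality on normalized strings.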
However, in spite of measures implemented to ensure data quality, errors, inconsistencies, and inaccuracies may creep into the data over time. This degradation may be caused by various data quality problems, such as data entry errors, limited validation of data at the time of entry, system field limitations, mergers and migrations, data repository migrations across database management systems from different vendors, inconsistent standards, discrepancies in data format, differences in the structure of data repositories, missing data, data fields filled with default values or nulls, spelling errors, and data anomalies.
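Some of the problems listed above, in particular missing data and fields filled with default values or nulls, can be surfaced by simple profiling. The sketch below counts, for one field, how many records are null or absent, how many hold a placeholder, and how many are actually populated. The placeholder set `DEFAULT_PLACEHOLDERS`, the `orders` records, and the `ship_date` field are assumptions made for illustration only.

```python
from typing import Dict, Iterable, Optional

# Assumed placeholder values that stand in for "no real data was entered".
DEFAULT_PLACEHOLDERS = {"", "n/a", "unknown", "0000-00-00"}


def profile_field(
    records: Iterable[Dict[str, Optional[str]]], field: str
) -> Dict[str, int]:
    """Count null/absent, default-filled, and populated values for one field."""
    counts = {"null": 0, "default": 0, "populated": 0}
    for record in records:
        value = record.get(field)  # an absent field is treated the same as null
        if value is None:
            counts["null"] += 1
        elif value.strip().lower() in DEFAULT_PLACEHOLDERS:
            counts["default"] += 1
        else:
            counts["populated"] += 1
    return counts


# Hypothetical sample records.
orders = [
    {"order_id": "1", "ship_date": "2023-04-01"},
    {"order_id": "2", "ship_date": None},
    {"order_id": "3", "ship_date": "0000-00-00"},
    {"order_id": "4", "ship_date": " N/A "},
    {"order_id": "5"},
]

ship_date_profile = profile_field(orders, "ship_date")
print(ship_date_profile)
```

A field whose populated count falls well below the record count is a candidate for closer inspection. What counts as a placeholder differs between systems, so the set would need to be adapted to the data at hand.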
Poor data quality may adversely affect the functioning of an organization. For example, an organization with poor data quality may suffer losses arising from the extra cost of preparing reconciliations, delayed or scrapped migration to a new system, failure to bill or collect receivables, inability to deliver orders, failure to meet contracts, and so on.