“Data quality” is a term used to describe the completeness, correctness, or integrity of data, such as records stored in a database system. Data quality can be measured by reading records from a database system and comparing their attributes against rules that define acceptable values for those attributes. For example, a rule can define an acceptable value range or data format for a given attribute, or can require that the attribute have no missing values. By evaluating each record's attributes against such rules, data quality can be quantified on an attribute-by-attribute basis. Data quality can also be quantified on an aggregate basis for a database table by evaluating the attributes of the table's columns, and aggregate measures of quality can likewise be developed for entire database systems.
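The following is a minimal sketch of the rule-based measurement described above. It assumes that records are represented as Python dictionaries. The rule names, attribute names, and sample values are hypothetical and are not drawn from any particular database system. Each attribute score is the fraction of records whose value for that attribute satisfies every rule defined for it. The table-level score is simply the mean of the attribute scores, which is only one of many possible aggregations.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable


# Hypothetical rule type: a named predicate applied to a single attribute value.
@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[Any], bool]


def not_missing() -> Rule:
    # Treats None and the empty string as missing values.
    return Rule("not_missing", lambda v: v is not None and v != "")


def in_range(lo: float, hi: float) -> Rule:
    # Missing values fail the range check as well.
    return Rule(f"in_range[{lo},{hi}]", lambda v: v is not None and lo <= v <= hi)


def matches(pattern: str) -> Rule:
    # The entire string value must match the given regular expression.
    return Rule(
        f"matches({pattern})",
        lambda v: isinstance(v, str) and re.fullmatch(pattern, v) is not None,
    )


def attribute_score(records: list[dict], attr: str, rules: list[Rule]) -> float:
    """Fraction of records whose value for `attr` passes every rule."""
    if not records:
        raise ValueError("cannot score an empty set of records")
    passed = sum(1 for r in records if all(rule.check(r.get(attr)) for rule in rules))
    return passed / len(records)


def table_score(records: list[dict], rules_by_attr: dict[str, list[Rule]]):
    """Per-attribute scores plus an aggregate (assumed here: the unweighted mean)."""
    scores = {attr: attribute_score(records, attr, rules)
              for attr, rules in rules_by_attr.items()}
    aggregate = sum(scores.values()) / len(scores)
    return scores, aggregate


# Hypothetical sample records and rules.
records = [
    {"age": 34, "zip": "30301"},
    {"age": 130, "zip": "3030"},   # age out of range, zip has the wrong format
    {"age": None, "zip": "30302"},  # age missing
    {"age": 52, "zip": "30303"},
]
rules = {
    "age": [not_missing(), in_range(0, 120)],
    "zip": [not_missing(), matches(r"\d{5}")],
}

scores, aggregate = table_score(records, rules)
print(scores, aggregate)
```

Running such a measurement once yields exactly the kind of single-point-in-time view discussed below. Each call reports quality as of the moment the records were read.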
However, existing techniques for measuring data quality do not show how data quality changes over time. Instead, they provide only a “snapshot,” or instantaneous view, of the quality of a given database at a given time. For example, a database evaluated in one month might have 97% of its records free of missing values, and the same database evaluated in the following month might have 98%. Existing techniques do not indicate whether such a change in the percentage of records with missing values reflects a trend toward improving quality of the database.
Thus, it is desirable to provide data quality information that reflects how data quality changes over time. It is further desirable to provide automated analyses of that information that enable users to understand the causes and impacts of the measured changes in data quality.