Companies or other organizations often gather data into data repositories, such as databases or data warehouses, for analysis to discover hidden data attributes, trends, patterns, or other characteristics. Such analysis is referred to as data mining, which is performed by companies or other organizations for planning purposes, for better understanding of customer behavior, or for other purposes.
A database can receive an input data stream from one or more data sources for collection and storage. As data is received, it is sometimes desirable to analyze data values to detect for validity of the data values. For example, an organization may desire to identify invalid or erroneous data values on a more or less real-time basis, such that the organization can act quickly upon detection of invalid or erroneous data values. The ability to act on data values that are invalid or erroneous allows an organization to more quickly identify problem areas such that the errors do not accumulate. The process of identifying errors in data values is referred to as a data quality assurance procedure.
Data quality assurance of data produced by a dynamically changing system is usually difficult to perform accurately. A dynamically changing system is a system that produces data that exhibits non-linear trends, seasonal effects, and heteroscedasticity (varying variability over time). For such dynamically changing systems, detecting for a change in the data that is indicative of a problem in the input data based on calculations of constant means and constant variance typically does not produce accurate results.