Companies or other organizations often gather data into data repositories, such as databases or data warehouses, for analysis to discover hidden data attributes, trends, patterns, or other characteristics. Such analysis is referred to as data mining, which is performed by companies or other organizations for planning purposes, for better understanding of customer behavior, or for other purposes.
It is often useful to detect for a “structural” or “systematic” change in observed data from a particular data source or database. A “systematic” or “structural” change in data results from some change in a particular system that produced the data, where such change results from an underlying change in the system rather than from changes due to normal operation of the system. The term “systematic change” is often used in the industry context, whereas the term “structural change” is often used in the economics context. In this description, the terms “systematic change” and “structural change” are interchangeably used and refer to any change in data that results from a change in the system that produced the data.
Detecting a systematic change of data involves change-point detection, which identifies the point in time of the change. Conventionally, change-point detection has employed a model that assumes a constant mean before the change, a constant mean of a possibly different value after the change, and a constant variance for the observed data values. A shift in the calculated constant means or constant variance has conventionally been used as an indication that a systematic change has occurred.
The assumption of constant means and constant variance is typically inapplicable to data that exhibits non-linear trends, seasonal effects, and heteroscedasticity (varying variability over time). Data exhibiting such characteristics are typically produced by systems that are dynamically changing or that exhibit non-linear changes due to varying underlying business cycles, business trends, or other factors. For data exhibiting non-linear trends, seasonal effects, and heteroscedasticity, change-point detection based on the calculation of constant means or constant variance would typically not provide accurate results.