Companies or other organizations often gather data into data repositories, such as databases or data warehouses, and analyze the data to discover hidden attributes, trends, patterns, or other characteristics. Such analysis is referred to as data mining, and is performed for planning purposes, for better understanding of customer behavior, or for other purposes.
It is often useful to detect a “structural” or “systematic” change in observed data from a particular data source or database. A “systematic” or “structural” change in data reflects an underlying change in the system that produced the data, as opposed to variation that arises during normal operation of the system. The term “systematic change” is often used in the industry context, whereas the term “structural change” is often used in the economics context. In this description, the terms “systematic change” and “structural change” are used interchangeably and refer to any change in data that results from a change in the system that produced the data.
Detecting a systematic change of data involves change-point detection, which identifies the point in time of the change. Conventionally, change-point detection has employed a model that assumes a constant mean for observed data values before the change, a different constant mean for the observed data values after the change, and a constant variance for the observed data values. A shift in the calculated constant means or constant variance has conventionally been used as an indication that a systematic change has occurred.
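By way of illustration only, the conventional constant-mean model can be sketched as a search for the split point that best separates the observations into two segments, each with its own constant mean. The least-squares criterion, the function name estimate_mean_shift, and the sample values below are assumptions made for this sketch and are not taken from any particular conventional implementation.

```python
def estimate_mean_shift(values):
    """Return (k, mean_before, mean_after) for the split index k that
    minimizes the total squared error of a two-segment constant-mean fit.

    Hypothetical helper: values[:k] is treated as "before the change"
    and values[k:] as "after the change".
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    best = None
    for k in range(1, n):  # every split leaves both segments non-empty
        before, after = values[:k], values[k:]
        mean_before = sum(before) / len(before)
        mean_after = sum(after) / len(after)
        # Squared error of each segment around its own constant mean.
        sse = sum((v - mean_before) ** 2 for v in before)
        sse += sum((v - mean_after) ** 2 for v in after)
        if best is None or sse < best[0]:
            best = (sse, k, mean_before, mean_after)
    _, k, mean_before, mean_after = best
    return k, mean_before, mean_after


# Made-up observations whose level moves from about 10 to about 12.
observations = [10.0, 10.2, 9.8, 10.1, 9.9, 12.0, 12.1, 11.9, 12.2, 11.8]
change_index, mean_before, mean_after = estimate_mean_shift(observations)
print(change_index, mean_before, mean_after)
```

For these sample values the best split falls at index 5, the first observation at the higher level. The sketch assumes exactly one change and a common variance on both sides, which mirrors the assumptions of the conventional model described above.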
Other change-point algorithms detect change points by comparing aggregate values (computed from aggregations of data values) against a threshold. With such algorithms, a change point is detected when the aggregate values cross the threshold. However, it is often difficult to set an optimal threshold value accurately. An incorrectly set threshold may result in inaccurate or late detection of a change point.
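The sensitivity of threshold-based detection to the chosen threshold can be illustrated with a one-sided cumulative-sum (CUSUM) scheme, used here as one example of an aggregate-versus-threshold algorithm. The function name cusum_first_crossing, the reference mean, the slack value, and the three thresholds are hypothetical choices made for this sketch.

```python
def cusum_first_crossing(values, reference_mean, slack, threshold):
    """Return the index at which a one-sided CUSUM aggregate first
    exceeds the threshold, or None if it never does.

    Hypothetical helper: the aggregate accumulates deviations above
    reference_mean + slack and is reset to zero whenever it goes negative.
    """
    cumulative = 0.0
    for index, value in enumerate(values):
        cumulative = max(0.0, cumulative + (value - reference_mean - slack))
        if cumulative > threshold:
            return index
    return None


# Made-up observations whose level moves from about 10 to about 12 at index 5.
observations = [10.0, 10.2, 9.8, 10.1, 9.9, 12.0, 12.1, 11.9, 12.2, 11.8]

low_threshold_hit = cusum_first_crossing(observations, 10.0, 0.5, 1.0)
high_threshold_hit = cusum_first_crossing(observations, 10.0, 0.5, 5.0)
excessive_threshold_hit = cusum_first_crossing(observations, 10.0, 0.5, 20.0)
print(low_threshold_hit, high_threshold_hit, excessive_threshold_hit)
```

With these sample values, a threshold of 1.0 is crossed at index 5, a threshold of 5.0 is not crossed until index 8, and a threshold of 20.0 is never crossed within the available data. The same change is therefore detected promptly, late, or not at all depending solely on the threshold setting.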