The problem of detecting changes in low-dimensional sequential data has been studied by statisticians for more than fifty years. Change detection methods first appeared in the 1940s, building on Wald's sequential analysis [Wald, 1947], in particular the sequential probability ratio test (SPRT) [Basseville and Nikiforov, 1993]; later, Page introduced the cumulative sum (CUSUM) method [Page, 1954]. More recently, the machine learning and data mining communities have become interested in change detection because of the need to discover changes in data generated by non-stationary online processes, such as customer click streams, high-dimensional multimedia data, and retail chain transactions [Domingos and Hulten, 2001]. In such processes the target concepts change over time, so it is vital to detect changes in the data-generating process so that timely decisions can be made.
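Page's cumulative sum idea can be illustrated with the common one-sided (tabular) CUSUM for an upward shift in the mean of a univariate sequence. This is a minimal sketch, not Page's original formulation in full generality; the in-control mean `mu0`, the allowance `k`, and the decision threshold `h` are assumed illustrative parameters.

```python
from typing import Optional, Sequence


def cusum_upper(xs: Sequence[float], mu0: float, k: float, h: float) -> Optional[int]:
    """One-sided CUSUM for an upward shift in mean (illustrative sketch).

    mu0: assumed in-control mean; k: allowance per step; h: alarm threshold.
    Returns the index of the first alarm, or None if no alarm is raised.
    """
    s = 0.0
    for t, x in enumerate(xs):
        # Accumulate evidence of a shift above mu0, discounted by k,
        # and reset to zero whenever the evidence becomes negative.
        s = max(0.0, s + (x - mu0 - k))
        if s > h:
            return t
    return None


data = [0.1, -0.2, 0.0, 0.3, -0.1, 1.2, 0.9, 1.1, 1.3, 1.0]
print(cusum_upper(data, mu0=0.0, k=0.5, h=2.0))  # 8
```

Because the statistic resets at zero, evidence builds up only while observations stay above `mu0 + k`; in this toy sequence the mean shifts at index 5 and the alarm is raised at index 8, once the accumulated excess passes `h`.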
One real-world problem that requires change detection is video segmentation, i.e., the detection of video-shot changes (video breaks). Many algorithms have been proposed for video-shot change detection [Gargi et al., 2000; Lefevre et al., 2003; Zhai and Shah, 2005]. Existing methods range from pixel- and histogram-based difference methods to motion-based methods (e.g., optical flow). Methods that rely on global or local thresholds require threshold selection, a critical step for successful change detection. For video sequences with clear and distinct shots, a single global threshold would be sufficient. For video sequences that contain both abrupt and gradual changes between shots, however, a single global threshold that separates all shots may not exist. To address this, [Gargi et al., 2000] argued for the use of local thresholds, which in turn requires choosing an appropriate window size. Alternatively, [Zhai and Shah, 2005] proposed detecting video breaks as deviations from a current model.
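To make the distinction between global and local thresholds concrete, here is a minimal histogram-difference sketch. The frame representation (flat lists of 0-255 grey levels), the 16-bin histogram, the L1 histogram distance, and the local rule "mean plus `alpha` standard deviations of the previous `window` differences" (with a hypothetical `floor` parameter) are all illustrative assumptions, not the method of any of the cited papers.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized grey-level histogram of a frame given as 0-255 pixel values."""
    counts = [0] * bins
    for p in frame:
        counts[p * bins // 256] += 1
    return [c / len(frame) for c in counts]


def hist_differences(frames: Sequence[Sequence[int]], bins: int = 16) -> List[float]:
    """L1 distance between histograms of consecutive frames (range 0 to 2)."""
    hs = [histogram(f, bins) for f in frames]
    return [sum(abs(a - b) for a, b in zip(h0, h1)) for h0, h1 in zip(hs, hs[1:])]


def global_cuts(diffs: Sequence[float], threshold: float) -> List[int]:
    """Shot starts where the difference exceeds one fixed, global threshold."""
    return [i + 1 for i, d in enumerate(diffs) if d > threshold]


def local_cuts(diffs: Sequence[float], window: int, alpha: float, floor: float) -> List[int]:
    """Shot starts where the difference exceeds mean + alpha * std of the
    preceding `window` differences; `floor` (hypothetical) prevents near-zero
    thresholds inside static shots. The first `window` differences are not tested."""
    cuts = []
    for i in range(window, len(diffs)):
        past = diffs[i - window:i]
        thr = max(mean(past) + alpha * pstdev(past), floor)
        if diffs[i] > thr:
            cuts.append(i + 1)
    return cuts


# Toy sequence of 4-pixel frames: a slightly flickering dark shot, then a bright shot.
a0, a1, b = [10, 10, 10, 10], [10, 10, 10, 20], [200, 200, 200, 200]
frames = [a0, a1, a0, a1, a0, b, b, b]
diffs = hist_differences(frames)
print(diffs)                           # [0.5, 0.5, 0.5, 0.5, 2.0, 0.0, 0.0]
print(global_cuts(diffs, 1.0))         # [5]
print(local_cuts(diffs, 3, 3.0, 0.3))  # [5]
```

Both detectors place the cut at frame 5 in this toy sequence. The local rule adapts to the flicker level of the current shot, but it cannot test the first `window` differences and its behaviour depends on `window`, which is the window-size choice that local thresholds require.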
What is needed is a new technique to detect changes in high-dimensional streams of both labeled and unlabeled data.