Data streams have recently received much attention in several communities (e.g., theory, databases, networks, data mining) because of several important applications (e.g., network traffic analysis, moving object tracking, financial data analysis, sensor monitoring, environmental monitoring, scientific data processing). Many recent efforts concentrate on summarization and pattern discovery in time series data streams. Some of these efforts are further described in (Y. Chen, G. Dong, J. Han, B. W. Wah, and J. Wang. Multi-dimensional regression analysis of time-series data streams. In VLDB, 2002; T. Palpanas, M. Vlachos, E. J. Keogh, D. Gunopulos, and W. Truppel. Online amnesic approximation of streaming time series. In ICDE, 2004; S. Papadimitriou, A. Brockwell, and C. Faloutsos. Adaptive, unsupervised stream mining. VLDB J., 13(3), 2004; M. Vlachos, C. Meek, Z. Vagena, and D. Gunopulos. Identifying similarities, periodicities and bursts for online search queries. In SIGMOD, 2004; K. Chakrabarti, E. Keogh, S. Mehrotra, and M. Pazzani. Locally adaptive dimensionality reduction for indexing large time series databases. TODS, 27(2), 2002; P. Patel, E. Keogh, J. Lin, and S. Lonardi. Mining motifs in massive time series databases. In ICDM, 2002; B. Chiu, E. Keogh, and S. Lonardi. Probabilistic discovery of time series motifs. In KDD, 2003).
Typical approaches for pattern discovery and summarization of time series rely on fixed transforms, with a predetermined set of bases or approximating functions, as described in (S. Papadimitriou, A. Brockwell, and C. Faloutsos. Adaptive, unsupervised stream mining. VLDB J., 13(3), 2004; M. Vlachos, C. Meek, Z. Vagena, and D. Gunopulos. Identifying similarities, periodicities and bursts for online search queries. In SIGMOD, 2004; Y. Chen, G. Dong, J. Han, B. W. Wah, and J. Wang. Multi-dimensional regression analysis of time-series data streams. In VLDB, 2002; T. Palpanas, M. Vlachos, E. J. Keogh, D. Gunopulos, and W. Truppel. Online amnesic approximation of streaming time series. In ICDE, 2004; and K. Chakrabarti, E. Keogh, S. Mehrotra, and M. Pazzani. Locally adaptive dimensionality reduction for indexing large time series databases. TODS, 27(2), 2002). For example, the short-window Fourier transform uses translated sine waves of fixed length, and has been successful in speech processing, as is further described in (M. R. Portnoff. Short-time Fourier analysis of sampled speech. IEEE Trans. ASSP, 29(3), 1981). Wavelets use translated and dilated sine-like waves and have been successfully applied to more bursty data, such as images and video streams. However, these approaches assume a fixed-length, sliding window. For example, short-window Fourier cannot reveal anything about periods larger than the sliding window length. Wavelets are by nature multi-scale, but they still use a fixed set of bases, which is often hard to choose.
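The window-length limitation of short-window Fourier analysis can be illustrated with a minimal sketch. It is not taken from any of the cited works; the helper name `dominant_period`, the synthetic sine signal, its period of 64 samples, and the two window lengths (16 and 256) are all assumptions chosen for illustration. The sketch computes a plain discrete Fourier transform over a single window and reports the period of the strongest nonzero frequency bin.

```python
import cmath
import math


def dominant_period(window):
    """Return the period (in samples) of the strongest nonzero DFT bin.

    Hypothetical helper for illustration only: a direct O(n^2) DFT
    over one fixed-length window.
    """
    n = len(window)
    mean = sum(window) / n
    centered = [x - mean for x in window]  # remove the constant offset

    best_k, best_mag = 1, -1.0
    # Bin k corresponds to a period of n / k samples, so the longest
    # period any bin can represent is n (at k = 1).
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n / best_k


# Assumed synthetic signal: a pure sine wave with a period of 64 samples.
TRUE_PERIOD = 64
signal = [math.sin(2 * math.pi * t / TRUE_PERIOD) for t in range(512)]

# A 16-sample window is shorter than the period, so the estimate is
# capped at the window length and the true period stays hidden.
short_estimate = dominant_period(signal[:16])

# A 256-sample window spans four full cycles and can recover the period.
long_estimate = dominant_period(signal[:256])

print(short_estimate, long_estimate)
```

Under these assumptions, the estimate from the 16-sample window can never exceed 16 samples, whereas the 256-sample window is long enough to expose the 64-sample period. In practice the appropriate window length is not known in advance, which is the difficulty noted above.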
In time series stream methods, the work described in “A multiresolution symbolic representation of time series” by Megalooikonomou, Wang, Li, and Faloutsos, in ICDE 2005: 668-679 produces a single representative for a set of scales, using vector quantization within each scale. Its main focus is on finding good-quality and intuitive distance measures for indexing and similarity search. However, this approach does not produce a window size; the window sizes are chosen a priori. Also, this approach is not applicable to streams, and it is severely restricted in the type of approximation (each window is approximated by a discrete value, based on the vector quantization output). Hence, the method cannot be composed so that the next level reuses the approximations of the previous level.
The work described in “A data compression technique for sensor networks with dynamic bandwidth allocation” by Lin, Gunopulos, Kalogeraki, and Lonardi, in TIME 2005: 186-188 also uses vector quantization in order to reduce power consumption for wireless sensor networks. This approach only examines a single, a priori chosen window size.
The work in “Knowledge discovery from heterogeneous dynamic systems using change-point correlations” by Idé and Inoue, in SDM 2005: 571-576 employs a similar technique for change point detection. The change point scores are then used to correlate complex time series. This approach examines only a single, a priori chosen window size, and the computation required is too costly to be feasible in a streaming environment.
Therefore, a need exists to overcome the problems with the prior art as discussed above.