1. Field of the Invention
The present invention relates to an autoregressive model learning device for time-series data and to a device that detects outliers and change points using the same. In particular, it relates to a detection device, associated with data analysis and data mining technologies, that calculates an outlier score and a change point score for sequentially input data described by discrete and/or continuous variates, so as to detect outliers and change points with high accuracy.
2. Description of the Related Art
Conventionally, detection devices of this type, which calculate the outlier score and the change point score of time-series data in order to detect outliers and change points, use technologies from the fields of statistics, machine learning, data mining and others. In other words, outlier (abnormal value) detection and change point detection, which are the functions to be realized by the present invention, have conventionally been addressed in those fields.
The present invention, however, is applied to situations where stationarity is not assumed for the data generation source, that is, the information source.
Literature on outlier detection in such a case includes the following:
One example is a method by P. Burge and J. Shawe-Taylor called “Detecting cellular fraud using adaptive prototypes” (Proceedings of AI Approaches to Fraud Detection and Risk Management, pp: 9-13, 1997).
Another example is a method by K. Yamanishi et al. titled “On-line Unsupervised Outlier Detection Using Finite Mixtures with Discounting Learning Algorithms” (Proc. of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, pp: 320-324, 2000).
Still another example is a method by U. Murad and G. Pinkas called “Unsupervised profiling for identifying superimposed fraud” (Proceedings of 3rd European Conference on Principles and Practice of Knowledge Discovery in Databases, pp: 251-261, 1999).
These works use adaptive outlier detection algorithms to handle non-stationarity.
Further, according to a known ordinary method of detecting change points in statistics, the number of change points in the given data is decided in advance, and a model is fitted on the assumption that the data between consecutive change points can be described by a stationary model. Such a method is described, for example, in the following literature.
An example is a paper by B. Guthery titled “Partition regression” in the Journal of the American Statistical Association (69:945-947, 1974) or a paper by M. Huskova titled “Nonparametric procedures for detecting a change in simple linear regression models” in the book titled “Applied Change Point Problems in Statistics” (Nova Science Publishers, Inc, 1995).
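For illustration only, the approach of fixing the number of change points in advance and fitting a stationary model to each segment can be sketched as follows. This is a minimal sketch and not the procedure of any of the cited papers: it assumes exactly one change point and a constant-mean model for each segment, and the function name best_single_change_point and the sample data are hypothetical.

```python
def best_single_change_point(data):
    """Return the split index k that minimizes the total squared error
    when data[:k] and data[k:] are each modeled by a constant mean.

    Assumption for illustration: exactly one change point exists, and
    the whole series is available before processing starts.
    """
    def sse(segment):
        # Squared error of a segment around its own mean.
        mean = sum(segment) / len(segment)
        return sum((v - mean) ** 2 for v in segment)

    best_k, best_cost = None, float("inf")
    for k in range(1, len(data)):  # both segments must be non-empty
        cost = sse(data[:k]) + sse(data[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Hypothetical series whose mean shifts after the fourth value.
data = [0.1, -0.2, 0.0, 0.2, 5.1, 4.9, 5.0, 5.2]
change_index = best_single_change_point(data)
print(change_index)
```

Because every candidate split is evaluated against the complete series, a sketch of this kind needs all the data to be given beforehand.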
For detection of the change point in data mining, a method by V. Guralnik and J. Srivastava is described in “Event detection from time series data” (Proc. of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, pp: 32-42, 1999).
The conventional methods and devices according to the above literature have the following drawbacks as devices for detecting outliers and change points from time-series data.
In the outlier detection methods that can be processed sequentially by conventional machine learning technology, such as the method by P. Burge and J. Shawe-Taylor, the method by K. Yamanishi et al., or the method by U. Murad and G. Pinkas described above, no statistical model suitable for time-series data is used. These methods therefore have the drawback that the characteristics of data having a time-series nature cannot be captured sufficiently. A statistical model suitable for time-series data here means a model that can express correlation among data at different times. The autoregressive model and the Markov model are examples of such models.
In addition, the conventional change point detection method described in the paper by V. Guralnik and J. Srivastava basically processes the data collectively, that is, by so-called batch processing, and cannot process the data sequentially. Further, the conventional change point detection methods described above are designed on the assumption that the data are locally stationary, but this assumption is not appropriate in reality and should be removed.
Further, though it is preferable in applications such as data mining to handle outliers and change points together while detecting each of them, no scheme that handles them together has been known so far.
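As a minimal sketch of what expressing correlation among data at different times means, the following simulates a first-order autoregressive model, x_t = a * x_(t-1) + e_t, and measures the correlation between consecutive values. The function names, the coefficient values, and the standard normal noise are assumptions chosen for illustration and are not taken from the cited literature.

```python
import random


def simulate_ar1(coef, n, seed=0):
    """Generate n values of x_t = coef * x_(t-1) + e_t,
    where e_t is standard normal noise (an illustrative assumption)."""
    rng = random.Random(seed)
    x, series = 0.0, []
    for _ in range(n):
        x = coef * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series


def lag1_autocorrelation(series):
    """Sample correlation between each value and the value before it."""
    mean = sum(series) / len(series)
    num = sum((a - mean) * (b - mean)
              for a, b in zip(series[1:], series[:-1]))
    den = sum((a - mean) ** 2 for a in series)
    return num / den


dependent = simulate_ar1(coef=0.8, n=5000)    # each value depends on the previous one
independent = simulate_ar1(coef=0.0, n=5000)  # pure noise, no time dependence

print(lag1_autocorrelation(dependent))
print(lag1_autocorrelation(independent))
```

With a coefficient of 0.8 the lag-1 correlation comes out close to 0.8, whereas with a coefficient of 0 it stays near zero. A model that treats each datum as independent cannot represent the former kind of dependence.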