The present invention relates to the detection of abnormal data, and more particularly to a method and an apparatus for detecting abnormal subsequences in a data sequence.
In scenarios such as the Internet of Things (IoT), Smarter Planet, and the like, data may be generated continuously over time by some data generation mechanism, thereby forming a time data sequence. For example, when a detector is used to monitor atmospheric pollutants, the detector continuously outputs data over time, forming a time data sequence that reflects the atmospheric pollution level at each moment. Such a time data sequence may contain some data that deviate greatly from the rest, and such data may be called abnormal data. Because abnormal data can reveal a problem in the data generation mechanism or an important state of an object associated with the data, it is very important to detect abnormal data in a time data sequence.
Many methods have been proposed for detecting abnormal data in a time data sequence. In these conventional methods, determining whether certain data in the sequence are abnormal requires using all the data in the sequence and scanning the entire sequence many times, which results in a large amount of computation and a long detection time. In addition, the distribution densities of the data of a time data sequence in a mapping space (especially for a sequence generated over a long time period) often differ greatly. Therefore, if abnormality is judged on the basis of the distribution densities of all the data, normal data whose distribution density differs greatly from that of other data may be misidentified as abnormal, yielding an inaccurate result. Moreover, the conventional methods can only perform offline (non-real-time) detection, rather than online (real-time) detection, on the time data sequence, which is unacceptable in scenarios where a detection result is needed as soon as possible.