Various embodiments of the present invention relate to time series, and more specifically, to a method and apparatus for processing time series.
With the development of technologies such as computing, data communication, and real-time monitoring, time series databases have been applied in various fields such as equipment monitoring, production line management, and financial analysis. A time series refers to a set of measured values arranged in time order. Here, a node storing a measured value may be referred to as a data point or data event. A time series database refers to a database for storing these measured values. Measured values may include various data. For example, in an application environment of monitoring bridge security, collected data may include pressure data and/or pressure intensity data collected by a certain sensor; in an application environment of weather forecasting, collected data may include temperature, humidity, pressure, wind force (e.g., including direction and magnitude), etc.
The term “similarity search” refers to searching for similar subsequences in a time series. Typically, a time series consists of massive amounts of data, and it might be continuously updated in real time by incoming measured values. For example, in the application environment of monitoring bridge security, tens of thousands of sensors might be deployed on the bridge to measure pressure at each location in real time. When the database is updated at intervals of 1 second or even shorter, large amounts of data will be generated.
It should be noted that a similarity search does not require subsequences to match one another exactly; some difference is permitted within an error bound, denoted here as “e.” An important aspect of similarity search is to search for a motif in a time series. In short, a motif refers to a time series subsequence of length m that appears at least s times in the time series within the error bound e. In a time series database, motifs are an important basis for post processing (e.g., obtaining an association rule, clustering, classification, etc.).
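By way of illustration only, the motif definition may be sketched in Python as follows. The sketch is a brute-force check and is not an embodiment described herein. The function names, the use of the maximum point-wise absolute difference as the measure compared against e, and the counting of overlapping windows (including the candidate itself) are all assumptions made for this example; the text above does not fix a particular distance measure.

```python
from typing import Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest point-wise absolute difference between two equal-length windows.

    Assumption: this measure stands in for the unspecified notion of
    "difference" that is compared against the error bound e.
    """
    return max(abs(x - y) for x, y in zip(a, b))


def count_occurrences(series: Sequence[float], start: int, m: int, e: float) -> int:
    """Count windows of length m that lie within e of the window at `start`.

    The candidate window itself is counted, and overlapping windows are
    allowed (both are assumptions of this sketch).
    """
    candidate = series[start:start + m]
    count = 0
    for i in range(len(series) - m + 1):
        if max_abs_diff(candidate, series[i:i + m]) <= e:
            count += 1
    return count


def is_motif(series: Sequence[float], start: int, m: int, s: int, e: float) -> bool:
    """True if the window at `start` appears at least s times within error bound e."""
    return count_occurrences(series, start, m, e) >= s


# Hypothetical measured values: the shape 1, 2, 3 recurs three times.
series = [1.0, 2.0, 3.0, 9.0, 1.1, 2.1, 2.9, 7.0, 1.0, 2.0, 3.0]
occurrences = count_occurrences(series, start=0, m=3, e=0.2)
motif_found = is_motif(series, start=0, m=3, s=3, e=0.2)
```

Each candidate window is compared against every other window in this sketch, so the cost grows quickly with the length of the time series, which is consistent with the need for accelerated similarity search discussed below.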
Technical solutions for accelerating similarity search have been developed. These solutions, however, have drawbacks: either a collected time series cannot be processed in real time because of the large amount of data, or only the portion of a time series within a sliding window of limited length can be processed in real time. Therefore, how to search a time series whose data amount is growing rapidly, for example, how to find the top-k motifs (where a concrete value of k may be specified) with the largest occurrence counts, has become a research focus in the time series database field.