Embodiments of the present invention relate to database management, and more specifically, to a method and apparatus for managing time series databases.
With the development of computer, data communication and real-time monitoring technologies, time series databases have been widely applied in many fields such as device monitoring, production line management and financial analysis. A time sequence refers to a set of measured values that are arranged in temporal order, and a node where a measured value is stored can be called a data point or a data event. A time series database refers to a database for storing these measured values. Measured values may comprise various kinds of data. For example, in an application environment of monitoring bridge security, data being collected may comprise pressure data and/or pressure intensity data collected by certain sensors; in an application environment of weather forecasting, data being collected may comprise temperature, humidity, pressure, wind force (e.g., including magnitude and direction), etc.
Similarity search refers to finding, in a time series database, a sequence that is similar to a given sequence pattern. A time series database usually comprises massive data and is continuously updated in real time with newly measured values. For example, in an application environment of monitoring bridge security, thousands of sensors might be deployed on the bridge for measuring, in real time, temperature, humidity, pressure and wind force. When a database is updated once per second or even more frequently, a huge amount of data will be produced. Therefore, how to conduct a similarity search in a time series database with a rapidly growing amount of data has become one of the research focuses in the database field.
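By way of illustration only, a naive similarity search may be sketched as follows. The sketch assumes that similarity is measured by Euclidean distance between the given sequence pattern and every window of equal length in the stored sequence, and that a window matches when the distance does not exceed a threshold. The distance measure, the threshold and the function names are assumptions made for this example and are not prescribed by the text above.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_search(series, pattern, threshold):
    """Scan every window of len(pattern) in series (hypothetical helper).

    Returns (start_index, distance) for each window whose distance to the
    pattern is at most threshold. Every window is read and compared, which
    is why this approach scales poorly as the database grows.
    """
    m = len(pattern)
    matches = []
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        d = euclidean(window, pattern)
        if d <= threshold:
            matches.append((start, d))
    return matches


# Illustrative measured values, not real sensor data.
series = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
pattern = [1, 2, 3]
result = similarity_search(series, pattern, threshold=0.5)
```

Because such a scan touches every stored window, its cost grows with the size of the database, which motivates the accelerated approaches discussed in this background.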
Technical solutions for accelerating similarity search have been developed so far. These technical solutions propose to first return a candidate set and then verify the candidates in the candidate set against the time series database, thereby reducing query time. However, a candidate set usually consists of many candidates, and verifying the candidates one by one produces huge data I/O overheads and takes considerable time.
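The candidate-then-verify approach described above may be sketched, under stated assumptions, as follows. The sketch assumes a precomputed summary of window sums as a stand-in for an index, uses the bound |sum(window) - sum(pattern)| / sqrt(m) as a lower bound on the Euclidean distance to filter candidates, and counts one simulated database read per verified candidate. The summary structure, the bound and all names are hypothetical choices for this example rather than the solutions referred to in the text.

```python
import math


def build_summary(series, m):
    """Precompute the sum of every length-m window (stand-in for an index)."""
    if len(series) < m:
        return []
    running = sum(series[:m])
    sums = [running]
    for i in range(m, len(series)):
        running += series[i] - series[i - m]
        sums.append(running)
    return sums


def filter_candidates(sums, pattern, threshold):
    """Keep windows whose lower-bound distance does not exceed threshold.

    |sum(window) - sum(pattern)| / sqrt(m) never exceeds the Euclidean
    distance, so no true match is discarded, but false positives remain.
    """
    m = len(pattern)
    target = sum(pattern)
    return [i for i, s in enumerate(sums)
            if abs(s - target) / math.sqrt(m) <= threshold]


def verify(read_window, candidates, pattern, threshold):
    """Check each candidate against the raw data, one read per candidate."""
    m = len(pattern)
    matches = []
    for start in candidates:
        window = read_window(start, m)
        d = math.sqrt(sum((x - y) ** 2 for x, y in zip(window, pattern)))
        if d <= threshold:
            matches.append(start)
    return matches


# Illustrative measured values, not real sensor data.
series = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
pattern = [1, 2, 3]
reads = {"count": 0}


def read_window(start, m):
    """Simulated database read; increments a counter to expose I/O cost."""
    reads["count"] += 1
    return series[start:start + m]


sums = build_summary(series, len(pattern))
candidates = filter_candidates(sums, pattern, threshold=0.5)  # [1, 3, 7]
matches = verify(read_window, candidates, pattern, threshold=0.5)  # [1, 7]
```

In this example the window starting at index 3 passes the filter but fails verification, so it costs a read without yielding a match. With a large candidate set, such per-candidate reads account for the I/O overhead noted above.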
With the wide application of time series databases in various industries, database providers, managers and end users pay more and more attention to the efficiency of database queries. Therefore, how to further reduce the overheads of various resources in similarity search has become a pressing issue.