Various embodiments of the present invention relate to database management, and more specifically, to a method and apparatus for searching in a database.
With the development of technologies such as computing, data communication, and real-time monitoring, time series databases have been applied in various fields such as equipment monitoring, production line management, and financial analysis. A time series refers to a set of measured values arranged in chronological order, and in the context of the present invention a node storing measured values may be referred to as a data point or data event. A time series database refers to a database for storing these measured values. Measured values may include various types of data. For example, in an application environment of monitoring bridge security, collected data may include pressure data and/or pressure intensity data collected by a certain sensor; in an application environment of weather forecasting, collected data may include temperature, humidity, pressure, wind force (including direction and magnitude), etc.
Similarity search is an important operation in time series databases. Specifically, similarity search refers to searching a time series for subsequences that are similar to a target subsequence. In the technical field of time series databases, a similar subsequence may be referred to as a pattern for short. Typically a time series consists of massive amounts of data, and the time series might be continuously updated in real time by incoming measured values. For example, in the application environment of monitoring bridge security, tens of thousands of sensors might be deployed on the bridge to measure pressure at each location in real time. When the database is updated once per second or even more frequently, very large amounts of data will be generated.
Note that similarity search does not require found patterns to match the target subsequence exactly; a found pattern may differ from the target subsequence to some extent. For example, let the error bound be e. A subsequence found by similarity search then has the same length as the target subsequence (for example, a length of m), and differs from the target subsequence by an amount less than or equal to e.
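The definition above can be illustrated with a minimal sketch. It is a naive sliding-window scan, not the method of the present invention, and it rests on several assumptions that the text does not state: the difference between two subsequences is measured by Euclidean distance, the time series is held in memory as a list of numbers, and the names `distance` and `similarity_search` are hypothetical.

```python
import math
from typing import List, Sequence


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length sequences.

    Assumption: the text does not name a difference measure; Euclidean
    distance is used here purely for illustration.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_search(
    series: Sequence[float], target: Sequence[float], e: float
) -> List[int]:
    """Return the start index of every length-m subsequence of `series`
    whose difference from `target` is less than or equal to e."""
    m = len(target)  # found subsequences have the same length as the target
    matches = []
    # Slide a window of length m over the series, one data point at a time.
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        if distance(window, target) <= e:
            matches.append(start)
    return matches


if __name__ == "__main__":
    measured = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.1]
    pattern = [1.0, 2.0, 3.0]
    print(similarity_search(measured, pattern, e=0.2))
```

Such a scan compares the target against every window of the time series, so its cost grows with both the length of the series and m, which suggests why efficiency becomes a concern for massive, continuously updated data.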
Since a time series database has a complex storage structure, the efficiency of searching in the time series database is rather unsatisfactory. In addition, since the time series database may store different types of data, a search mode that is suitable for one data type is not necessarily suitable for other data types. Therefore, how to increase the efficiency of searching in time series databases has become a research focus. Further, database systems are becoming increasingly complex, and a single database system may include both a time series database and a relational database. Therefore, how to search in a database system across different types of databases has also become a research focus.