Time series data refer to a sequence of data collected over time. In general, time series data may be acquired from a plurality of data monitoring points (referred to as “data points” for short) at a predetermined time interval. Therefore, time series data are associated with both time information and data points. A typical example of time series data is meter data in a power grid. Such data are measured by meters distributed within a given geographical area (e.g., a street, community, or city) and periodically stored in a central database, such as that of an electric power company.
Compared with other types of data, time series data grow very quickly. Each data point continuously produces data over time, such that the total amount of data to be stored increases dramatically. Additionally, since time series data are associated with both a data point and a time, information on both the data point and the time is required to access such data. Due to these properties, a traditional relational database is not well suited to storing time series data. For example, due to the ACID (atomicity, consistency, isolation, durability) requirements of a relational database, large-scale time series data would cause concurrency issues (e.g., deadlock), frequent SQL operations (e.g., thousands of operations per second), and other problems. Therefore, a relational database cannot meet practical needs in terms of query performance.
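By way of illustration only, the dual association of time series data with a data point and a time may be sketched as follows. This is a minimal in-memory sketch, not a description of any particular database; the class name MeterReadings, the identifier "meter-1", and the 900-second sampling interval are hypothetical and chosen solely for the example.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class MeterReadings:
    """Hypothetical in-memory store keyed by (data point, timestamp)."""

    def __init__(self):
        # For each data point, timestamps are kept sorted so that a
        # time range can be located with a binary search.
        self._times = defaultdict(list)
        self._values = defaultdict(list)

    def add(self, point_id, timestamp, value):
        """Record one sample taken at `timestamp` by `point_id`."""
        times = self._times[point_id]
        i = bisect_right(times, timestamp)
        times.insert(i, timestamp)
        self._values[point_id].insert(i, value)

    def query(self, point_id, start, end):
        """Return (timestamp, value) pairs with start <= timestamp < end."""
        times = self._times.get(point_id, [])
        values = self._values.get(point_id, [])
        lo = bisect_left(times, start)
        hi = bisect_left(times, end)
        return list(zip(times[lo:hi], values[lo:hi]))


# Assumed example: two meters sampled every 900 seconds.
store = MeterReadings()
store.add("meter-1", 0, 1.0)
store.add("meter-1", 900, 1.5)
store.add("meter-2", 0, 2.0)
first_interval = store.query("meter-1", 0, 900)
```

In this sketch, every access names both a data point and a time range, which mirrors the access pattern described above.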
Column-oriented databases have been proposed for storing time series data. For example, the Hadoop database (referred to as “HBase”) is a known column-oriented database, which will be discussed hereinafter as an example. In HBase, columns in a data table are grouped into column families. Each column family may include one or more columns. As data continue to accumulate, the size of the data table increases accordingly. The data in the data table are then partitioned into a plurality of regions for storage. Each region may be managed by a corresponding object called “HRegion.” In the underlying Hadoop distributed file system (HDFS), the data in each region are stored in one or more blocks in a corresponding data node.
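For illustration, the grouping of columns into column families and the partitioning of a growing table into regions may be modeled with the following toy sketch. It does not use the HBase API. The class ToyTable, the row-count threshold, and the split-at-median rule are assumptions made for the example; an actual system would typically decide when to split based on stored size rather than row count.

```python
from bisect import bisect_right


class ToyTable:
    """Toy column-family table split into regions by row-key range."""

    def __init__(self, families, max_rows_per_region=4):
        self.families = set(families)
        self.max_rows = max_rows_per_region  # assumed split threshold
        # Each region maps row_key -> {family: {column: value}}.
        self.regions = [{}]
        # start_keys[i] is the smallest row key that region i may hold.
        self.start_keys = [""]

    def _region_index(self, row_key):
        # Find the last region whose start key is <= row_key.
        return bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        i = self._region_index(row_key)
        region = self.regions[i]
        region.setdefault(row_key, {}).setdefault(family, {})[column] = value
        if len(region) > self.max_rows:
            self._split(i)

    def _split(self, i):
        # Split region i at its median row key into two adjacent regions.
        old = self.regions[i]
        keys = sorted(old)
        mid = keys[len(keys) // 2]
        self.regions[i] = {k: v for k, v in old.items() if k < mid}
        self.regions.insert(i + 1, {k: v for k, v in old.items() if k >= mid})
        self.start_keys.insert(i + 1, mid)

    def get(self, row_key, family, column):
        region = self.regions[self._region_index(row_key)]
        return region.get(row_key, {}).get(family, {}).get(column)


# Assumed example: one column family "reading" with a column "kwh".
table = ToyTable(["reading"], max_rows_per_region=2)
for key in ["m1", "m2", "m3"]:
    table.put(key, "reading", "kwh", 1.0)
```

In this sketch, inserting the third row exceeds the assumed threshold of two rows, so the single region is split into two regions covering adjacent row-key ranges.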
However, with respect to time series data, HBase and other known column-oriented databases also suffer from deficiencies in data access performance, which will be discussed in detail hereinafter. Therefore, there is a need in the art for a solution for managing the storage of time series data that supports more efficient and effective access to such data.