With the development of technologies such as Radio Frequency Identification (RFID) and the Global Positioning System (GPS), the Internet of Things (IOT) has been rapidly and widely adopted. In an IOT environment, millions of monitored objects periodically generate data, and traditional relational databases become a scalability bottleneck because their system throughput is insufficient.
A cloud data management system, which is highly scalable and supports high concurrency, is an effective solution for IOT data management. Such a system can efficiently perform point queries and range queries on the rowkey, but a query on a non-rowkey attribute requires a full-table scan. Although the Map-Reduce technique can speed up such scans, performance remains poor for queries with a low selection rate (queries that match only a small fraction of the rows), because the whole table must still be read.
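To make the rowkey/non-rowkey distinction concrete, here is a minimal Python sketch that assumes a table kept sorted by rowkey, loosely modelled on a cloud key-value store. The class `RowkeyStore` and its methods are hypothetical and are not the API of any particular system; the point is only that point and range queries on the rowkey touch just the matching keys, while a predicate on any other column must read every row.

```python
import bisect
from typing import Callable, Dict, List, Optional, Tuple


class RowkeyStore:
    """Hypothetical toy table kept sorted by rowkey."""

    def __init__(self) -> None:
        self._keys: List[str] = []          # rowkeys in sorted order
        self._rows: Dict[str, dict] = {}    # rowkey -> column values

    def put(self, rowkey: str, row: dict) -> None:
        if rowkey not in self._rows:
            bisect.insort(self._keys, rowkey)
        self._rows[rowkey] = row

    def get(self, rowkey: str) -> Optional[dict]:
        # Point query on the rowkey: direct lookup, no scan.
        return self._rows.get(rowkey)

    def range(self, start: str, stop: str) -> List[Tuple[str, dict]]:
        # Range query on the rowkey: binary-search both ends and read
        # only the contiguous keys in [start, stop).
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

    def scan(self, pred: Callable[[dict], bool]) -> Tuple[List[str], int]:
        # Non-rowkey query: every row is read and tested, however few
        # rows match. Returns (matching keys, number of rows read).
        hits = [k for k in self._keys if pred(self._rows[k])]
        return hits, len(self._keys)


store = RowkeyStore()
for i in range(1000):
    store.put(f"sensor{i:04d}", {"temp": 20 + i % 50})

print(store.get("sensor0042"))                        # {'temp': 62}
print(len(store.range("sensor0010", "sensor0013")))   # 3
hits, scanned = store.scan(lambda row: row["temp"] > 68)
print(len(hits), scanned)                             # 20 1000
```

The low-selectivity scan finds only 20 matching rows but still reads all 1000, which is the cost that parallelising the scan with Map-Reduce spreads out but does not remove.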
IOT data is typically multi-dimensional: in addition to time and space, it carries information in a number of other dimensions. Moreover, queries on IOT data are generally multi-dimensional queries over time and space. Thus, an IOT application requires efficient multi-dimensional queries in addition to fast single-dimensional queries.
Another characteristic of IOT data is that it is frequently updated. In an IOT environment, monitored objects typically generate new data periodically at a fixed time interval. In particular, when there are many monitored objects and the sampling frequency is high, write concurrency is very high, which requires the data management system to sustain high throughput.
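The resulting write load can be estimated as the product of the number of monitored objects N and the per-object sampling frequency f. The figures below are hypothetical and serve only to show the order of magnitude:

```latex
W = N \cdot f,
\qquad
N = 10^{6}\ \text{objects},\quad
f = \frac{1}{10\,\mathrm{s}} = 0.1\,\mathrm{s}^{-1}
\;\Longrightarrow\;
W = 10^{6} \times 0.1\,\mathrm{s}^{-1} = 10^{5}\ \text{writes/s}
```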
Currently, a multi-dimensional data indexing scheme for cloud systems, RT-CAN (R-tree based index in content addressable network), has been proposed specifically for indexing and querying multi-dimensional data. RT-CAN is a two-layer indexing scheme based on the R-tree and supports point queries and range queries over multiple attributes. Each storage node builds an R-tree index over its local data; a subset of nodes from each local index is then selected according to an index node selection policy and published to the global index. To improve query speed and keep the system scalable, the global index is built on a CAN (content addressable network) overlay, which supports multi-dimensional queries, and index nodes are selected with an adaptive policy based on a cost model.
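The two-layer query path can be illustrated with a simplified Python sketch. It is not RT-CAN itself, and every name in it is an illustrative assumption: the local "R-tree" is a crude one-level bulk load (points sorted and cut into fixed-size leaves), every leaf MBR (minimum bounding rectangle) is published in place of the cost-model-based selection policy, and a flat list stands in for the CAN overlay.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(points: List[Point]) -> Box:
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(box: Box, p: Point) -> bool:
    return box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3]


@dataclass
class StorageNode:
    node_id: int
    leaf_capacity: int = 4
    leaves: List[List[Point]] = field(default_factory=list)

    def build_local_index(self, points: List[Point]) -> None:
        # Crude one-level bulk load standing in for a local R-tree.
        pts = sorted(points)
        step = self.leaf_capacity
        self.leaves = [pts[i:i + step] for i in range(0, len(pts), step)]

    def selected_entries(self) -> List[Tuple[Box, int]]:
        # Hypothetical selection policy: publish every leaf MBR.
        return [(mbr(leaf), self.node_id) for leaf in self.leaves]

    def local_query(self, q: Box) -> List[Point]:
        # Visit only leaves whose MBR overlaps q, then filter points.
        return [p for leaf in self.leaves if intersects(mbr(leaf), q)
                for p in leaf if contains(q, p)]


class GlobalIndex:
    """Flat list standing in for the CAN overlay of the real scheme."""

    def __init__(self) -> None:
        self.entries: List[Tuple[Box, int]] = []

    def publish(self, entries: List[Tuple[Box, int]]) -> None:
        self.entries.extend(entries)

    def candidate_nodes(self, q: Box) -> Set[int]:
        return {nid for box, nid in self.entries if intersects(box, q)}


def range_query(q: Box, gindex: GlobalIndex,
                nodes: Dict[int, StorageNode]) -> Tuple[List[Point], List[int]]:
    contacted = sorted(gindex.candidate_nodes(q))   # layer 1: global index
    hits: List[Point] = []
    for nid in contacted:
        hits.extend(nodes[nid].local_query(q))      # layer 2: local index
    return sorted(hits), contacted


nodes = {0: StorageNode(0), 1: StorageNode(1)}
nodes[0].build_local_index([(float(i), float(i)) for i in range(8)])
nodes[1].build_local_index([(100.0 + i, 100.0 + i) for i in range(8)])
gindex = GlobalIndex()
for node in nodes.values():
    gindex.publish(node.selected_entries())

hits, contacted = range_query((2.0, 2.0, 5.0, 5.0), gindex, nodes)
print(hits)       # [(2.0, 2.0), (3.0, 3.0), (4.0, 4.0), (5.0, 5.0)]
print(contacted)  # [0]
```

The global layer prunes the search to the storage nodes whose published MBRs overlap the query box; in this example node 1 is never contacted, and only node 0 searches its local index.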
Specifically, the RT-CAN indexing scheme generally comprises the following operations: 1) upon receiving a new data insertion request from a client, locating the corresponding storage node via the cloud storage system interface and storing the data on that node; 2) updating the local R-tree index created at the storage node; and 3) synchronizing the updated local index to the global index.
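The three insertion operations, and the maintenance work they generate, can be sketched in Python as follows. This is an illustrative assumption, not the RT-CAN implementation: the storage node is chosen by a hypothetical modulo rule on the first coordinate, each local index is a single level of leaves that uses the R-tree least-enlargement heuristic to choose a leaf and a simplified sort-and-halve split on overflow, and the global index receives a sync message whenever a node's published MBRs change.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(points: List[Point]) -> Box:
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def area(b: Box) -> float:
    return (b[2] - b[0]) * (b[3] - b[1])


def enlarged(b: Box, p: Point) -> Box:
    return (min(b[0], p[0]), min(b[1], p[1]), max(b[2], p[0]), max(b[3], p[1]))


class StorageNode:
    def __init__(self, node_id: int, capacity: int = 4) -> None:
        self.node_id = node_id
        self.capacity = capacity            # max points per leaf
        self.leaves: List[List[Point]] = []
        self.splits = 0

    def insert(self, p: Point) -> List[Box]:
        """Operation 2: update the local index; return its leaf MBRs."""
        if not self.leaves:
            self.leaves.append([p])
        else:
            # ChooseLeaf heuristic: the leaf whose MBR grows least in area.
            def growth(i: int) -> float:
                b = mbr(self.leaves[i])
                return area(enlarged(b, p)) - area(b)

            best = min(range(len(self.leaves)), key=growth)
            leaf = self.leaves[best]
            leaf.append(p)
            if len(leaf) > self.capacity:
                # Overflow: simplified split (sort, then cut in half).
                leaf.sort()
                half = len(leaf) // 2
                self.leaves[best] = leaf[:half]
                self.leaves.append(leaf[half:])
                self.splits += 1
        return [mbr(leaf) for leaf in self.leaves]


class GlobalIndex:
    def __init__(self) -> None:
        self.published: Dict[int, List[Box]] = {}
        self.sync_messages = 0

    def sync(self, node_id: int, boxes: List[Box]) -> None:
        """Operation 3: push a node's MBRs whenever they have changed."""
        if self.published.get(node_id) != boxes:
            self.published[node_id] = boxes
            self.sync_messages += 1


def insert_record(p: Point, nodes: List[StorageNode], gindex: GlobalIndex) -> None:
    # Operation 1: locate the storage node (hypothetical modulo routing).
    node = nodes[int(p[0]) % len(nodes)]
    boxes = node.insert(p)              # operation 2
    gindex.sync(node.node_id, boxes)    # operation 3


# Time-ordered records: the first coordinate (time) only ever grows.
nodes = [StorageNode(0), StorageNode(1)]
gindex = GlobalIndex()
for t in range(20):
    insert_record((float(t), float(t)), nodes, gindex)

print(gindex.sync_messages, sum(n.splits for n in nodes))  # 20 6
```

Because the records arrive in time order, every insertion enlarges some leaf MBR, so in this run each of the 20 insertions produces a global sync message and 6 of them also split a leaf.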
However, to keep the tree balanced during data insertion, the RT-CAN scheme must constantly split and adjust index nodes: each data insertion triggers an update of the local index, which in turn affects the global index. The index maintenance cost is therefore high, especially for applications with frequent data insertions, and system throughput degrades significantly. Consequently, this scheme is not suitable for the IOT.
Accordingly, there is a need for a data indexing system and method that is applicable to the IOT and can sustain high-throughput operations on the frequently updated multi-dimensional data found in the IOT.