1. Field of the Invention
The present invention relates to the processing of sensor data and, more particularly, to a sensor data locating method and apparatus.
2. Description of Related Art
In a smart planet scenario (e.g., an Intelligent Operation Center (IOC) for a city), many thousands of sensor devices are distributed across city areas. The data generated by these sensor devices are gathered into a data center to support further intelligent analysis. Sensor data have the following characteristics: they are naturally clustered both spatially and temporally, in that, spatially, the data of one sensor describe the area monitored by that sensor and, temporally, each sensor generates and stores its data in time order; they are nearly all “written once, read many times”; and the data files from the many thousands of sensor devices are gathered into one uniform big file to facilitate management and global queries.
Currently, Hadoop-like technologies (including Hadoop technology and other similar distributed storage technologies for massive data) provide high availability and high throughput for the storage and processing of massive data; however, achieving low latency, especially in the face of the latency caused by low disk I/O bandwidth, remains an unsolved problem. Because the volume of sensor data is very large, the data are usually stored as blocks or chunks, with a certain redundancy, on the disks of data nodes distributed in various places. When an intelligent application based on a Hadoop-like technology performs a query, it loads all the data of the source file, block by block, from the disks of the data nodes into the memory where the working processor resides. It is then determined in memory which data are relevant to the request and which are not, and the irrelevant data are discarded. Such an approach causes a large amount of invalid and irrelevant data to be loaded from disk into memory, imposing an extra disk I/O burden. Moreover, the larger the original file, the greater the disk I/O burden.
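The load-then-filter behavior described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the behavior of any particular Hadoop API: the `Record` type, the block layout produced by `make_blocks`, and the function `full_scan_query` are all hypothetical names introduced for illustration, and "loading a block" is modeled simply as counting its records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One sensor reading (hypothetical schema, for illustration only)."""
    sensor_id: int
    timestamp: int
    value: float


def make_blocks(num_sensors, readings_per_sensor, block_size):
    """Build a 'uniform big file' as a list of fixed-size blocks.

    Records are written sensor by sensor and in time order, mimicking the
    spatial and temporal clustering of sensor data.
    """
    records = [
        Record(s, t, float(s * 1000 + t))
        for s in range(num_sensors)
        for t in range(readings_per_sensor)
    ]
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def full_scan_query(blocks, sensor_id, t_start, t_end):
    """Load every block, then filter in memory.

    Returns the matching records and the number of records that had to be
    loaded (a stand-in for disk I/O volume in this simulation).
    """
    loaded = 0
    matches = []
    for block in blocks:
        loaded += len(block)  # the whole block is read, relevant or not
        for rec in block:
            if rec.sensor_id == sensor_id and t_start <= rec.timestamp < t_end:
                matches.append(rec)
    return matches, loaded


blocks = make_blocks(num_sensors=100, readings_per_sensor=50, block_size=64)
matches, loaded = full_scan_query(blocks, sensor_id=7, t_start=10, t_end=20)
print(f"relevant records: {len(matches)}, records loaded: {loaded}")
```

In this toy setup every record in the file is loaded even though only the readings of one sensor in one time window are needed, and the loaded count grows with the file size while the number of relevant records stays fixed.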