1. Field of the Invention
The present invention relates to data processing techniques, and in particular, to a method and system for tagging original data generated by things in the Internet of Things (IoT).
2. Description of Related Art
The Internet of Things (IoT) has been recognized as the next significant revolution of the Internet. The IoT refers to equipping various real-world things, such as streets, roads, buildings, water-supply systems and household appliances, with sensing devices or the like, connecting them through the Internet, and executing specific programs so as to achieve remote control of, or direct communication between, these real-world things. The IoT widens the scope of connected objects from electronics to all kinds of real-world things. That is, it achieves human-machine communication and interaction, as well as communication and interaction between objects, by means of radio frequency identification (RFID) tags, sensors, binary codes and the like, which are provided for various kinds of things and connected to wireless networks via interfaces. For example, in the near future, household appliances, hospital devices, and even a T-shirt may be connected to networks and accessed just like web pages or remote servers. As a result, all real-world things can be monitored and operated over networks, and their behaviors can be programmed for human convenience.
In the IoT, one problem is how to find, for a given event, the sensors that have recorded information related to that event. For example, given the query “(rear-end collision)”, the problem is how to find the cameras that have recorded such events. This kind of IoT search is a very important application of the IoT. Unlike search on the World Wide Web, the construction of IoT search engines faces the following challenges:
First, the total number of things in the real world is enormous. The IoT would need to encode 50 to 100 trillion objects, and every human being is surrounded by 1000 to 5000 objects. Data on this scale is beyond the capacity of current search engines. According to statistics, Google's search engine indexed only 1 trillion web pages in 2008.
Second, original data acquired from various things in the IoT may be in the form of images, video, audio, numerical data sequences, wavelets or the like. Substantially no metadata is available for describing the semantics of these original data, and computers by themselves are unable to understand the contents of these data files. In other words, the acquired original data can hardly convey human opinions and sentiments, and humans can hardly understand the original data either. Despite holding plenty of original data, humans find it difficult to search for related information in natural language or to perform association mining on the original data.
Techniques for the deep processing of original data exist today. However, due to the large number of things in the IoT, such as sensors and the like, extracting semantic annotations via deep processing such as computer vision technologies is computationally unaffordable. Furthermore, even with deep processing, the flexibility of applications such as queries means that a large number of models would need to be built to handle the various applications, which is also impractical.
FIG. 1 is a schematic diagram showing the gap in the prior art between actual applications and the original data generated by things. As shown in FIG. 1, users query sensor data in human language over a network. However, although a huge amount of original data files is available, there is a wide gap between the users' natural language queries and the original data files from the sensors, and nearly no metadata is available for semantic description of the original data files. It is therefore not surprising that users cannot obtain what they expect. Thus, how to associate natural language queries with original data, so as to facilitate data search, data mining, data association mining and the like, is a technical problem in the prior art.
Therefore, there is a need in the art for a technique for tagging original data generated by things in the IoT to facilitate further data processing.