It is estimated that by 2020 there will be more than 50 billion Internet-connected devices. These devices will include sensory devices that can observe and measure physical world phenomena and report or generate information about real world entities (i.e. “Things”). The collected data can be of simple types such as temperature, humidity or light, or it can be composite and complex information, such as a detected event or a combination of different data used to measure and report the pollution level at a specific location. The extension of the current Internet that integrates real world data and provides autonomous or user-mediated interactions with real world objects over the Internet is often described under the umbrella term “Internet of Things” (IoT).
IoT data is provided by RFID tags, sensor nodes or other network-enabled devices, or is submitted directly by human users via social media and/or smart devices (i.e. Citizen Sensing). IoT data can be described as numerical measurement data or as syntactical descriptions of events and observations from the real world. The data can be provided as raw values, or it can include enhanced meta-data and semantic descriptions that represent different attributes of the data. The IoT data can be stored on the nodes and devices; however, it is generally perceived that the IoT data is to be cached or stored at the edge of the access networks (i.e. in gateways, and often only for the short term). Search, discovery and processing of IoT data in large, distributed environments, where various sources share and publish data from different locations and/or about different phenomena, rely on several key attributes: thematic (e.g. type, unit), spatial (e.g. geo-location), temporal (e.g. time stamps) and quality-related (e.g. accuracy) attributes. Different data description models have been constructed to enhance the semantic description and to provide machine-interpretable representations of the IoT data. FIG. 1 shows a semantically annotated description of sample temperature data. The semantic annotation of the data is provided using the Resource Description Framework (RDF) representation. Some of the key works in this area are the W3C Semantic Sensor Network (SSN) Ontology and the model provided in P. Barnaghi, W. Wang, C. Henson, and K. Taylor, “Semantics for the Internet of Things: Early Progress and Back to the Future,” Int. J. Semant. Web Inf. Syst., vol. 8, pp. 1-21, 2012.
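As a purely illustrative sketch (not the annotation shown in FIG. 1), a single observation carrying the thematic, spatial, temporal and quality attributes named above could be modelled and flattened into subject-predicate-object triples as follows. The class name, field names and predicate names are hypothetical placeholders chosen for this example; they are not terms of the SSN Ontology or of any RDF vocabulary.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Observation:
    """A raw value plus the four attribute groups named in the text."""
    value: float
    obs_type: str        # thematic: what is measured
    unit: str            # thematic: unit of measurement
    lat: float           # spatial: geo-location (latitude)
    lon: float           # spatial: geo-location (longitude)
    timestamp: datetime  # temporal: time stamp
    accuracy: float      # quality: +/- error, expressed in `unit`


def to_triples(obs_id, obs):
    """Flatten an observation into (subject, predicate, object) triples.

    The predicate names are hypothetical, not taken from any ontology.
    """
    return [
        (obs_id, "hasType", obs.obs_type),
        (obs_id, "hasUnit", obs.unit),
        (obs_id, "hasValue", obs.value),
        (obs_id, "hasLatitude", obs.lat),
        (obs_id, "hasLongitude", obs.lon),
        (obs_id, "hasTimestamp", obs.timestamp.isoformat()),
        (obs_id, "hasAccuracy", obs.accuracy),
    ]


# Made-up sample values, for illustration only.
sample = Observation(21.5, "temperature", "Celsius", 51.24, -0.59,
                     datetime(2013, 1, 1, 12, 0, tzinfo=timezone.utc), 0.5)
triples = to_triples("obs-1", sample)
```

Each attribute becomes one triple, which is the general shape that an RDF-based annotation takes; a real annotation would use URIs from a published ontology in place of the placeholder strings.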
FIG. 2 reflects an exemplary Machine to Machine (M2M) architecture view and presents a holistic overview of the key components that are typically involved in data collection, dissemination and discovery in IoT systems. The data can be stored for the longer term in Information Repositories (IRs) 104. Indexing, and the storage of references to data held in the network or in short-term/long-term repositories, are provided by the Discovery Servers (alternately called “Directory Servers”) (DSs) 106. DSs 106 also maintain the data search and discovery functions and allow the clients (i.e. data consumers) to query the data across distributed networks. The discovery process, if successful, will either redirect the query 108 to the gateway 102 or IRs 104 that contain information about the queried data, or fetch the data directly from the source and return it to the client. However, the attributes of the IoT data are often dynamic (e.g. their location, quality and availability can change over time), which makes the index information subject to frequent updates. Given the volume of the IoT data (or resources), maintaining the indices at DSs 106 will generate a considerable traffic load over the network.
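The index maintenance problem described above can be made concrete with a minimal sketch. The class below is a hypothetical, in-memory stand-in for a discovery server; its name, its methods and the use of a (type, region) pair as the index key are assumptions made for illustration and are not taken from FIG. 2.

```python
class DiscoveryServer:
    """Toy index mapping (type, region) to the sources holding the data."""

    def __init__(self):
        self._index = {}   # (obs_type, region) -> set of source addresses
        self.updates = 0   # number of index messages received so far

    def publish(self, obs_type, region, source):
        """Register `source` as a holder of data for (obs_type, region)."""
        self._index.setdefault((obs_type, region), set()).add(source)
        self.updates += 1

    def withdraw(self, obs_type, region, source):
        """Remove a stale index entry, if present."""
        self._index.get((obs_type, region), set()).discard(source)
        self.updates += 1

    def discover(self, obs_type, region):
        """Return the sources a client should be redirected to."""
        return sorted(self._index.get((obs_type, region), set()))


ds = DiscoveryServer()
ds.publish("temperature", "cell-A", "gateway-1")

# A mobile sensor moves from cell-A to cell-B: one change in a dynamic
# attribute costs two index messages (withdraw + publish).
ds.withdraw("temperature", "cell-A", "gateway-1")
ds.publish("temperature", "cell-B", "gateway-1")
```

In this sketch every change of a dynamic attribute triggers index messages to the server, so the update count grows with both the number of resources and how often their attributes change.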
S. Evdokimov, B. Fabian, S. Kunz, and N. Schoenemann, “Comparison of Discovery Service Architectures for the Internet of Things,” 2010 IEEE International Conference on Sensor Networks, Ubiquitous, and Trustworthy Computing, pp. 237-244, 2010 surveys some of the important state-of-the-art approaches for discovery services. The study assumes that the IoT data is identified by a numerical identifier, i.e. an Object ID (OID). Information about the objects is stored at distributed Information Services (ISs), which are similar to the information repositories 104 shown in FIG. 2. Discovery services process the client's query for a specific OID and provide a link to the IS that is expected to provide the requested information. Various architectures are then proposed to organize the DS overlay network in order to resolve the ISs in a distributed manner. These approaches share similar problems regarding management of the data indices. In the following, the EPCglobal approach is first explained as an example, and the indexing problem is then described with respect to this example.
EPCglobal is a common standard for RFID data management and sharing infrastructures. The architecture of the EPCglobal is made up of various entities including the EPC discovery services (or EPCDS) which have a similar role to DSs 106 shown in FIG. 2, EPC information services (EPCIS), and naming services. Once EPCISs receive information about a new EPC they publish it at the EPCDS. The clients can query the EPCDS to get information about the events associated with a given OID. EPCDS provides a link to EPCISs that have published information about a given OID. The clients can then use the provided link to query the EPCIS for detailed information about the given OID.
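The publish-then-resolve flow described above can be sketched as follows. This is a simplified, in-memory illustration only: the class names, method names and URLs are hypothetical and do not correspond to the actual EPCIS or EPCDS interfaces.

```python
class InformationService:
    """Stands in for an EPCIS: stores detailed events per object ID."""

    def __init__(self, url):
        self.url = url
        self._events = {}  # oid -> list of event descriptions

    def record(self, oid, event):
        self._events.setdefault(oid, []).append(event)

    def query(self, oid):
        return list(self._events.get(oid, []))


class DiscoveryService:
    """Stands in for an EPCDS: stores only links, not the events."""

    def __init__(self):
        self._links = {}  # oid -> list of information service URLs

    def publish(self, oid, url):
        links = self._links.setdefault(oid, [])
        if url not in links:
            links.append(url)

    def lookup(self, oid):
        return list(self._links.get(oid, []))


def resolve(oid, discovery, services):
    """Client side: ask the DS for links, then query each linked IS."""
    events = []
    for url in discovery.lookup(oid):
        events.extend(services[url].query(oid))
    return events


# Hypothetical deployment with two information services.
is_a = InformationService("http://is-a.example")
is_b = InformationService("http://is-b.example")
services = {is_a.url: is_a, is_b.url: is_b}
eds = DiscoveryService()

is_a.record("oid-42", "shipped")
eds.publish("oid-42", is_a.url)
is_b.record("oid-42", "received")
eds.publish("oid-42", is_b.url)

history = resolve("oid-42", eds, services)
```

Note that the discovery service in this sketch holds one entry per object ID, which is the property that the following paragraphs identify as a scalability concern.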
Similar to EPCglobal, the other works described in the survey by Evdokimov et al. follow the same concept for publishing the indices to the DSs, but instead of a centralized approach they use a network of Discovery Service Providers to scale the query processing to a global level. For example, the Bridge project utilizes the Lightweight Directory Access Protocol (LDAP), and N. Schonemann, K. Fischbach, and D. Schoder, “P2P architecture for ubiquitous supply chain systems,” presented at the 17th European Conference on Information Systems, Verona, Italy, 2009 considers a peer-to-peer architecture.
The main shortcoming of EPCglobal and other similar approaches lies in the management of the indices across ISs and DSs. In the case of EPCglobal, indexing high volumes of EPCs at DSs and performing the queries over all data entries is computationally intensive and does not scale to the volume of the IoT data resources. Apart from query processing, insertion and removal of the data entries can generate a significant traffic load between ISs and DSs. Finally, these approaches are limited to situations in which the actual OID is queried. Although such assumptions are valid for most of the applications that are envisioned for RFID, they cannot fulfil the requirements for other IoT data resources such as sensors and actuators, where the dynamic attributes of the data, rather than the static identifier of the resource, are often the subject of the query.
F. Paganelli and D. Parlanti, “A DHT-Based Discovery Service for the Internet of Things,” Journal of Computer Networks and Communications, vol. 2012, p. 11, 2012 proposes a distributed service discovery mechanism for IoT that expands the preceding architectures by supporting a flexible identification scheme together with multidimensional attribute and range queries. The multidimensional attributes are first mapped into a one-dimensional domain and then indexed based on a Prefix Hash Table (PHT) structure. The resulting PHT structure is then distributed across discovery service providers. Discovery service providers are connected in a Distributed Hash Table (DHT) overlay network.
Such an architecture is able to address the need for discovering entities such as RFID tags. The major shortcoming of this approach is again an inefficient indexing mechanism, which does not scale appropriately with the size of the IoT data. Mapping the data attributes to one dimension also makes the processing more challenging.
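To illustrate the kind of mapping involved, the sketch below reduces two quantized attributes to a one-dimensional binary key and assigns keys to nodes by hashing a key prefix. It rests on several assumptions that are not stated in the text: the use of Z-order (Morton) bit interleaving as the one-dimensional mapping, a fixed prefix length in place of a real PHT trie with leaf splitting, and a simple modulo placement in place of a real DHT. All names are hypothetical.

```python
import hashlib

BITS = 4  # bits per quantized attribute (assumed for the example)


def interleave(x, y, bits=BITS):
    """Z-order mapping: interleave the bits of x and y, x bit first."""
    out = []
    for i in reversed(range(bits)):
        out.append(str((x >> i) & 1))
        out.append(str((y >> i) & 1))
    return "".join(out)


def node_for_prefix(prefix, num_nodes):
    """Pick the node responsible for a prefix by hashing it."""
    digest = hashlib.sha1(prefix.encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big") % num_nodes


class PrefixIndex:
    """Fixed-length-prefix simplification of a prefix hash table."""

    def __init__(self, prefix_len, num_nodes):
        self.prefix_len = prefix_len
        # One dict per simulated node: prefix -> list of (key, reference)
        self.nodes = [dict() for _ in range(num_nodes)]

    def _bucket(self, key):
        prefix = key[:self.prefix_len]
        node = self.nodes[node_for_prefix(prefix, len(self.nodes))]
        return node.setdefault(prefix, [])

    def insert(self, x, y, ref):
        key = interleave(x, y)
        self._bucket(key).append((key, ref))

    def lookup(self, x, y):
        key = interleave(x, y)
        return [ref for k, ref in self._bucket(key) if k == key]


index = PrefixIndex(prefix_len=4, num_nodes=3)
index.insert(3, 5, "gateway-1")
```

For example, `interleave(3, 5)` combines the bit patterns 0011 and 0101 into the key `00011011`. Every attribute change alters the key, so the entry must be removed and reinserted, possibly on a different node, which is one way to see why such indices are costly to maintain for dynamic data.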
The above-mentioned studies consider RFID tags as the only source of data; in contrast, the Linked Stream Middleware (LSM) architecture focuses on sensors and actuators. The LSM provides a framework for semantically describing sensor and actuator data (i.e. with RDF descriptions) and allows for SPARQL-like queries across both the resources and the harvested data. The sensory data in LSM is annotated and transformed into RDF triples. The triples are then stored in a triple store that is capable of executing the SPARQL queries. The main shortcoming of the LSM framework is the lack of scalability due to the centralized architecture. The query execution time is shown to increase drastically with the number of stored triples. Moreover, triple stores are not designed for write-intensive applications, and the insertion of large amounts of new data into the triple store creates a bottleneck for the system.
To summarize, scalability is a common problem associated with the conventional IoT data discovery mechanisms. At the heart of the problem is the data indexing mechanism. While the indices should provide sufficient information for DSs to address the queries, they should be generated in a way that allows for dynamic updates with minimal computational overhead regardless of the number of data providers. The traffic load associated with the communication of indices between gateways and DSs, and even between DSs, should also not increase excessively with the number of data resources. Yet, the existing indexing mechanisms fail to satisfy these requirements.
With the foregoing as background information, the present application discloses a new method and system for discovery services in an M2M network.