1. Field of the Invention
This invention relates to data networks and more particularly to efficient distributed processing of network sensor data.
2. Description of the Related Art
Recent advances in computer technology and wireless communications have enabled the emergence of stream-based sensor networks. Applications include network traffic monitoring, real-time financial data analysis, environmental sensing, large-scale reconnaissance and surveillance. In these applications, real-time data are generated by a large number of distributed sources, such as sensors, and must be processed, filtered, interpreted or aggregated in order to provide useful services to users. The sensors, together with many other shared resources, such as Internet hosts and edge servers, are organized into a network that collectively provides a rich set of query processing services. Resource-efficient data management is a key challenge in such stream-based sensor networks.
Two different approaches are commonly used in the prior art for managing and processing data in such networks. The first approach involves setting up a centralized data warehouse (e.g., a data fusion center) to which sensors push potentially useful sensor data for storage and processing. Users can then make a variety of sophisticated queries on the data stored at this central database. This approach does not make efficient use of resources because all data must be transmitted to the central data warehouse whether or not the data are of interest. Moreover, it may require a large investment in processing resources at the warehouse while processing resources within the network remain underutilized.
The second approach involves pushing queries all the way to the remote sensors. Such querying of remote sensor nodes is generally more bandwidth efficient. However, it is greatly limited by the low capability, low availability and low reliability of the edge devices such as sensors.
Thus, a third approach, which is an object of an embodiment of the present invention and which pushes query processing into the network as necessary, is desirable in order to reduce data transmission and better utilize available shared resources in the network. Furthermore, given that various queries are generated at different rates and only a subset of sensor data may actually be queried, caching some intermediate data objects inside the network advantageously increases query efficiency.
In addition, most distributed stream processing systems in the prior art avoid the placement problem by assuming that operator locations are pre-defined, and thus are unable to adapt to varying network conditions. Prior work on in-network query processing, by contrast, has mostly focused on the operator placement problem. At least one prior art solution considered the placement of operators so as to improve performance by balancing load, without taking communication bandwidth into consideration. Another prior art solution considered queries involving only simple operations like aggregation. In this case, the communication cost dominates and it is feasible to perform all operations as close to the sensors as possible. Yet another prior art solution considered queries involving more sophisticated operations with non-negligible computational costs, and developed an operator placement algorithm for the special case in which the query graph is a sequence of operations and the sensor network is a hierarchical tree.
Other prior art solutions considered the operator placement problem for tree-structured query graphs and general sensor network topologies using simple heuristics and localized search algorithms. However, both simple heuristics and localized search risk never finding a good placement.
In addition, many of these prior art solutions assume that queries are generated at the same rate as the data and are applied to all data. They do not exploit the fact that some queries may be generated at lower rates such that they need only be applied to a fraction of the data generated.
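By way of illustration only, the following minimal Python sketch compares the processing cost of applying an operation to every data item with the cost of applying it only when queried. The function names, the rates and the per-item cost are hypothetical and are not taken from any prior art system; the sketch further assumes that each query requires processing of a single data item.

```python
def eager_cost(data_rate: float, cost_per_item: float) -> float:
    """Processing cost per unit time when every generated item is processed."""
    return data_rate * cost_per_item


def on_demand_cost(data_rate: float, query_rate: float, cost_per_item: float) -> float:
    """Processing cost per unit time when items are processed only when queried.

    Assumption (illustrative): each query touches one item, so no more
    than min(query_rate, data_rate) items are processed per unit time.
    """
    return min(query_rate, data_rate) * cost_per_item


# Hypothetical figures: 100 items/s generated, 5 queries/s, 2 cost units per item.
print(eager_cost(100, 2))         # every item is processed
print(on_demand_cost(100, 5, 2))  # only the queried fraction is processed
```

Under these assumed figures, only one twentieth of the generated data needs to be processed when the operation is applied on demand.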
It will be appreciated that there exists a need to determine the optimal network locations at which to execute specific query operations and store intermediate data objects. Intuitively, one would like to place the operators as close as possible to the edge devices (sensors) so as to reduce transmission costs. However, devices close to the edge are likely to have limited processing and storage capabilities. Thus they may not be capable of handling sophisticated queries.
It will also be appreciated that there exists a need to balance these conflicting effects so as to achieve the minimum overall cost in computation, storage and communication through an efficient dynamic reallocation of data processing resources.
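By way of illustration only, the trade-off described above may be sketched in Python for a single operator placed somewhere along a path from a sensor toward a user. The node names, cost figures, the `selectivity` parameter (the assumed ratio of operator output volume to input volume) and the linear cost model are all hypothetical; storage cost is omitted for brevity, and the sketch is not the placement method of any embodiment.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    compute_cost_per_unit: float  # cost to process one unit of data at this node
    link_cost_to_next: float      # cost to send one unit of data to the next node


def placement_cost(path, k, input_rate, selectivity):
    """Total cost per unit time if the operator runs at path[k]."""
    # Raw data crosses every link before the operator's node.
    upstream = sum(n.link_cost_to_next for n in path[:k]) * input_rate
    # The operator processes the full input rate at the chosen node.
    compute = path[k].compute_cost_per_unit * input_rate
    # Reduced output crosses the remaining links (the last node has no outgoing link).
    downstream = sum(n.link_cost_to_next for n in path[k:-1]) * input_rate * selectivity
    return upstream + compute + downstream


def best_placement(path, input_rate, selectivity):
    """Exhaustively evaluate every node on the path and return the cheapest."""
    costs = [placement_cost(path, k, input_rate, selectivity) for k in range(len(path))]
    k = min(range(len(path)), key=costs.__getitem__)
    return path[k].name, costs[k]


# Hypothetical path: a weak sensor, a mid-tier edge server, a powerful central server.
path = [
    Node("sensor", compute_cost_per_unit=10.0, link_cost_to_next=4.0),
    Node("edge", compute_cost_per_unit=2.0, link_cost_to_next=3.0),
    Node("server", compute_cost_per_unit=1.0, link_cost_to_next=0.0),
]
name, cost = best_placement(path, input_rate=1.0, selectivity=0.1)
print(name, round(cost, 2))
```

With these assumed figures, placing the operator at the sensor incurs a high computation cost, placing it at the central server incurs a high transmission cost, and the intermediate node yields the lowest total.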
Embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention. For a better understanding of the invention with advantages and features, refer to the description and to the drawings.