Many new computing applications are focusing on the automation and intelligent control of a variety of operations in enterprise and non-traditional environments, such as homes, shopping malls and stadiums. These applications often use a variety of sensor nodes or devices, which collect data about the dynamic state of their local environment and transmit that data back to a more traditional data processing infrastructure. Many of these computing applications involve long-lived monitoring, i.e., they are interested in the continual evolution of the environmental state over a substantially long period of time. For example, the monitored data may be the temperature values reported by a sensor, or the number of people entering through a particular gate (as reported by a video or RFID sensor). Architecturally, such monitoring applications consist of one or more sensor nodes (that actually sense the environmental state), which report back to one or more sink nodes. A sink is typically connected to the traditional data processing network infrastructure and is responsible for collecting the individual data samples and then processing them to deduce higher-layer or derived environmental state (e.g., the average temperature reading from sensors in a particular spatial region). One of the main concerns in efficient data gathering in such sensor-based environments is the minimization of the communication overhead incurred by the sensors. This is especially true of emerging sensor infrastructures where the sensors are either resource-constrained themselves (e.g., operate on batteries and may be located in relatively inaccessible places) or utilize a communication infrastructure (e.g., a wireless network) that is bandwidth-limited.
The most well-known and efficient technique to reduce the communication traffic back from the sensors to the sink for such continuous monitoring applications is to use the notion of a precision range or interval associated with each sensor. Typical prior art sensor monitoring techniques using this approach are described in C. Olston and J. Widom, “Offering a precision-performance tradeoff for aggregation queries over replicated data,” in VLDB '00: Proceedings of the 26th Conference on Very Large Data Bases, Morgan Kaufmann Publishers Inc., 2000, pp. 144-155; C. Olston, B. T. Loo, and J. Widom, “Adaptive precision setting for cached approximate values,” in SIGMOD '01: Proceedings of the 2001 ACM SIGMOD international conference on management of data. ACM Press, 2001, pp. 355-366; and, Q. Han, S. Mehrotra, and N. Venkatasubramanian, “Energy efficient data collection in distributed sensor environments,” in ICDCS '04: Proceedings of the 24th International Conference on Distributed Computing Systems, IEEE Computer Society, 2004, pp. 590-597. Such approaches essentially bound the uncertainty about the precise value of a sensor's data to a specified value. The precision range specifies an acceptable level of uncertainty, such that the sensor need not communicate its data samples back to the sink until and unless they fall outside this specified range. Precision ranges are particularly useful in many real-life applications which do not need to know the precise values of the environmental state, but can tolerate some amount of inaccuracy. For example, an application monitoring the temperature reading of a sensor may allow the individual sensor to have a precision range that is +/−1° C. This implies that, after a sensor reports back to the sink with a value of, say, 25° C., it need not communicate back to the sink its subsequent data samples unless they exceed 26° C. or fall below 24° C.
Clearly, a wider range reduces the reporting frequency of the sensor, since many minor variations in the sampled values, often due to noise or environmental transients, lie within the specified interval. Of course, this approach works only as long as the resulting overall inaccuracy (or uncertainty regarding the collective statistical measure of interest to the application) is within the application's acceptable limits, since the application will no longer receive any information from the sensor as long as its values lie within the tolerance range. From a broader perspective, by accommodating a degree of divergence, albeit bounded, between the true value of the sensor and the perceived value at the sink, this approach transforms the conventional sink-initiated, polling-based model of data collection (where a sink periodically retrieves the data from the sensor, or the sensor periodically sends its data to the sink, even if there has been no significant change since the last transmission) into a more energy-efficient, source-initiated, event-driven framework (where only significant changes in the sampled values trigger an update to the sink).
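The reporting rule described above can be illustrated with a short sketch. It is illustrative only and is not taken from the cited references: the class name PrecisionRangeSensor, the function reported_samples and the sample values are hypothetical, and the sketch assumes that the range is re-centred on each newly reported value, as the 25° C. example suggests.

```python
from typing import Iterable, List, Optional


class PrecisionRangeSensor:
    """Hypothetical sensor-side filter: send a sample only when it leaves the range."""

    def __init__(self, half_width: float) -> None:
        self.half_width = half_width                  # e.g. 1.0 for a +/-1 degree range
        self.last_reported: Optional[float] = None    # value last sent to the sink

    def observe(self, sample: float) -> Optional[float]:
        """Return the sample if it must be sent to the sink, otherwise None."""
        outside = (
            self.last_reported is None
            or abs(sample - self.last_reported) > self.half_width
        )
        if outside:
            # Assumption: the range is re-centred on the value just reported.
            self.last_reported = sample
            return sample
        return None


def reported_samples(samples: Iterable[float], half_width: float) -> List[float]:
    """Return the subset of samples an event-driven sensor would transmit."""
    sensor = PrecisionRangeSensor(half_width)
    return [s for s in samples if sensor.observe(s) is not None]


if __name__ == "__main__":
    readings = [25.0, 25.4, 24.7, 26.0, 26.2, 24.9, 23.8, 24.1]
    print(reported_samples(readings, 1.0))  # narrower range, more reports
    print(reported_samples(readings, 2.0))  # wider range, fewer reports
```

A polling-based sensor would transmit all eight readings in this example, whereas the event-driven filter transmits only those that leave the current range; widening the range reduces the number of transmissions further, at the cost of greater uncertainty at the sink.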
While this approach of setting the precision range is well defined for individual sensors, many sensor-based applications are concerned with the aggregate statistics computed from a plurality of sensors, rather than an individual sensor. Examples of such statistics include the mean, maximum or sum of the readings obtained from a target sensor set, where the target itself may be explicitly defined (e.g., sensors A, B and D) or, more likely, is implicitly specified through predicates on certain attributes, such as the sensor location, type or owner (e.g., all sensors of type “thermal” located in the “hawthorne office” and owned by “IBM security”). An alternative example would be a cumulative count of the number of people entering a stadium, composed as a sum of the individual readings reported by a sensor at each separate entry gate. In such scenarios, the application's tolerance range would be specified by a precision range on the aggregate (e.g., “tell me the average of all the sensors, with a tolerance of +/−2 units on the average”). In general, any application requiring data may specify, either explicitly or implicitly, a Quality of Information (QoI) bound, indicating some requirements on the accuracy or precision of the collective data.
Currently, there is no implemented solution that addresses the problem of efficiently orchestrating the transmission of data gathered by a collective set of sensors to ensure conformance to a specified QoI bound.
It would be highly desirable to provide a system and method for efficiently orchestrating the transmission of data gathered by a collective set of sensors to ensure conformance to a specified QoI bound, especially for long-lived monitoring applications, where conformance is desired continually over the application's lifetime.
One approach for using the precision-based technique for such collective sensor data applications is to decompose an aggregate tolerance bound into identical precision ranges for each of the constituent sensors. For example, for an application that desires to know the sum of the sensor readings A, B and C with a tolerance of 3 units, a trivial way to set precision ranges would be to associate a uniform precision range of 1 unit with each of the three sensors. However, this process of decomposing a collective precision range into individual precision settings suffers from two deficiencies. First, it fails to accommodate the fact that the sensor network may exhibit heterogeneity, so that it may be more efficient to distribute the precision ranges in an unequal manner. For example, if some sensors have lower battery capacity or are very distant from the sink and incur high communication overhead, it may be more efficient to provide them a larger precision range (and thus, in general, a lower reporting rate), while compensating for this by tightening the bounds for more resource-rich sensors. Second, the precision range is adjusted only at the time instants when a sensor reports back to the sink with a new value (outside the current range). The process makes no attempt to utilize the temporal correlation or predictability in the variation of each sensor's data. For example, for sensors monitoring the temperature on the outside wall of a building, it can be expected (or predicted) that the sensor readings should be higher during the day, and lower during the night.
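The difference between a uniform and an unequal decomposition can be illustrated with a short sketch. It is illustrative only: the function names uniform_ranges and weighted_ranges, the per-sensor cost figures, and the cost-proportional weighting are all hypothetical assumptions and are not a method described in the prior art. The sketch relies only on the observation that, for a sum, the worst-case error is at most the sum of the per-sensor allowances, so any split whose allowances add up to the aggregate tolerance preserves the bound.

```python
from typing import Dict, Iterable


def uniform_ranges(sensors: Iterable[str], sum_tolerance: float) -> Dict[str, float]:
    """Split the tolerance on the sum equally among the sensors."""
    names = list(sensors)
    share = sum_tolerance / len(names)
    return {name: share for name in names}


def weighted_ranges(cost: Dict[str, float], sum_tolerance: float) -> Dict[str, float]:
    """Split the tolerance in proportion to a per-sensor reporting cost.

    Assumption: `cost` is a hypothetical positive weight (e.g. reflecting low
    battery or distance from the sink); costlier sensors get a wider allowance.
    """
    total = sum(cost.values())
    return {name: sum_tolerance * c / total for name, c in cost.items()}


if __name__ == "__main__":
    # Sum of sensors A, B and C with a tolerance of 3 units, as in the text.
    print(uniform_ranges(["A", "B", "C"], 3.0))
    # Hypothetical costs: sensor C is twice as expensive to hear from.
    print(weighted_ranges({"A": 1.0, "B": 1.0, "C": 2.0}, 3.0))
```

With these hypothetical costs, sensor C receives an allowance of 1.5 units while A and B are tightened to 0.75 units each; the allowances still total 3 units, so the tolerance on the sum is unchanged.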
It would thus be further highly desirable to provide a system and method that addresses the problem of efficiently orchestrating the collection and transmission of data gathered by a collective set of sensors to ensure conformance to a specified QoI bound while factoring into the data collection mechanism the knowledge of the expected or observed sensor behavior.