Many new computing applications involve the generation and transmission of data from a group of sensor devices to a remote “sink” node, where such data is aggregated and analyzed. Such applications are becoming common in a variety of remote monitoring scenarios, such as healthcare (where wearable sensors record and transmit various biometric measures of an individual), vehicular telematics (where on-board sensors measure various vehicular parameters and transmit them back to a central diagnostic server) and intelligent transportation systems (where highway sensors periodically record traffic conditions).
Such data gathering systems must address two important concerns. First, as many of these sensor devices are resource-constrained (e.g., they operate on batteries), the system should minimize the communication and/or data collection overhead, helping to reduce the energy expenditure or network bandwidth consumption of such devices. Second, many of these devices are not just reporting nodes, but also possess a fair degree of processing power and local intelligence. Architecturally, such data collection systems comprise a set of client sensor devices that are connected (typically using a short-range wireless technology such as Bluetooth or ZigBee) to a personal gateway device. The gateway device is in turn connected (often using a wireless communications infrastructure) to a remote sink node (or server), wherein this sink node is a part of an existing information technology infrastructure.
One simple way to improve the efficiency of the data gathering system is to compress the sensor data prior to transmission. Another important technique is data filtering, in which data that is not necessary to the end goals of the infrastructure is eliminated or reduced before transmission. A wide variety of compression schemes are available, such as Huffman coding, Vector Quantization (VQ), Lempel-Ziv (LZ) and run-length coding, and they can be broadly classified into two categories: lossless (where the exact data values can be recovered during decompression) and lossy (where the compressed data cannot be inverted to recover the exact data values). In general, different compression algorithms suit different types of data sources: data sources possess different “statistical parameters,” and each algorithm works better for particular families of statistics. In addition, the choice of a compression algorithm is also determined by the application's requirement on the quality of the compression; for example, does the compression need to be lossless, or can the application tolerate some distortion introduced by the compression process? Similarly, the type of filtering performed directly affects the statistics of the filtered data, and thus determines the efficacy of various compression algorithms.
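The lossless/lossy distinction can be illustrated with a minimal Python sketch. It is not taken from any particular system: the function names, the 16-bit sample format and the `step` parameter are assumptions made for illustration. DEFLATE (via the standard `zlib` module, which combines LZ77 and Huffman coding) stands in for the lossless schemes, and uniform scalar quantization stands in as a simpler relative of the lossy schemes named above.

```python
import struct
import zlib


def compress_lossless(samples):
    """Pack integer samples as little-endian 16-bit values and DEFLATE them."""
    raw = struct.pack(f"<{len(samples)}h", *samples)
    return zlib.compress(raw)


def decompress_lossless(blob):
    """Invert compress_lossless exactly."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))


def compress_lossy(samples, step):
    """Quantize each sample to the nearest multiple of `step`, then DEFLATE.

    `step` is a hypothetical quality knob: a larger step discards more detail.
    """
    quantized = [round(s / step) for s in samples]
    raw = struct.pack(f"<{len(quantized)}h", *quantized)
    return zlib.compress(raw)


def decompress_lossy(blob, step):
    """Recover an approximation; the error per sample is at most step / 2."""
    raw = zlib.decompress(blob)
    return [q * step for q in struct.unpack(f"<{len(raw) // 2}h", raw)]
```

With lossless compression the decompressed samples equal the originals; with the quantizing variant they differ from the originals by at most half the quantization step, which is the kind of bounded distortion an application may or may not be able to tolerate.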
In many applications, the quality of the compression required is not constant for a given type of sensor data, but may vary based on some external context. Here, context refers to various dynamic attributes of the environment, such as the current location of the individual wearing the sensors, the type of activity the user is currently performing or the specific queries that the application must answer on the sensed data. As an example from remote healthcare monitoring, an application may need to retrieve the exact ECG data (i.e., permit only lossless compression) when the user is in the gym, but may only need lower-quality (lossy) data (e.g., for simple arrhythmia alerts) when the user is at home. Similarly, different forms of filtering may alter the statistics of the data associated with a sensor at different times. For example, when the user is at home, the infrastructure may wish to be notified of the exact heart rate samples if the readings lie outside the range (70, 90), and may require only one notification every 10 minutes while the readings lie within that range. In this case, the relaying device or sensor may perform averaging when all readings lie within this range. It stands to reason that the statistical properties of the “average” readings (e.g., their resolution, their likely variation across consecutive samples) will differ considerably from those of the raw data.
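The heart rate filtering example can be sketched in Python as follows. This is only one possible reading of the behavior described above: the function name, the report tuples, and the treatment of the boundary values 70 and 90 as out of range are assumptions, as is the choice to forward out-of-range readings immediately while in-range readings are averaged per 10-minute window.

```python
from statistics import mean


def filter_heart_rate(samples, low=70, high=90, window_s=600):
    """Filter time-ordered (timestamp_seconds, reading) pairs.

    Readings outside the open interval (low, high) are forwarded exactly.
    Readings inside it are averaged, one report per window of `window_s` seconds.
    Returns a list of ("exact", t, reading) and ("average", window_start, mean).
    """
    reports = []
    window_start = None
    in_range = []

    def flush():
        # Emit one averaged report for the in-range readings of the current window.
        if in_range:
            reports.append(("average", window_start, mean(in_range)))
            in_range.clear()

    for t, reading in samples:
        if window_start is None:
            window_start = t
        # Close every window that ended before this sample arrived.
        while t - window_start >= window_s:
            flush()
            window_start += window_s
        if low < reading < high:
            in_range.append(reading)
        else:
            reports.append(("exact", t, reading))
    flush()
    return reports
```

The filtered stream mixes sparse exact readings with coarse averages, so its statistics differ from those of the raw stream, which in turn affects which compression algorithm works best on it.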
It would be desirable for the system to give the sensor devices, or any relaying device, the capability to dynamically modify the compression technique applied to a raw or filtered data stream in response to changes in the external context.
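One way such a capability might look is sketched below in Python. Everything here is hypothetical: the `Policy` record, the `POLICY_TABLE` entries (modeled on the gym/home ECG example), the `ContextAwareEncoder` class and the quantization-based lossy mode are illustrative assumptions, not a description of any specific implementation.

```python
import struct
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    lossless: bool
    step: int = 1  # quantization step, used only when lossless is False


# Hypothetical mapping from (sensor type, location context) to a policy.
POLICY_TABLE = {
    ("ecg", "gym"): Policy(lossless=True),
    ("ecg", "home"): Policy(lossless=False, step=4),
}


def select_policy(sensor, location):
    """Fall back to lossless compression for any unknown context."""
    return POLICY_TABLE.get((sensor, location), Policy(lossless=True))


class ContextAwareEncoder:
    def __init__(self, sensor, location):
        self.sensor = sensor
        self.policy = select_policy(sensor, location)

    def on_context_change(self, location):
        """Switch the compression policy when the external context changes."""
        self.policy = select_policy(self.sensor, location)

    def encode(self, samples):
        """Return (step, payload); step == 1 means the payload is lossless."""
        step = 1 if self.policy.lossless else self.policy.step
        values = [round(s / step) for s in samples]
        payload = zlib.compress(struct.pack(f"<{len(values)}h", *values))
        return step, payload


def decode(step, payload):
    """Recover samples; exact when step == 1, within step / 2 otherwise."""
    raw = zlib.decompress(payload)
    return [v * step for v in struct.unpack(f"<{len(raw) // 2}h", raw)]
```

In this sketch the encoder carries the step alongside each payload, so the receiver can decode correctly even when the policy changes between consecutive transmissions.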