The problem of efficiently retrieving correlated information from a large number of distributed sources appears in many areas of engineering, such as querying a distributed database, content delivery in peer-to-peer networks, or the exchange of local information in a sensor network.
In the literature, most of the work on the data distribution problem proposes strategies for either the source coding or the channel coding aspects while enforcing a separation between the two functionalities. It has been shown by Gupta and Kumar that, without considering the transmission content, peer-to-peer packet transmission among arbitrarily chosen transmitter/receiver pairs in the wireless networking scenario leads to a per-node throughput that vanishes as O(1/√N), where N is the number of nodes in the network. This scaling law may not be as important in sensor applications, since it is often necessary for the distributed sensors to convey their local information to a central processor, which leads to the so-called many-to-one network.
Unfortunately, the many-to-one network topology can lead to even more restrictive scaling laws on the per-node throughput, such as the O(1/N) scaling that has been derived (or O(log N / N) when antenna sharing schemes are utilized). This is caused by the bottleneck effect that occurs at the last hop towards the data-gathering node. Relying on the assumption that sensors' observations become increasingly correlated as the density of sensors increases, many authors have proposed the use of distributed source coding (DSC) techniques, based on Slepian-Wolf encoding for discrete data or Wyner-Ziv encoding for continuous data, to reduce the aggregate information rate produced by the nodes in a sensor network. Capitalizing on the high data correlation, it has been shown that the cross-layer optimization of the source coding and transmission scheduling strategies (or cooperative transmission schemes) can achieve good scaling performance for a large class of sensor data models in which the aggregate information rate increases more slowly than the network capacity. Unfortunately, the methodologies used to prove these results do not scale well in terms of complexity or latency, primarily because of the need to encode over long temporal sequences and because of the strictly sequential decoding structure. Although suboptimal solutions are available, the cross-layer optimization of the information flow through the network is a complex problem that remains unsolved. When the required transmitter and receiver protocols are discussed at all, they are often of high complexity and not suitable to be generalized and applied to a random network.
In the multiple access application, group testing was proposed as an efficient solution for random access scheduling in packet networks. In this case, the information that is resolved through group testing is the presence or absence of a message within the transmission queue of each node. In the framework of these publications, the event that a given node has a message to transmit is independent and identically distributed from node to node. The entire network state can therefore be modeled as a sequence of i.i.d. (independent identically distributed) Bernoulli random variables, each with probability p of having a message to transmit, much like the modeling of the blood testing problem. In sensor networks, an analogous model could arise when unexpected independent events trigger alarms in isolated sensors. Since the outcome of each test belongs to a binary alphabet, and therefore conveys at most one bit, an obvious lower bound for the number of tests is the joint entropy of the Bernoulli i.i.d. field, H(X) = N H(p), where H(p) is the binary entropy function measured in bits. In the work mentioned above, the group testing strategy was applied only to multiple access scheduling.
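The entropy bound N H(p) described above can be made concrete with a short numerical sketch. The Python code below is illustrative only and is not taken from the cited work: the function names (`binary_entropy`, `entropy_lower_bound`, `count_tests_binary_splitting`) are hypothetical, and the recursive halving procedure is one simple, non-optimized group testing strategy chosen here as an assumption, in which each test asks whether any node in a contiguous group has a message to transmit.

```python
import math


def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def entropy_lower_bound(n, p):
    """Lower bound N * H(p) on the number of binary-outcome tests needed
    to resolve N i.i.d. Bernoulli(p) node states."""
    return n * binary_entropy(p)


def count_tests_binary_splitting(state):
    """Count the yes/no group tests used by a simple recursive halving
    strategy (an illustrative assumption, not an optimal scheme).

    `state` is a list of 0/1 values, one per node; 1 means the node has
    a message to transmit. Each test asks: "is any node in this group
    active?"
    """
    def resolve(lo, hi):
        tests = 1  # one test on the group state[lo:hi]
        # A negative test clears the whole group; a test on a single
        # node reveals that node's state directly.
        if not any(state[lo:hi]) or hi - lo == 1:
            return tests
        mid = (lo + hi) // 2
        return tests + resolve(lo, mid) + resolve(mid, hi)

    return resolve(0, len(state)) if state else 0
```

For a given realization of the network state, the count returned by `count_tests_binary_splitting` can be compared with `entropy_lower_bound(len(state), p)`. When p is small, most groups test negative and are cleared with a single test, which is why group testing can use far fewer than N tests in the sparse regime.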
There is a need for methods and systems for obtaining data from sources such as sensor networks. There is also a need for methods and systems for obtaining data from such sources when the data correspond to more general data models than the Bernoulli i.i.d. data model.