The use of online social networks such as Facebook, Twitter, FourSquare, and Google+ has steadily increased along with the use of smartphones equipped with sensors and Internet connectivity. The combination of these two technologies, smartphones and social networks, will likely yield applications, such as crowdsourcing, that leverage the data collection capabilities of large numbers of smartphones. For example, real-time traffic monitoring in Google Maps is enabled by individuals sharing location and speed information from their smartphones. This integration can also support disaster management. For example, an oil spill or other environmental disaster can be monitored by individuals sharing pictures or other relevant information on a social networking site. A chemical spill, or the air quality around a given disaster, can be monitored similarly using, for example, air sampling equipment associated with the mobile devices.
The information obtained through these technologies can be aggregated, processed, and then consumed by individuals, decision makers, or public agencies. Existing approaches to such applications rely on vertical integration: each application needs its own software agent running on the devices to collect the data specific to that application's needs, and a dedicated back-end module to aggregate and process the collected data into the desired results. Such a vertical approach aims to optimize the performance of a single application. However, as such applications proliferate, great inefficiency and even conflicts may arise for both the software agents and the back-end modules. Often, these applications need the same type of raw data and access the same physical sensor, so their software agents compete for access and repeatedly collect the same raw data. Such uncoordinated collection is inefficient and wastes resources. Applications in the same domain, e.g., traffic and transportation, also repeat common primitive processing of the raw data to extract information of higher semantic content. For example, the raw time series of acceleration can be processed to detect the existence of potholes.
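As an illustration of the kind of primitive processing mentioned above, the following is a minimal sketch of how a raw vertical-acceleration time series might be reduced to pothole events. It is not taken from any particular application: the function name `detect_potholes`, the deviation threshold, and the minimum gap between events are all hypothetical choices, and a simple threshold test stands in for whatever detection method a real application would use.

```python
from statistics import median


def detect_potholes(accel_z, sample_rate_hz, threshold=4.0, min_gap_s=0.5):
    """Return times (in seconds) at which a pothole-like jolt begins.

    accel_z        -- vertical acceleration samples in m/s^2 (assumed input format)
    sample_rate_hz -- sampling frequency of the series
    threshold      -- hypothetical deviation from baseline that counts as a jolt
    min_gap_s      -- hypothetical quiet time required before a new event is reported
    """
    if not accel_z:
        return []

    # The median approximates the steady reading (gravity plus sensor bias).
    baseline = median(accel_z)
    min_gap = int(min_gap_s * sample_rate_hz)

    events = []
    last_jolt_index = None
    for i, a in enumerate(accel_z):
        if abs(a - baseline) > threshold:
            # Report a new event only if the previous jolt is far enough back;
            # consecutive samples of the same jolt are merged into one event.
            if last_jolt_index is None or i - last_jolt_index >= min_gap:
                events.append(i / sample_rate_hz)
            last_jolt_index = i
    return events


# Synthetic series sampled at 50 Hz: steady readings with two jolts.
samples = [9.8] * 100
samples[40], samples[41] = 16.0, 3.0   # one jolt spanning two samples
samples[90] = 15.0                     # a second, separate jolt
pothole_times = detect_potholes(samples, sample_rate_hz=50)
```

If several traffic-related applications each ran their own copy of such a routine against the same accelerometer, both the sampling and this processing would be duplicated, which is the inefficiency described above.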
Applications that leverage data collection from mobile devices share a common drawback: the data collection is designed specifically for the particular application, including the software agent running on the devices, the back-end middleware that configures and controls the devices, and the primitive data processing performed on the devices or in the middleware. Thus, the type of data collected, the way the data are collected (e.g., the sampling frequency), and the way they are processed are all static and tied to the specific needs of the application. Software developed for one application cannot be reused for another.
These limitations cause inefficiency on the devices. When multiple applications need to use the same set of underlying devices, each must install its own software agent on every device. These agents are not designed to work with each other, and they may compete for resources, resulting in conflicts. The limitations also cause inefficiency in the middleware. Many applications in the same domain need similar data, so when each application has its own middleware, duplicate data collection and redundant data processing are inevitable. Finally, the limitations result in a lack of flexibility. Application-specific middleware is heavily customized to the type of data and processing needed by that application and cannot support the diverse needs of different applications simultaneously.
Another approach resembles “data warehousing”: all kinds of data are collected indiscriminately and deposited in a warehouse, from which applications extract the relevant data and perform further processing. This approach, however, is ill-suited to systems that use mobile devices as the data collectors. The battery power and communication bandwidth of mobile devices are scarce resources, and continuous collection of all kinds of data can quickly exhaust the battery and monopolize the bandwidth. In addition, the volume of data generated by all the sensors of billions of devices is overwhelming. The network, storage, and processing capacity of a given system may be unable to keep up with that volume, leaving the system “drowning in data”.
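The scale of the problem can be made concrete with a rough calculation. The sketch below is illustrative only: the device count, number of sensors, sampling rate, and sample size are assumed values chosen for round numbers, not measurements, and the helper `raw_bytes_per_day` is hypothetical.

```python
def raw_bytes_per_day(devices, sensors_per_device, sample_rate_hz, bytes_per_sample):
    """Estimate the raw sensor data produced per day, before any compression."""
    seconds_per_day = 24 * 60 * 60
    return (devices * sensors_per_device * sample_rate_hz
            * bytes_per_sample * seconds_per_day)


# Assumed, illustrative parameters (not taken from any real deployment).
per_device = raw_bytes_per_day(devices=1, sensors_per_device=5,
                               sample_rate_hz=10, bytes_per_sample=4)
fleet_total = raw_bytes_per_day(devices=1_000_000_000, sensors_per_device=5,
                                sample_rate_hz=10, bytes_per_sample=4)

print(f"per device: {per_device / 1e6:.2f} MB/day")
print(f"one billion devices: {fleet_total / 1e15:.2f} PB/day")
```

Under these assumptions each device produces about 17 MB of raw data per day, and one billion devices produce about 17 PB per day, which suggests why indiscriminate collection strains both the devices and the back end.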