The ubiquity of networked devices is driving networks of mobile devices and Internet of Things (IoT) devices to ever larger scales, causing the opportunity for analytics that use crowd-sourced sensor data produced by those devices to grow exponentially. Crowd-sourced data is obtained by enlisting the services of a number of people or devices, typically via the Internet. Two challenges in using this crowd-sourced data, however, are that individual sensors and sensor readings involving locations and individuals raise significant privacy concerns, and that an analytics server must scale to handle large numbers of crowd-sourced events in short time windows.
Crowd-sourced analytics leverage data collected by the sensors of mobile devices on the IoT. Such analytics typically provide high-level, actionable information about environmental events having natural or artificial causes, based on crowd-sourced device sensor data. These analytics must be accurate in their assessments while also preserving the privacy of device owners. A key requirement for crowd-sourced analytics is that they be designed with privacy in mind and follow privacy by design (PbD) principles.
In order to provide privacy and end-to-end protection of crowd-sourced data, designs must ensure data integrity, authentication, and non-repudiation. A common technique for providing this end-to-end protection is to use a public key infrastructure (PKI) that utilizes keys generated from device identity for signing and encryption functions. While PKI provides an effective tool for protecting data in motion, the sender's authentication process may cause the user or device to lose its privacy if an eavesdropper has access to the authenticating party, because the user's identity information is passed along with the message.
In addition to ensuring end-to-end protection of data through separate authentication and data processing, analytics design must respect privacy principles. Polling processes associated with mobile device sensor-driven event generation and reporting typically provide constant or consistent position and sensor data updates. If a bad actor gains access to the analytics processing site or storage, the crowd-sourced data arising from these processes can be assembled into tracks for the mobile device and used to subvert the user's privacy by identifying all the locations to which the user traveled, or the sensor data profile of the physical environment in which the user is situated.
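The track-assembly risk described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each stored event carries a persistent device identifier, a timestamp, and a position; the `SensorEvent` record, its field names, and the sample readings are hypothetical and are not taken from any particular system.

```python
from collections import defaultdict
from typing import NamedTuple


class SensorEvent(NamedTuple):
    """Hypothetical stored record; field names are illustrative only."""
    device_id: str
    timestamp: float
    lat: float
    lon: float


def assemble_tracks(events):
    """Group stored events by device and order each group by time."""
    tracks = defaultdict(list)
    for ev in sorted(events, key=lambda e: e.timestamp):
        tracks[ev.device_id].append((ev.timestamp, ev.lat, ev.lon))
    return dict(tracks)


# Made-up readings, as an adversary might find them in storage (unordered).
stored_events = [
    SensorEvent("device-A", 30.0, 40.002, -75.001),
    SensorEvent("device-B", 10.0, 41.500, -74.200),
    SensorEvent("device-A", 10.0, 40.000, -75.000),
    SensorEvent("device-A", 20.0, 40.001, -75.000),
]

for device_id, track in assemble_tracks(stored_events).items():
    print(device_id, track)
```

Nothing beyond a group-by and a sort is needed once periodic updates share a stable identifier, which is why encrypting the data in transit and at rest does not by itself protect the user after storage is compromised.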
Current methods for ensuring privacy in crowd-sourced analytics have focused on encrypting data-in-transit between the mobile device and the analytics processing site and on encrypting data-at-rest at the analytics processing site. These methods can be subverted if an adversary gains access to storage at the analytics processing site.
One method to alleviate this problem is the use of random identifiers. With random identifiers, vehicle tracks can still be identified, but the randomization process keeps that information from being quickly correlated with a specific user.
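A minimal sketch of the random-identifier idea follows. How identifiers are generated, where the mapping is held, and whether identifiers are ever rotated are not specified in the text, so the `RandomIdentifierMap` class, the 64-bit identifier length, and the sample identity are all assumptions made for illustration.

```python
import secrets


class RandomIdentifierMap:
    """Hypothetical helper: assigns each real identity a random identifier."""

    def __init__(self):
        self._ids = {}

    def identifier_for(self, real_identity):
        # Assumption: one fixed random identifier per identity, never rotated.
        if real_identity not in self._ids:
            self._ids[real_identity] = secrets.token_hex(8)  # 64 random bits
        return self._ids[real_identity]


id_map = RandomIdentifierMap()

# Reports carry only the random identifier, a timestamp, and a position.
reports = [
    (id_map.identifier_for("alice-phone"), 10.0, (40.000, -75.000)),
    (id_map.identifier_for("alice-phone"), 20.0, (40.001, -75.000)),
]

# Both reports share one identifier, so they can still be joined into a track.
print(reports[0][0] == reports[1][0])
```

The sketch shows the limitation noted above: the identifier hides who the reporter is, but because it stays the same across reports, the sequence of positions remains linkable into a track.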
Another current method uses interrupt-based virtual trip lines to abstract location data on the mobile device side. This technique applies to traffic analysis, but falls short when the concern is clusters of behaviors or phenomena dispersed about a given region.
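One way to picture a virtual trip line is as a line segment stored on the device, with a report generated only when the device's movement between two consecutive position fixes crosses that segment. The sketch below makes several assumptions that are not in the text: planar coordinates, a standard orientation-based segment-intersection test, and the hypothetical names `crossed_trip_lines` and `tl-1`/`tl-2`.

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _segments_cross(a, b, c, d):
    """True if segment a-b strictly crosses segment c-d (no touching cases)."""
    return (_orient(c, d, a) * _orient(c, d, b) < 0
            and _orient(a, b, c) * _orient(a, b, d) < 0)


def crossed_trip_lines(fixes, trip_lines):
    """Return trip line ids crossed between consecutive position fixes.

    Only the ids would be reported; the raw fixes stay on the device.
    """
    crossed = []
    for prev, cur in zip(fixes, fixes[1:]):
        for line_id, (start, end) in trip_lines.items():
            if _segments_cross(prev, cur, start, end):
                crossed.append(line_id)
    return crossed


# Hypothetical trip lines in planar coordinates (assumption for illustration).
trip_lines = {
    "tl-1": ((1.0, -1.0), (1.0, 1.0)),
    "tl-2": ((10.0, -1.0), (10.0, 1.0)),
}
fixes = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]

print(crossed_trip_lines(fixes, trip_lines))  # ['tl-1']
```

Because a report is produced only at predefined lines, the approach suits flow along roads; events that occur away from any trip line, such as a cluster of readings spread over a region, produce no report at all, which is the gap noted above.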
Therefore, there exists a need for a solution that separates user identity information from event reporting and that acquires, abstracts, and processes the crowd-sourced data in a manner that further suppresses privacy-sensitive information while scaling to handle large numbers of event notifications (event storms) in a short period of time.