The ever-increasing instrumentation of the physical and virtual worlds provides an unprecedented opportunity to collect useful data from diverse sources and to mine that data to understand phenomena. Participatory data mining techniques are commonly used for this purpose. Such techniques enable a requestor to aggregate queries for mining information that is extracted from time-series data collected by individual users. However, users may be unwilling to reveal the true values of their data for various reasons, such as privacy considerations.
Random perturbation is commonly used to introduce uncertainty about individual data values. However, random perturbation techniques often do not adequately support time-series data and distributed sources, both of which characterize participatory data mining. For example, many such techniques assume that correlations among data are negligible, even though data at successive timestamps from a common source may be highly correlated. Even if the answers to a query sequence are perturbed independently, the perturbations can often be distinguished from the original answers and filtered out if the time series exhibits a pattern or a relatively strong correlation. Moreover, if successive query answers are correlated and noise is added independently to each answer, the amount of noise required to hide the correlation can be extremely large, making the noisy answers practically useless for a long sequence of queries. Furthermore, such techniques usually assume the existence of a trusted third party that introduces the noise. The trusted third party typically has access to the true values of the data before the noise is introduced, which exposes the data to the risk of a privacy attack.
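The filtering risk described above can be illustrated with a minimal sketch. It assumes a hypothetical slowly varying sine wave as the correlated time series, independent Gaussian noise as the perturbation, and a simple moving average as the attacker's filter; none of these choices come from the techniques discussed here, and the function names are illustrative only.

```python
import math
import random


def moving_average(values, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def filtering_demo(n=2000, noise_std=1.0, window=25, seed=0):
    """Return (error of perturbed series, error after filtering)."""
    rng = random.Random(seed)
    # Assumed series: a slow sine wave, so successive values are highly correlated.
    true_series = [math.sin(2 * math.pi * t / 500) for t in range(n)]
    # Each answer is perturbed independently of the others.
    perturbed = [x + rng.gauss(0.0, noise_std) for x in true_series]
    # Averaging neighbors cancels independent noise but keeps the slow signal.
    recovered = moving_average(perturbed, window)
    return rmse(perturbed, true_series), rmse(recovered, true_series)


if __name__ == "__main__":
    before, after = filtering_demo()
    print(f"error of perturbed answers: {before:.3f}")
    print(f"error after filtering:      {after:.3f}")
```

Under these assumptions the filtered series lies much closer to the true series than the perturbed answers do, which is why independent perturbation offers weak protection for strongly correlated data.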
In the absence of a trusted third party, users often perturb their data before publishing it to the requestor. However, if users perturb their data independently, the noise variance in the perturbed estimate grows linearly with the number of users, which may reduce the utility of the aggregate data. To improve utility, cryptographic techniques, such as Secure Multiparty Computation, can be used to compute accurate perturbed estimates in a distributed setting. However, the computational performance of such cryptographic techniques does not scale well to a relatively large number of users.
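The linear growth of noise variance can be checked empirically with a small sketch. It assumes that the aggregate is a sum over users and that each user adds independent zero-mean Gaussian noise with the same standard deviation; the function name and parameters are hypothetical and chosen only for illustration.

```python
import random
import statistics


def aggregate_noise_variance(num_users, noise_std=1.0, trials=2000, seed=0):
    """Estimate the variance of the total noise in a sum over num_users users."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        # Each user draws noise independently; the requestor sees only the sum.
        totals.append(sum(rng.gauss(0.0, noise_std) for _ in range(num_users)))
    return statistics.pvariance(totals)


if __name__ == "__main__":
    for n in (10, 100):
        print(n, round(aggregate_noise_variance(n), 2))
```

With unit-variance noise per user, the estimated variance of the summed noise should be close to the number of users, so ten times as many users yields roughly ten times the variance in the aggregate.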