Recently, concern about privacy breaches has increased, especially breaches involving sensitive personal data of individuals, as discussed in A. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. In PODS, 2003, which is hereby incorporated by reference in its entirety. As a result, restrictions and regulations on publishing sensitive personal data have been tightened, as discussed in K. Thearling. Data mining and privacy: A conflict in making. In DS*, 1998, which is hereby incorporated by reference in its entirety. These restrictions and regulations address data owned by government organizations as well as corporations. It is therefore not surprising that the data management community has become increasingly focused on ways to guarantee the privacy of sensitive data.
Meanwhile, unprecedented volumes of data from various sources provide a great opportunity for data mining and information integration. Unfortunately, privacy requirements and data mining applications place exactly opposite demands on data publishing, as discussed in A. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. In PODS, 2003, K. Thearling. Data mining and privacy: A conflict in making. In DS*, 1998, which are hereby incorporated by reference in their entirety. The utility of the published data with respect to the mining application decreases with increasing levels of privacy guarantees, as discussed in D. Kifer, and J. Gehrke. Injecting utility into anonymized datasets. In SIGMOD, 2006, which is hereby incorporated by reference in its entirety. Previous work has recognized this important tradeoff between privacy and utility, and various techniques have been proposed to achieve a desired balance between the two, as discussed in R. Agrawal and R. Srikant. Privacy preserving data mining. In SIGMOD, 2000, H. Kargupta, S. Datta, Q. Wang, and K. Sivakumar. On the privacy preserving properties of random data perturbation techniques. In ICDM, 2003, K. Liu, H. Kargupta, and J. Ryan. Random Projection-Based Multiplicative Data Perturbation for Privacy Preserving Distributed Data Mining. IEEE TKDE, 18(1), 2006, K. Chen and L. Liu. Privacy preserving data classification with rotation perturbation. In ICDM, 2005, W. Du and Z. Zhan. Using randomized response techniques for privacy-preserving data mining. In SIGKDD, 2003, A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. In ICDE, 2006, L. Sweeney. k-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst., 10(5), 2002, A. Evfimievski, R. Srikant, R. Agrawal, and J. Gehrke. Privacy preserving mining of association rules. In SIGKDD, 2002, which are hereby incorporated by reference in their entirety.
Prior related work, such as that described in R. Agrawal and R. Srikant. Privacy preserving data mining. In SIGMOD, 2000, D. Agrawal and C. C. Aggarwal. On the design and quantification of privacy preserving data mining algorithms. In PODS, 2001, H. Kargupta, S. Datta, Q. Wang, and K. Sivakumar. On the privacy preserving properties of random data perturbation techniques. In ICDM, 2003, Z. Huang, W. Du, and B. Chen. Deriving private information from randomized data. In SIGMOD, 2005, which are hereby incorporated by reference in their entirety, includes additive random perturbation for the offline, conventional relational data model, where the noise is distributed along the principal components of the original data in order to achieve maximum privacy for a fixed utility. These offline algorithms are not optimal when applied to numerical, non-stationary (i.e., time-evolving) data streams. The dynamic correlations and autocorrelations, if not carefully considered, may allow for the reconstruction of the original streams. Further problems arise when random perturbation is applied to such streams. First, analysis of the data has to be performed incrementally, using limited processing time and buffer space, making batch approaches unsuitable. Second, the characteristics of streams evolve over time. Consequently, approaches based on global analysis of the data are not adequate.
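By way of illustration only, the offline additive perturbation described above may be sketched as follows. This is a minimal sketch and not the method of any cited reference. It assumes that the noise variance along each principal component is made proportional to the variance of the data along that component. The function name `perturb_along_principal_components` and the parameter `noise_fraction` are hypothetical and do not appear in the prior art. The sketch requires the complete data set up front, which is precisely the batch, global analysis that is unsuitable for streams.

```python
import numpy as np


def perturb_along_principal_components(data, noise_fraction=0.1, seed=0):
    """Offline additive perturbation with noise aligned to the data's principal components.

    Assumption (illustrative only): the noise variance along each principal
    component equals noise_fraction times the data variance along that component.
    Returns the perturbed data and the noise that was added.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)

    # Global (batch) analysis: covariance of the full data set.
    cov = np.cov(data, rowvar=False)

    # Columns of eigvecs are the principal directions; eigvals are the variances.
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative values

    # Draw independent Gaussian coefficients per component, scaled per component.
    coeffs = rng.standard_normal(data.shape) * np.sqrt(noise_fraction * eigvals)

    # Map the coefficients from principal-component space back to data space.
    noise = coeffs @ eigvecs.T
    return data + noise, noise
```

Because the noise follows the same correlation structure as the data, it cannot be separated from the data simply by projecting onto the dominant principal directions. For a stream whose correlations drift over time, however, a covariance estimated once over all of the data no longer matches the local structure, which is the shortcoming noted above.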
Therefore a need exists to overcome the problems with the prior art as discussed above.