Time series data are prevalent in a wide range of domains and applications, such as financial, retail, environmental and process monitoring, defense and health care. Additionally, massive volumes of data from various sources are continuously collected. However, data owners or publishers may not be willing to reveal the exact values for various reasons, most notably privacy considerations. A widely employed and accepted approach for partial information hiding is based on random perturbation, which introduces uncertainty about individual values (see, for example, R. Agrawal et al., “Privacy Preserving Data Mining,” In SIGMOD, 2000). Consider the following examples:
A driver installing a vehicle monitoring system (see, for example, D. Automotive, “CarChip,” http://www.carchip.com/, and W. P. Schiefele et al., “SensorMiner: Tool Kit for Anomaly Detection in Physical Time Series,” Technical Report, http://www.interfacecontrol.com/, 2006) may not wish to reveal his exact speed. How can he, for example, avoid revealing small violations of the speed limit (say, by 3-5 mph) but still allow mining of general driving patterns or detection of excessive speeding?
A financial services company may wish to provide a discounted, lower-quality price ticker with a specific level of uncertainty, which is not useful for individual buy/sell decisions but still allows mining of trends and patterns. How can they ensure that the level of uncertainty is indeed as desired?
Similarly, a financial institution (see, for example, Y. Zhu et al., “StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time,” In VLDB, 2002) may not wish to reveal the amounts of individual transactions over time, but may still wish to allow mining of trends and patterns. How can they control the level of uncertainty (or privacy) in the published data and ensure that nothing more can be inferred?
Prior work on numerical and categorical data has focused on the traditional relational model, where each record is a tuple with one or more attributes. Existing methods can be broadly classified into two groups. They work either (i) by direct perturbation of individual attributes separately (see, for example, R. Agrawal et al., “Privacy Preserving Data Mining,” In SIGMOD, 2000; D. Agrawal et al., “On the Design and Quantification of Privacy Preserving Data Mining Algorithms,” In PODS, 2001; and W. Du et al., “Using Randomized Response Techniques for Privacy-Preserving Data Mining,” In KDD, 2003) or of entire records independently (see, for example, H. Kargupta et al., “On the Privacy Preserving Properties”; Z. Huang et al., “Deriving Private Information from Randomized Data,” In SIGMOD, 2005; K. Liu et al., “Random Projection-Based Multiplicative Data Perturbation for Privacy Preserving Distributed Data Mining,” IEEE TKDE, 18(1), 2006; and K. Chen et al., “Privacy Preserving Data Classification with Rotation Perturbation,” In ICDM, 2005), or (ii) by effectively swapping or concealing values among an appropriately chosen small group of “neighboring” records (see, for example, L. Sweeney, “k-anonymity: A Model for Protecting Privacy,” IJUFKS, 10(5), 2002; C. C. Aggarwal et al., “A Condensation Approach to Privacy Preserving Data Mining,” In EDBT, 2004; E. Bertino et al., “Privacy and Ownership Preserving of Outsourced Medical Data,” In ICDE, 2005; and A. Machanavajjhala et al., “l-diversity: Privacy Beyond k-anonymity,” In ICDE, 2006).
Although some of the prior work on relational data has considered certain forms of privacy breaches that are possible by exploiting either the global or local structure of the data (see, for example, A. Machanavajjhala et al., “l-diversity: Privacy Beyond k-anonymity,” In ICDE, 2006; Z. Huang et al., “Deriving Private Information from Randomized Data,” In SIGMOD, 2005; H. Kargupta et al., “On the Privacy Preserving Properties of Random Data Perturbation Techniques,” In ICDM, 2003; and K. Chen et al., “Privacy Preserving Data Classification with Rotation Perturbation,” In ICDM, 2005), the additional aspect of time poses new challenges, some of which are related to fundamental properties of time series (see, for example, D. L. Donoho et al., “Uncertainty Principles and Signal Recovery,” SIAM SIAP, 49(3), 1989). In particular: (i) sophisticated filtering techniques may reduce uncertainty, thereby breaching privacy; (ii) time series can be “described” in a large number of ways (in a sense, a univariate time series is a single point in a very high-dimensional space [see, for example, C. C. Aggarwal, “On k-anonymity and The Curse of Dimensionality,” In VLDB, 2005]; for example, if the series has 1000 points, there are many 1000-dimensional bases to choose from); (iii) time series characteristics may change over time and, in a streaming setting, new patterns may start emerging in addition to old ones changing (for example, it is not possible to know about quarterly or annual trends while still collecting the first week of data), making both static, global analysis and fixed-window analysis unsuitable.
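Challenge (i) can be illustrated with a minimal sketch. It is not taken from any of the cited works, and every name in it (perturb, moving_average, rmse, sigma, window) is hypothetical. The sketch assumes a synthetic, slowly varying sinusoid as the true series, independent Gaussian noise as the random perturbation, and a plain centered moving average standing in for the more sophisticated filters an adversary might use. Uncertainty is measured here, again as an assumption, by the root-mean-square error against the true series.

```python
import math
import random


def perturb(series, sigma, rng):
    """Publish the series with i.i.d. zero-mean Gaussian noise of std sigma."""
    return [x + rng.gauss(0.0, sigma) for x in series]


def moving_average(series, window):
    """Centered moving average; the window is truncated at both ends."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out


def rmse(a, b):
    """Root-mean-square difference between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 1000

# Hypothetical "true" data: a smooth signal with a period of 200 samples.
true_series = [10.0 * math.sin(2 * math.pi * t / 200) for t in range(n)]

# What the data owner publishes: true values plus white noise (sigma assumed).
published = perturb(true_series, sigma=2.0, rng=rng)

# What an adversary can compute from the published data alone.
filtered = moving_average(published, window=21)

uncertainty_published = rmse(true_series, published)
uncertainty_filtered = rmse(true_series, filtered)

print(f"uncertainty in published data: {uncertainty_published:.3f}")
print(f"uncertainty after filtering:   {uncertainty_filtered:.3f}")
```

Under these assumptions the filtered series lies much closer to the true values than the published one, because noise that is independent across time averages out while the smooth signal does not. The uncertainty the publisher intended is therefore not the uncertainty an adversary actually faces.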