In recent years, the wide availability of personal data has made privacy-preserving data mining an important problem. For instance, vast amounts of personal data about individuals are stored by various commercial vendors and organizations in the form of multidimensional data records. In many cases, users are willing to divulge information about themselves only if the privacy of the data is guaranteed. Thus, methods are needed to mask the sensitive information in these records.
A number of methods have been proposed for privacy-preserving data mining of multidimensional data records. One method of preserving privacy in such data mining operations is known as anonymization, wherein a record is released only if it is indistinguishable from k other entities in the data.
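By way of illustration only, the release condition described above can be sketched as a check over groups of records that share the same quasi-identifier values. The function name `is_k_anonymous`, the dictionary record layout, and the attribute names in the usage note are hypothetical and are not taken from any particular implementation. The sketch follows the wording above, so each record must match k other records (a group of at least k + 1); the conventional k-anonymity definition counts the record itself and therefore requires groups of at least k.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every record is indistinguishable from at least
    k other records on the given quasi-identifier attributes.

    records: list of dicts mapping attribute name -> (generalized) value.
    quasi_identifiers: attribute names an attacker could link on.
    k: required number of *other* matching records (assumption: this
       follows the text's wording, so each group needs k + 1 members).
    """
    # Count how many records share each combination of quasi-identifier values.
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    # Every group must contain the record itself plus k others.
    return all(size >= k + 1 for size in groups.values())
```

For example, three records that all carry the generalized values age "30-39" and zip "100**" would satisfy the check for k = 2, whereas adding a single record with age "40-49" would cause the check to fail for any k of 1 or more, since that record matches no other.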
Anonymization requires a high degree of spatial locality for effective and statistically robust implementation. In high-dimensional space, the data becomes sparse and the concept of spatial locality is no longer easy to define from an application point of view. When the data contains a large number of attributes that may be considered quasi-identifiers, it becomes difficult to anonymize the data without an unacceptably high amount of information loss. This is because an exponential number of combinations of dimensions can be used to mount precise inference attacks, even when individual attributes are only partially specified within a range.
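The combinatorial growth noted above can be made concrete with a minimal sketch. With d quasi-identifier attributes there are 2^d - 1 non-empty attribute combinations an attacker might link on, and a record need be unique on only one of them to be singled out. The helper names `count_attribute_subsets` and `identifying_subsets` and the dictionary record layout are hypothetical and serve only as an illustration; the sketch uses exact value matching rather than ranges.

```python
from itertools import combinations


def count_attribute_subsets(d):
    """Number of non-empty combinations of d attributes (2^d - 1)."""
    return 2 ** d - 1


def identifying_subsets(records, target, max_size):
    """List the attribute combinations, up to max_size attributes, on which
    `target` matches exactly one record in `records` (that is, itself).

    records: list of dicts mapping attribute name -> value; assumed to
             contain `target`.
    target: the record an attacker is trying to single out.
    """
    attributes = sorted(target)
    found = []
    for size in range(1, max_size + 1):
        for subset in combinations(attributes, size):
            # Count records that agree with the target on every attribute
            # in this combination.
            matches = sum(
                1
                for record in records
                if all(record[attr] == target[attr] for attr in subset)
            )
            if matches == 1:
                found.append(subset)
    return found
```

Even when no single attribute identifies a record, a combination of two or three attributes often does, and the number of combinations that must be protected doubles with each additional attribute.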
Accordingly, there is a need for improved techniques for privacy-preserving data mining of multidimensional data records.