Storing data for future use for various purposes is a well-known and widely followed practice nowadays. Privacy-preserving data mining is a research area concerned with protecting the privacy of personally identifiable information during the process of data mining. Today, privacy is an essential requirement for users and enterprises alike. Many organizations collect data for scientific research as well as for market analysis. Access to this data is also given to third parties for further productive analysis. The available datasets pose serious threats to the privacy of individuals and organizations. To address this concern about the privacy of sensitive data, many research techniques have been proposed, laws have been enacted, and management techniques are being developed.
The most popular techniques for preserving privacy include data perturbation, cryptographic methods and protocols for data sharing, and statistical techniques for disclosure and inference control. All these methods are frequently used, but they involve utility loss. The closer a privacy preservation model comes to the ideal, the greater the utility loss, i.e., the less information can be derived from the database. A database with absolute privacy protection has zero utility. Another primary technique used to control the flow of sensitive information is suppression, in which sensitive information and all information that allows the inference of sensitive information are simply not released. However, suppression can drastically reduce the quality of the data, and in the case of statistical use, overall statistics can be altered, rendering the data practically useless.
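The effect of suppression on overall statistics can be illustrated with a minimal sketch. The salary figures, the threshold of 100, and the helper names `suppress` and `released_mean` are hypothetical and chosen only for illustration; they do not come from any particular system.

```python
from statistics import mean


def suppress(values, is_sensitive):
    """Replace every sensitive value with None, i.e. do not release it."""
    return [None if is_sensitive(v) else v for v in values]


def released_mean(values):
    """Mean computed only over the values that were actually released."""
    kept = [v for v in values if v is not None]
    return mean(kept) if kept else None


# Hypothetical salaries in thousands; the last record is treated as sensitive.
salaries = [30, 32, 35, 40, 200]

# Assumed rule for this sketch: any salary above 100 is suppressed.
released = suppress(salaries, lambda v: v > 100)

true_mean = mean(salaries)                # computed on the full data
biased_mean = released_mean(released)     # computed on the released data

print("released:", released)
print("mean before suppression:", true_mean)
print("mean after suppression:", biased_mean)
```

In this toy case a single suppressed record moves the mean from 67.4 to 34.25, which is the kind of distortion that makes suppressed data unreliable for statistical use.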
k-anonymity and l-diversity are also available methods for privacy preservation. In a k-anonymized dataset, each record is indistinguishable from at least k−1 other records with respect to certain “identifying” attributes. l-diversity, on the other hand, provides privacy even when the data publisher does not know what kind of knowledge the adversary possesses. The problem with these solutions is that they are NP-hard, as they are essentially searches over a space of possible multi-dimensional solutions. In addition, the high dimensionality involved in these techniques adds computational overhead.
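The two properties can be made concrete with a short sketch that only checks them on an already generalized table; it does not perform the NP-hard search for an optimal anonymization. The table, the attribute names, and the functions `k_of` and `l_of` are hypothetical, and `l_of` measures only the simplest variant (the number of distinct sensitive values per group).

```python
from collections import defaultdict


def group_by_quasi_identifiers(records, quasi_ids):
    """Group records that share the same values on the identifying attributes."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_ids)
        groups[key].append(record)
    return groups


def k_of(records, quasi_ids):
    """Largest k for which the (non-empty) table is k-anonymous."""
    groups = group_by_quasi_identifiers(records, quasi_ids)
    return min(len(group) for group in groups.values())


def l_of(records, quasi_ids, sensitive):
    """Smallest number of distinct sensitive values found in any group."""
    groups = group_by_quasi_identifiers(records, quasi_ids)
    return min(len({r[sensitive] for r in group}) for group in groups.values())


# Hypothetical table in which age and ZIP code are already generalized.
records = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "cancer"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
]

k = k_of(records, ["age", "zip"])
l = l_of(records, ["age", "zip"], "disease")
print("k =", k, "l =", l)
```

Here every record shares its identifying attributes with one other record, so the table is 2-anonymous, yet the second group contains a single disease value, so anyone known to be in that group has the sensitive value revealed. This is the gap that l-diversity is meant to close.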
Therefore, there is a need for a method and a system for preserving the privacy of data that reduce computational complexity. There is also a need for a system and method that increase the utility of the data while reducing information loss.