Information management platforms are now widely encountered, whether as open data portals or as proprietary marketplaces where individuals and companies can buy and sell data. The wealth of information currently available, recently estimated to be on the order of zettabytes, can be combined and aggregated in unprecedented ways, which raises growing concerns about privacy violations and the leakage of sensitive knowledge. These concerns become more pressing when proprietary datasets, which are typically high-quality and fine-grained, are combined with externally available information sources to discover interesting knowledge patterns.
Existing research in privacy-preserving data mining has proposed a wealth of approaches that aim to prevent the exposure of sensitive knowledge. These approaches fall into two broad directions: knowledge hiding and query auditing. In knowledge hiding, individual datasets are sanitized to prevent the exposure of sensitive knowledge patterns, which are usually expressed as frequent itemsets, association rules, or classification rules. Query auditing approaches, in contrast, modify or restrict the results of queries posed to databases that contain private data. They examine simple queries (e.g., count or sum queries) answered in the past to determine whether the answer to a new query could allow an individual to infer confidential information forbidden by pre-specified disclosure policies. Queries that could lead to a privacy breach are either denied (not answered) or only partially answered. Like knowledge hiding, query auditing considers a single dataset and attackers who could expose sensitive information from that data.
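To make the query-auditing idea concrete, the following is a minimal sketch of a classic textbook formulation for sum queries, not the method of this work. It assumes each query is the set of record indices whose private values are summed, and that a value x_i counts as fully disclosed once the unit vector e_i lies in the span of the answered query vectors. The class `SumQueryAuditor` and its methods are hypothetical names chosen for illustration. The sketch only denies queries; it does not model partial answers.

```python
from fractions import Fraction
from typing import List, Optional, Set


def rank(rows: List[List[Fraction]]) -> int:
    """Exact matrix rank via Gaussian elimination over the rationals."""
    m = [row[:] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate column c from every other row.
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


class SumQueryAuditor:
    """Hypothetical online auditor for sum queries (full-disclosure model).

    The deny/answer decision looks only at the query sets, never at the
    data values, so a denial does not itself depend on the private data.
    """

    def __init__(self, n_records: int) -> None:
        self.n = n_records
        self.answered: List[List[Fraction]] = []  # 0/1 vectors of answered queries

    def _vector(self, query: Set[int]) -> List[Fraction]:
        return [Fraction(1) if i in query else Fraction(0) for i in range(self.n)]

    def _discloses(self, rows: List[List[Fraction]]) -> bool:
        # e_i is in the row span iff appending it leaves the rank unchanged.
        base = rank(rows)
        for i in range(self.n):
            e_i = [Fraction(1) if j == i else Fraction(0) for j in range(self.n)]
            if rank(rows + [e_i]) == base:
                return True
        return False

    def submit(self, query: Set[int], values: List[float]) -> Optional[float]:
        candidate = self.answered + [self._vector(query)]
        if self._discloses(candidate):
            return None  # deny: answering would reveal some individual value
        self.answered = candidate
        return sum(values[i] for i in query)


values = [10, 20, 30, 40]
auditor = SumQueryAuditor(4)
print(auditor.submit({0, 1, 2, 3}, values))  # total over all records
print(auditor.submit({0, 1, 2}, values))     # denied: x_3 = total - this sum
print(auditor.submit({0, 1}, values))        # allowed: no single value follows
print(auditor.submit({2}, values))           # denied: directly reveals x_2
```

In this example the second query is denied because, combined with the first answer, it would reveal the fourth record's value exactly, while the third query is answered since no single value can be derived from the answered sums. The rank test is recomputed from scratch for every record, which is simple but cubic per check; it is meant to illustrate the auditing logic rather than to scale.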