Data collection and analysis are used in almost every industry. Increasingly, industries process large datasets in which a growing number of attributes describe each entity. At times, the data may be anonymized to enable modern business models, to guarantee data privacy as required by law, or to give researchers access to otherwise protected information.
Conventional anonymization practices rely on modifying the data itself or falsifying records such that parts of the data cannot be linked back to their original entity, while preserving data quality and correlations as much as possible. When applied to high-dimensional (e.g., multi-attribute) datasets, these conventional practices may lead to a dramatic loss of data quality.
Systems and methods are desired which support efficient and effective data anonymization.