This application relates generally to anonymization of data and data sets. More specifically, the disclosure provided herein relates to generating data and data sets that are resistant to minimality attacks.
Anonymization of data has become a popular method of protecting the privacy of individuals, accounts, or other private information associated with shared data. For example, an entity sharing data that includes sensitive information may anonymize the data to allow others to study the data without compromising the privacy of individuals or entities reflected in the data. Thus, third parties can apply new analysis and data mining techniques to the data without exposing the individuals or entities associated with the data.
Similarly, entities in control of sensitive data may wish to retain and/or use that data. Various laws and regulations affect how such data may be retained and/or used over time. Additionally, attackers may target stored sensitive data in an attempt to obtain personal information of the entities or persons associated with it. Thus, the entities in control of the data may wish to anonymize the data to protect individuals and/or entities associated with the sensitive information from exposure.
One problem with anonymizing data is that important aspects of the data may be lost, thereby reducing the usefulness of the anonymized data compared to the original data. In response to these considerations, many methods and definitions have been introduced for minimizing the amount of information lost during anonymization of shared data. Efforts to minimize information loss during anonymization, however, can expose private information: an attacker who knows the anonymization technique, and therefore knows that the data was generalized no more than necessary to satisfy the privacy requirement, may infer private information from the anonymized data. Such inferences are sometimes referred to as “minimality attacks.” Minimality attacks are possible against a variety of privacy algorithms and definitions.