During the normal course of business, companies accumulate large amounts of data. Recently, some companies have begun to monetize this data by sharing it with third parties, such as advertisers, researchers, or collaborative partners. The third parties pay a fee and, in exchange, receive relevant data from a data owner. The third party can then use the data to target advertising or conduct research. However, the data requested by the third parties often includes information that is private to one or more individuals from whom the data is collected.
To protect the privacy of the individuals from whom the data is collected, the data can be anonymized before being provided to the third party. Data anonymization alters the data to protect sensitive information while maintaining features that allow a requesting third party to use the data. The alteration can include adding noise, reducing the precision of the data, or removing parts of the data. Generally, data owners do not have enough knowledge regarding anonymization and thus rely on third parties to identify sensitive information for anonymization, as well as to anonymize the data before it is provided to the requesting third party.
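The three kinds of alteration named above (adding noise, reducing precision, and removing parts of the data) can be illustrated with a minimal Python sketch. The record layout, the field names, the bucket size, and the noise range are all assumptions chosen for illustration; they are not taken from any particular anonymization system, and the sketch makes no claim about how strong the resulting privacy protection is.

```python
import random


def add_noise(value, scale, rng):
    """Perturb a numeric value with uniform noise drawn from [-scale, scale]."""
    return value + rng.uniform(-scale, scale)


def reduce_precision(value, bucket):
    """Generalize a number to the lower bound of its bucket (e.g. 37 -> 30)."""
    return (value // bucket) * bucket


def suppress(record, fields):
    """Return a copy of the record without the listed fields."""
    return {key: val for key, val in record.items() if key not in fields}


def anonymize(record, rng):
    """Apply all three alterations to a hypothetical customer record."""
    out = suppress(record, {"name"})                    # remove part of the data
    out["age"] = reduce_precision(out["age"], 10)       # reduce precision
    out["zip"] = out["zip"][:3] + "**"                  # reduce precision (string)
    out["income"] = round(add_noise(out["income"], 1000.0, rng), 2)  # add noise
    return out


# Hypothetical input record; a seeded generator keeps the run repeatable.
record = {"name": "Alice Example", "age": 37, "zip": "94110", "income": 52000.0}
anonymized = anonymize(record, random.Random(0))
```

In this sketch the data owner still has to decide which fields to alter and by how much, which is exactly the knowledge that data owners are described as lacking.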
One approach to anonymization includes contacting an anonymization service provider that provides individual personnel to help with the data anonymization. The personnel assigned to the anonymization have access to the data despite being an untrusted third party. Currently, many companies ask the anonymization service to sign confidentiality agreements, such as a Memorandum of Understanding or a Non-Disclosure Agreement, to protect the data before and after the data is anonymized. A further approach includes relying on software applications for anonymization. However, the software must generally be given full access to the data, so security concerns still arise. Further, most software requires the data owner to specify which data attributes should be anonymized, as well as to identify the anonymization parameters. Unfortunately, most data owners lack the knowledge and expertise to accurately identify which attributes require anonymization and the level of anonymization necessary to prevent the identity of an individual from being disclosed. Thus, a technical problem exists in which data to be analyzed for anonymization must be disclosed to either an individual or software before determining whether the data needs to be anonymized, which can allow sensitive information to be discovered by an untrusted party.
Therefore, there is a need for an approach to automating data anonymization, including identifying specific data attributes for anonymization, to prevent data disclosure or breach and ensure data protection.