The present invention relates to the field of data anonymization, and more particularly to a method, a computer program product, and a system for information governance and data privacy protection.
Privacy regulations such as the GDPR (http://www.eugdpr.org/) focus on protecting individuals against certain uses of their data. It is not lawful to use an individual's data for certain kinds of analytics unless the individual has explicitly consented to that use. On the other hand, it may be acceptable to collect and store personal data, and it may also be acceptable to use it in analytics, as long as the data is “sufficiently anonymized”. For example, the GDPR states: “If the data processed by a controller do not permit the controller to identify a person, they are not required to gain additional information for the purposes of complying with this regulation.”
As an example, assume that a data scientist wants to investigate a table containing the results of a medical study in order to find new relationships between regions and certain kinds of diseases. The data scientist should not be able to identify specific individuals or to see their sensitive personal information. Fortunately, in this example, identifying columns like “Name” are not necessary for the investigation, and quasi-identifying columns like “Hobbies” could likely be masked without impairing the investigation. These columns are therefore easy to deal with.
In contrast, the investigation would be impossible if other quasi-identifying columns, such as the address or the disease information, were fully encrypted, masked, or redacted. On the other hand, leaving such columns unanonymized typically makes individuals easy to identify, because some diseases and/or addresses may occur so rarely that they point to a single person. In such cases, anonymization by generalization can help, e.g. generalizing concrete addresses to cities, or generalizing very detailed disease information like “Asian Flu” to more general information like “Flu”. Such generalizations often do not hurt the investigation, but can help to anonymize the data sufficiently so that individuals cannot be tracked down.
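The effect of generalization described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the method of the invention: the sample records, the address format "<street>, <city>", the disease hierarchy, and the helper names (generalize_address, generalize_disease, smallest_group) are all hypothetical. The sketch generalizes both columns and then counts how many records share the least common combination of values, as a rough indication of how easily a single individual could be singled out.

```python
from collections import Counter

# Hypothetical generalization hierarchy; the entries are illustrative assumptions.
DISEASE_HIERARCHY = {
    "Asian Flu": "Flu",
    "Swine Flu": "Flu",
    "Type 1 Diabetes": "Diabetes",
    "Type 2 Diabetes": "Diabetes",
}


def generalize_address(address: str) -> str:
    """Keep only the city, assuming the format '<street>, <city>'."""
    return address.rsplit(",", 1)[-1].strip()


def generalize_disease(disease: str) -> str:
    """Map a detailed disease name to its broader category, if one is known."""
    return DISEASE_HIERARCHY.get(disease, disease)


def generalize_row(row: dict) -> dict:
    """Return a copy of the row with both quasi-identifying columns generalized."""
    return {
        "address": generalize_address(row["address"]),
        "disease": generalize_disease(row["disease"]),
    }


def smallest_group(rows: list) -> int:
    """Size of the rarest (address, disease) combination; 1 means someone is unique."""
    counts = Counter((r["address"], r["disease"]) for r in rows)
    return min(counts.values())


# Made-up sample records, not data from any real study.
rows = [
    {"address": "1 Main St, Springfield", "disease": "Asian Flu"},
    {"address": "22 Oak Ave, Springfield", "disease": "Swine Flu"},
    {"address": "7 Elm Rd, Shelbyville", "disease": "Type 1 Diabetes"},
    {"address": "9 Pine Ln, Shelbyville", "disease": "Type 2 Diabetes"},
]

generalized = [generalize_row(r) for r in rows]

print(smallest_group(rows))         # every original record is unique
print(smallest_group(generalized))  # after generalization, records share values
```

In this made-up sample, every original record has a unique combination of address and disease, while after generalization each combination is shared by two records. The relationship between region and disease category remains available for analysis.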