The disclosure generally relates to the field of information security, and more particularly to multicomputer data transferring.
Organizations collect data about customers or clients to improve the data available for data mining. A customer will often create an account with identifying information, such as name, e-mail, address, and phone number. An organization then maintains transaction data in association with the created account. An organization may use the collected data to target advertisements, tailor offers, and/or improve user experience. The account information is personally identifiable information (PII). In NIST Special Publication 800-122, the National Institute of Standards and Technology defines PII as “any information about an individual maintained by an agency, including (1) any information that can be used to distinguish or trace an individual's identity, such as name, social security number, date and place of birth, mother's maiden name, or biometric records; and (2) any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information.” An element of information that alone can be used to identify someone is referred to as an explicit identifier. An element of information that can be linked or combined with another element of information to identify someone is referred to as a quasi-identifier or quasi-identifying information. The collected transaction data can include quasi-identifiers. The failure of an organization to protect PII harms individuals as well as the organization, since the failure can damage the organization's reputation, create legal liability, and/or incur remediation costs.
Organizations use de-identification or anonymization of PII to preserve the privacy of individuals. The International Association of Privacy Professionals (IAPP) defines de-identification as an action taken to remove identifying characteristics from data. The IAPP defines anonymization as a process of altering identifiable data in such a way that it no longer can be related back to a given individual. Anonymization techniques include removing identifying values from data (suppression), making identifying values broader (generalization), and swapping identifying values of individuals within a data set (noise addition).
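The three techniques named above can be illustrated with a minimal Python sketch. The record layout, field names, and helper functions (suppress, generalize_age, generalize_zip, swap_values) are hypothetical and chosen only for illustration; they are not drawn from the disclosure or from any particular library, and the sketch makes no claim that the output meets any formal privacy guarantee.

```python
import random

# Hypothetical records: "name" is an explicit identifier; "zip" and "age"
# are quasi-identifiers; "purchase" is transaction data.
records = [
    {"name": "Alice Smith", "zip": "78701", "age": 34, "purchase": "laptop"},
    {"name": "Bob Jones", "zip": "78745", "age": 47, "purchase": "phone"},
    {"name": "Cara Diaz", "zip": "73301", "age": 29, "purchase": "tablet"},
]


def suppress(rows, field):
    """Suppression: remove an identifying field from every record."""
    return [{k: v for k, v in row.items() if k != field} for row in rows]


def generalize_age(rows, width=10):
    """Generalization: replace an exact age with a range of the given width."""
    out = []
    for row in rows:
        low = (row["age"] // width) * width
        out.append({**row, "age": f"{low}-{low + width - 1}"})
    return out


def generalize_zip(rows, keep=3):
    """Generalization: keep only the leading digits of a postal code."""
    return [
        {**row, "zip": row["zip"][:keep] + "*" * (len(row["zip"]) - keep)}
        for row in rows
    ]


def swap_values(rows, field, seed=0):
    """Noise addition by swapping: shuffle one field's values across records."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    values = [row[field] for row in rows]
    rng.shuffle(values)
    return [{**row, field: value} for row, value in zip(rows, values)]


anonymized = swap_values(
    generalize_zip(generalize_age(suppress(records, "name"))), "zip"
)
for row in anonymized:
    print(row)
```

Each helper returns new records rather than modifying the input, so the original data set remains available for comparison. Swapping preserves the overall distribution of a field while breaking its link to any one record.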