The European Union's (EU) General Data Protection Regulation (GDPR) applies from May 25, 2018. It replaces the EU Data Protection Directive 95/46/EC and applies directly in all EU member states without requiring a distinct national implementation. Article 4(1) and (2) of the GDPR define ‘personal data’ and ‘processing’, respectively. Personal data in the GDPR is any information relating to an identified or identifiable natural person. Processing is any operation performed on that personal data, from collection to erasure and anything in between. The GDPR has a greater territorial scope than Directive 95/46/EC; most notably, it applies internationally to those who process personal data of individuals in the EU (Art. 3). The GDPR imposes more rules on transfers of personal data to third countries or international organizations than on transfers within the EU (Art. 44). One of these rules is based on an adequacy decision, that is, a finding of a sufficient level of protection under Art. 45(2), which assesses the third country's or international organization's laws, supervisory authorities, and international commitments. If a country lacks adequate privacy law, a legal agreement may provide grounds for adequacy; one example is Safe Harbor, the original attempt at facilitating trans-Atlantic data flows between the EU and the US.
Pseudonymization is a procedure by which the most identifying fields within a data record are replaced by one or more artificial identifiers, or pseudonyms. There can be a single pseudonym for a collection of replaced fields or a separate pseudonym for each replaced field. The purpose is to render the data record less identifying and thereby reduce user objections to its use. Data in this form remains suitable for extensive analytics and processing.
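The two variants described above, one pseudonym per replaced field or a single pseudonym for a collection of fields, can be sketched as follows. This is a minimal illustration only: the function names, the `PSN-` prefix, the record layout, and the in-memory `lookup` table are all assumptions made for the example, not part of any particular product.

```python
import secrets


def _new_pseudonym():
    # Random, non-derivable identifier; the "PSN-" prefix is an arbitrary choice.
    return "PSN-" + secrets.token_hex(8)


def pseudonymize_per_field(record, fields, lookup):
    """Replace each listed field with its own pseudonym.

    `lookup` maps (field name, original value) to a pseudonym so that the
    same original value always receives the same pseudonym.
    """
    out = dict(record)
    for name in fields:
        key = (name, record[name])
        if key not in lookup:
            lookup[key] = _new_pseudonym()
        out[name] = lookup[key]
    return out


def pseudonymize_grouped(record, fields, lookup):
    """Replace the whole collection of listed fields with one pseudonym."""
    out = {k: v for k, v in record.items() if k not in fields}
    key = tuple(record[name] for name in fields)
    if key not in lookup:
        lookup[key] = _new_pseudonym()
    out["pseudonym"] = lookup[key]
    return out
```

In both variants the non-identifying fields are left untouched, so they remain available for analytics, while the lookup table, which allows re-identification, would have to be stored separately and protected.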
The choice of which data fields to pseudonymize is partly subjective, but it typically includes all highly selective fields, such as the Social Security Number (SSN) in the United States. Less selective fields, such as Birth Date or Zip Code, are often also included because they are usually available from other sources and therefore make a record easier to identify. Pseudonymizing these less identifying fields removes most of their analytic value, so it should be accompanied by the introduction of new, derived, and less identifying forms, such as Year of Birth or a larger Zip Code region.
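As an illustration of such derived forms, the sketch below drops the exact birth date and Zip Code and keeps only a year of birth and a coarser region. The field names are hypothetical, and using the first three digits of the Zip Code as the "larger region" is an assumption made for the example.

```python
from datetime import date


def generalize(record):
    """Replace exact birth date and Zip Code with less identifying forms."""
    out = dict(record)
    birth = out.pop("birth_date")      # exact date is removed
    out["birth_year"] = birth.year     # only the year is kept
    zip_code = out.pop("zip_code")     # exact Zip Code is removed
    out["zip_region"] = zip_code[:3]   # assumed region: first three digits
    return out


example = generalize(
    {"birth_date": date(1984, 7, 12), "zip_code": "94107", "diagnosis": "flu"}
)
```

The derived fields keep some analytic value, for example for age-band or regional statistics, while being shared by many more individuals than the original values.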
Data stored in data repositories is often pseudonymized for security, privacy, data loss prevention, and compliance purposes. For example, the EU GDPR names pseudonymization as an appropriate safeguard for personal data (Art. 25 and Art. 32).
Current solutions for discovering whether data stored in a business enterprise is pseudonymized are typically product specific: the specific product used to perform the pseudonymization must be known in order to determine the pseudonymized state of any data. For example, some pseudonymization solutions maintain a table or database whose entries and/or metadata do not explicitly indicate whether data is pseudonymized. The method each product uses to indicate the pseudonymization state must therefore be known in order to determine from the table or database whether any particular data is pseudonymized.
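The product dependence described above can be made concrete with a small sketch. Both "products" and their metadata layouts are invented for illustration: hypothetical product A lists masked columns by name, while hypothetical product B records an opaque transformation code per column, of which only some codes mean "pseudonymized".

```python
def is_pseudonymized_product_a(metadata, column):
    # Hypothetical product A: pseudonymized columns are listed by name.
    return column in metadata.get("masked_columns", [])


def is_pseudonymized_product_b(metadata, column):
    # Hypothetical product B: each column carries a transformation code;
    # only the (assumed) codes "T1" and "T2" denote pseudonymization.
    code = metadata.get("columns", {}).get(column, {}).get("xform")
    return code in {"T1", "T2"}


# One detector per known product; every new product needs a new entry.
DETECTORS = {
    "product_a": is_pseudonymized_product_a,
    "product_b": is_pseudonymized_product_b,
}


def is_pseudonymized(product, metadata, column):
    """Dispatch to the product-specific detector, if one is known."""
    detector = DETECTORS.get(product)
    if detector is None:
        raise ValueError(f"no detector for product {product!r}")
    return detector(metadata, column)
```

Without knowing which product wrote the metadata, neither layout can be interpreted, and data handled by an unknown product cannot be classified at all.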
For an expert, for example a data protection officer within a business enterprise, detecting whether data is pseudonymized may require a very elaborate, complicated, and costly capability, because it must account for all of the products and methods providing pseudonymization techniques that may be used on all of the different computer devices within the enterprise. Additionally, maintaining this capability as the number and types of available products providing distinct pseudonymization techniques keep changing may be very time consuming and expensive. Also, outside vendors, contractors, or temporary consultants may use their own computer devices to provide pseudonymization solutions or techniques that are unknown to the enterprise's expert.