Protecting personally identifiable information (PII) is among the top priorities, and a major source of risk, for data-driven companies. In recent years there has been a huge investment in improving data-driven development. This involves extensive data sets of feedback and telemetry data collected actively and passively, directly from customers and their devices. There is an inherent risk of collecting private data from customers, both intentionally and accidentally. Being able to detect and correct such mistakes is critical to delivering the level of privacy protection that is often promised to customers.
PII has traditionally been detected using common pattern matching algorithms. As an example, common patterns may include email addresses, phone numbers, or Social Security numbers (SSNs). In order to detect PII with traditional mechanisms, it is often necessary to create a pattern and a set of rules against which each data value is tested. In essence, it is necessary to know what PII to look for in order to find it. This is very limiting, as much PII, such as passwords, does not follow any fixed pattern.
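By way of illustration only, the traditional approach described above can be sketched as follows. The pattern names, the regular expressions, and the `find_pii` helper are hypothetical and deliberately simplified; they are assumptions made for this example and are not drawn from any particular system.

```python
import re

# Hypothetical, simplified patterns. Real-world formats vary far more widely.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(value: str) -> list[str]:
    """Return the names of all known patterns that match somewhere in value."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(value)]


print(find_pii("contact: alice@example.com"))  # matches the email pattern
print(find_pii("SSN on file: 123-45-6789"))    # matches the SSN pattern
print(find_pii("hunter2!Tr0ub4dor"))           # a password matches no pattern
```

The last call illustrates the limitation noted above: a value such as a password has no fixed format, so no predefined pattern flags it, even though it is sensitive.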
The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.