Personally identifiable information (PII) is information that can be used on its own or with other information to identify, contact, or locate a single person, or to identify an individual in context. Corporations and agencies are often obligated to protect content containing PII so that the PII is not exposed to unauthorized parties. Because failing to protect such content carries significant reputational and financial consequences, corporations and governmental agencies have made identifying and protecting it a major goal. Privacy obligations arise from a number of laws and industry standards in different jurisdictions, such as the Health Insurance Portability and Accountability Act (HIPAA) and the Payment Card Industry Data Security Standard (PCI DSS). One of the most challenging aspects of identifying and protecting PII is how to deal with “unstructured” content. Unstructured content refers to information that does not have a pre-defined data model or is not organized in a pre-defined manner. Examples of unstructured content include documents or files on file shares, personal computing devices, and content management systems. These documents and files may be generated within or outside of an organization using many applications, can be converted to multiple file formats (e.g., Portable Document Format (PDF)), and seemingly have unlimited form and content. By contrast, structured data, such as data stored in databases and support systems, often has defined fields in tables that have defined relationships with each other. For example, to protect social security numbers in a database, access to the field that holds social security numbers is controlled. With unstructured documents, there is no such field to control, so PII must first be detected within free-form content, which is more challenging.
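To illustrate why detection in unstructured content is harder than controlling a database field, the following is a minimal sketch that scans free-form text for strings shaped like social security numbers. The pattern, the function name `find_ssn_candidates`, and the sample text are assumptions made for illustration and are not taken from any particular system; a simple pattern like this would miss differently formatted numbers and would flag unrelated digit strings that happen to share the format.

```python
import re
from typing import List

# Hypothetical pattern (an assumption for illustration): digits in the
# common 3-2-4 grouping separated by hyphens, e.g. 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssn_candidates(text: str) -> List[str]:
    """Return every substring of `text` that matches the SSN-like pattern."""
    return SSN_PATTERN.findall(text)


# Made-up sample text standing in for an unstructured document.
sample = "Please update the record for J. Doe, SSN 123-45-6789, before Friday."
print(find_ssn_candidates(sample))
```

In a database, the same number would sit in a known column that can be access-controlled directly. Here the scanner has to guess from the shape of the text alone, which is why a number written without hyphens would go undetected by this sketch.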