Various user productivity applications allow for data entry and analysis of user content. These applications can provide for content creation, editing, and analysis using spreadsheets, presentations, text documents, mixed-media documents, messaging formats, or other user content formats. Among this user content, various textual, alphanumeric, or other character-based information might include sensitive data that users or organizations might not want to include in published or distributed works. For example, a spreadsheet might include social security numbers (SSNs), credit card information, health care identifiers, or other information. Although the user entering this data or user content might have authorization to view the sensitive data, other entities or distribution endpoints might not have such authorization.
Information protection and management techniques, sometimes referred to as data loss protection (DLP), attempt to avoid misappropriation and misallocation of this sensitive data. In certain content formats or content types, such as those included in spreadsheets, slide-based presentations, and graphical diagramming applications, user content might be included in various cells, objects, or other structured or semi-structured data entities. Moreover, sensitive data might be split among more than one data entity. Difficulties can therefore arise when attempting to identify sensitive data in such documents and to protect against its loss.
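As an illustration of the difficulty posed by sensitive data split among data entities, the following minimal sketch compares scanning each cell of a spreadsheet row on its own against scanning the concatenated row. It is not drawn from any particular DLP product: the pattern `SSN_PATTERN`, the functions `scan_cells` and `scan_joined`, and the sample row are all hypothetical names and values assumed for this example.

```python
import re

# Hypothetical pattern (an assumption for this sketch): nine digits in an
# SSN-like 3-2-4 grouping, with an optional hyphen or space between groups.
SSN_PATTERN = re.compile(r"(?<!\d)\d{3}[- ]?\d{2}[- ]?\d{4}(?!\d)")

def scan_cells(cells):
    """Return the indices of cells that, taken alone, contain an SSN-like string."""
    return [i for i, text in enumerate(cells) if SSN_PATTERN.search(text)]

def scan_joined(cells):
    """Return SSN-like strings found after concatenating adjacent cells."""
    return SSN_PATTERN.findall("".join(cells))

# Sample row (made-up data): the number is split across the first two cells.
row = ["123-45-", "6789", "Jane Doe"]

per_cell_hits = scan_cells(row)   # no single cell holds the full pattern
joined_hits = scan_joined(row)    # the pattern appears once cells are joined

print(per_cell_hits)
print(joined_hits)
```

In this sketch the per-cell scan misses the value because no single cell contains the full pattern, while the joined scan recovers it. Naive concatenation can also produce false positives when unrelated digit runs happen to sit in adjacent cells, so a practical approach would likely need additional context.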