Many organizations implement data loss prevention (DLP) systems to identify and control access to sensitive data. Typical DLP systems protect sensitive data through deep content inspection and analysis, which includes describing technology and fingerprinting technology. Describing technology protects sensitive data by identifying matches to keywords, expressions or patterns, and file types, and by performing other signature-based detection techniques. Fingerprinting technology protects sensitive data by identifying exact matches to whole or partial files. While effective in protecting much of an organization's sensitive data, fingerprinting and describing technologies have limitations when addressing large amounts of unstructured data and intellectual property such as product formulas, source code, and sales and marketing reports.
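As a rough illustration of the two techniques described above, the following minimal Python sketch contrasts signature-based describing with fingerprinting. It is not taken from any particular DLP product. The keyword list, the pattern, the five-word window, and the function names (`describe_match`, `fingerprints`, `fingerprint_match`) are all hypothetical choices made for this example.

```python
import hashlib
import re

# Hypothetical "describing" rules: a small keyword list and one pattern.
KEYWORDS = {"confidential", "proprietary"}
ID_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # assumed ID-number format


def describe_match(text: str) -> bool:
    """Return True if the text contains a listed keyword or matches the pattern."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & KEYWORDS) or ID_PATTERN.search(text) is not None


def fingerprints(text: str, window: int = 5) -> set:
    """Hash the whole normalized text plus every run of `window` consecutive words."""
    words = text.lower().split()
    prints = {hashlib.sha256(" ".join(words).encode()).hexdigest()}
    for i in range(max(len(words) - window + 1, 0)):
        chunk = " ".join(words[i:i + window])
        prints.add(hashlib.sha256(chunk.encode()).hexdigest())
    return prints


def fingerprint_match(protected: set, text: str, window: int = 5) -> bool:
    """Return True if the text shares at least one exact fingerprint with a protected file."""
    return not protected.isdisjoint(fingerprints(text, window))


protected = fingerprints(
    "The secret formula combines seven herbs and three spices in oil"
)
copied = "Note that the formula combines seven herbs and three spices today"
reworded = "Mix 7 herbs with 3 spices in oil for the formula"
```

In this sketch, `copied` shares an exact five-word run with the protected text and is caught by `fingerprint_match`, while `reworded` carries the same information but matches neither the fingerprints nor the describing rules. This is the kind of unstructured content for which both techniques fall short.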
To more accurately protect sensitive unstructured data, some DLP systems are exploring the use of vector machine learning (VML) technology. However, VML is very complex to implement. Accordingly, current DLP systems that use VML typically require an expert in VML to design machine learning-based detection (MLD) profiles for customers. The DLP system that is shipped to the customer then has a predefined MLD profile that the customer is unable to modify. Such DLP systems do not provide any user interface or workflow to enable users to generate their own MLD profiles. Moreover, even where users are able to generate their own MLD profiles, adjusting the feature set of a user-generated MLD profile can be a challenge for both machine learning experts and non-experts.