Technical Field
This disclosure relates generally to identifying and managing user operations with respect to sensitive information (e.g., intellectual property, personally identifiable information, and the like).
Background of the Related Art
Data Loss Prevention (DLP) systems are well-known in the prior art and operate generally to identify, monitor use of, and control user operations on sensitive information within an enterprise computing environment. Typically, DLP systems provide a policy-based mechanism for managing how data is discovered and classified on a user's workstation or file server, also known as an “endpoint.” Policies must be distributed to, and enforced on, each endpoint. In an organization comprising a large number of endpoints, the task of creating and managing policies can be onerous. This is particularly true when there are a variety of endpoint configurations, which arise from file system layouts that differ from endpoint to endpoint based on operating system and software differences, and/or from the behaviors and actions of the end users. Creating a single policy (or even multiple such policies) that accurately and efficiently identifies the areas of the file system in which sensitive information tends to be found requires significant input from, and on-going maintenance by, system administrators.
Existing DLP systems typically provide simplistic approaches to determining which areas of an endpoint file system should be examined. One brute-force approach is simply to scan the entire file system. This approach suffers from large compute resource requirements and long scan times, and it may not be feasible for file systems that contain large amounts of data. For example, a full system scan over a file server with a large amount of data will occupy system resources over a long time period. Users of that file server necessarily will be affected by any impaired performance while the scan is on-going. This problem is even more acute when a large percentage of the data in the scanned file system is not actively being accessed (and thus need not be checked).
An alternative approach is to have an administrator attempt to identify a set of known safe directories in the file system that can then be excluded from the DLP policy (and thus the scan). This approach is disadvantageous in that it often imposes significant overhead on administrators in managing policies effectively. In addition, this approach naively assumes that each endpoint corresponds to the known configuration, a weakness that can be easily exploited.
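By way of illustration only, the two prior approaches described above can be sketched as a single directory walk. The function name files_to_scan and the parameter excluded_dirs are hypothetical and are not taken from any particular DLP product; the sketch merely enumerates candidate files and does not perform content inspection.

```python
import os

def files_to_scan(root, excluded_dirs=()):
    """Yield every file path under root, skipping directories in excluded_dirs.

    excluded_dirs holds paths relative to root (a hypothetical, administrator-
    maintained list of "known safe" directories). With the default empty
    value, this is the brute-force scan of the entire tree.
    """
    excluded = {os.path.normpath(os.path.join(root, d)) for d in excluded_dirs}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded subdirectories in place so os.walk never descends into them.
        dirnames[:] = [
            d for d in dirnames
            if os.path.normpath(os.path.join(dirpath, d)) not in excluded
        ]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

In this sketch, any file placed beneath an excluded directory is never examined, which illustrates the weakness noted above: the exclusion list is only as reliable as the assumption that each endpoint matches the expected configuration.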
While these approaches are valid in some cases and can produce workable systems, they impose significant constraints on the efficiency of the DLP solution.
It is desired to provide enhanced techniques for discovery of sensitive content that address the above-described deficiencies.