1. Field of the Invention
This invention relates to the operation of data loss prevention systems and, more specifically, to the protection of content indexing metadata using data loss prevention systems.
2. Description of the Related Art
A content indexing (CI) system may enable indexing, discovery, and/or search of data on a computer system or network. For example, a CI application may perform a background scan of one or more file systems, during which the CI system scans the contents of various files in the file system(s). After scanning the contents of a given file, the CI application may generate metadata describing the contents of the file and associate the metadata with the file. For example, the CI application may record the association in an indexing database. The particular format of the CI metadata itself may vary depending on the particular CI system implementation.
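By way of illustration only, the background scan described above may be sketched as follows. The metadata format (a word count and a list of most frequent terms), the function names `describe_file` and `background_scan`, and the use of an in-memory dictionary in place of an indexing database are all assumptions made for this example and do not reflect any particular CI system implementation.

```python
import os
import re
from collections import Counter


def describe_file(path, top_n=5):
    """Return hypothetical CI metadata for one file: word count and top terms."""
    with open(path, "r", encoding="utf-8", errors="ignore") as handle:
        words = re.findall(r"[a-z0-9]+", handle.read().lower())
    most_common = Counter(words).most_common(top_n)
    return {
        "word_count": len(words),
        "top_terms": [word for word, _count in most_common],
    }


def background_scan(root):
    """Walk `root` and associate each file path with its metadata.

    The returned dict stands in for the indexing database.
    """
    database = {}
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(directory, name)
            database[path] = describe_file(path)
    return database
```

In this sketch the association between a file and its metadata is simply the dictionary key; an actual system may store the association in a database or alongside the file.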
The content indexing database may be used to enable quick searches of the file system content. In order to perform a search, a search engine may consult the indexing database rather than perform the relatively time-consuming and computationally expensive task of scanning the contents of various files in the file system.
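The benefit of consulting an index rather than rescanning files may be illustrated with a minimal inverted index. This is a sketch under stated assumptions: the names `tokenize`, `build_inverted_index`, and `search` are hypothetical, file contents are supplied as in-memory strings, and a query matches a file only if the file contains every query term.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into a set of lowercase alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_inverted_index(documents):
    """Map each term to the set of paths containing it.

    `documents` maps path -> contents. This is the expensive step and is
    performed once, at indexing time.
    """
    index = defaultdict(set)
    for path, text in documents.items():
        for term in tokenize(text):
            index[term].add(path)
    return index


def search(index, query):
    """Return paths containing every query term, consulting only the index."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = None
    for term in terms:
        matches = index.get(term, set())
        results = set(matches) if results is None else results & matches
    return results
```

Here `search` never opens a file; the cost of a query depends on the number of query terms and matching paths rather than on the total size of the file system.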
In various systems and networks, the files scanned by a CI system may contain sensitive information, such as personal and/or proprietary information. Such data may be considered sensitive from a business and/or legal standpoint. For example, some computer files may contain proprietary information that the organization does not wish to be leaked to outside parties. In other examples, various legal constraints may require that an organization track personal information on its network, such as credit card numbers and/or social security numbers. An organization may be legally required to abide by various data privacy and/or breach notification laws that require the organization to notify customers or other stakeholders when their information may have been exposed.
In order to identify, monitor, and protect sensitive data, an organization may employ a Data Loss Prevention (DLP) system. Such systems may also be known as Data Leak Prevention, Information Leak Detection and Prevention, Information Leak Prevention, Content Monitoring and Filtering, Extrusion Prevention System, among other names.
To identify a data loss risk, a DLP system may need to determine whether a given file contains sensitive data. For example, to protect data “at rest” (e.g., stored in a file system), a DLP system may scan the contents of each file in a file system by using a background scan of the files, such as is commonly done with virus scanning or content indexing. Such a background scan may be scheduled to scan all the files in a file system for sensitive data every evening at a predetermined time, or at any other interval. A DLP system may protect data “in motion” (e.g., being transmitted via a network) by scanning files before they are transferred. For example, in response to detecting that a given user is attempting to email a file to an outside party, the DLP system may scan the contents of the file to determine if it contains sensitive information.
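One way such a content scan might decide whether text contains sensitive data is sketched below. The two patterns (a U.S. social security number in the form NNN-NN-NNNN, and a 13 to 19 digit card number checked with the Luhn checksum) and the names `luhn_valid` and `find_sensitive` are assumptions chosen for illustration; an actual DLP system may use different and considerably more elaborate detection rules.

```python
import re

# Hypothetical detection patterns; real DLP rules are typically configurable.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_sensitive(text):
    """Return (kind, matched_text) pairs for apparent sensitive data in `text`."""
    findings = []
    for match in SSN_PATTERN.finditer(text):
        findings.append(("ssn", match.group()))
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):  # discard digit runs that are not plausible cards
            findings.append(("card", match.group()))
    return findings
```

The same function could be applied to a file during a background scan (data at rest) or to an attachment before transmission (data in motion); only the trigger differs.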
If the contents of a given file are deemed sensitive, often according to a set of configurable heuristics, the DLP system may determine that a data loss risk exists and perform any number of protective DLP actions according to one or more data loss prevention rules. For example, if during a background scan, a DLP system detects that a given file at rest contains sensitive data (e.g., social security numbers, credit card numbers, etc.), the DLP system may sequester the file according to various sequestration rules. Under various sequestration rules, sequestering the file may include encrypting the file using a given algorithm and/or key. Other sequestration rules may include storing the file or encrypted file in a safe backup storage location (i.e., a sequestration area) under certain access permissions. Access permissions to a sequestration area may be more restrictive than those to the file's original storage location.
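A sequestration rule of the kind described above may be sketched as follows. The sketch assumes a POSIX-style permission model, uses a hypothetical function name `sequester`, and omits the optional encryption step, since the choice of algorithm and key management is implementation specific.

```python
import os
import shutil
import stat


def sequester(path, sequestration_dir):
    """Move `path` into `sequestration_dir` and restrict access to the owner.

    Assumes POSIX permissions. A file with the same base name already in the
    sequestration area is not handled here; a real rule would need to
    account for such collisions.
    """
    os.makedirs(sequestration_dir, exist_ok=True)
    # Only the owner may list, enter, or modify the sequestration area.
    os.chmod(sequestration_dir, stat.S_IRWXU)
    destination = os.path.join(sequestration_dir, os.path.basename(path))
    shutil.move(path, destination)
    # The sequestered copy is read-only, and readable only by the owner.
    os.chmod(destination, stat.S_IRUSR)
    return destination
```

After the call, the file no longer exists at its original location, and the permissions on the sequestered copy are more restrictive than typical defaults.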