The leakage of sensitive data outside of a network remains a fundamental security problem. Despite technologies such as firewalls, intrusion detection systems, and intrusion prevention systems, which are designed to prevent unauthorized data from entering a network, data continues to leak regularly out of seemingly secure networks. For example, intrusions intended to disseminate sensitive data beyond a network boundary can occur through “zero-day” attacks or security compromises at the application level. Sensitive data may also leak out of a network inadvertently, for example through configuration errors or other mistakes made by people who have access to the sensitive data.
In one current method for data leakage prevention (DLP), software installed at network boundaries inspects outgoing documents against a set of keywords or regular expressions, and any document that contains a match can be prevented from leaving the network. This approach can have a high false-positive rate, however, because a significant number of non-sensitive documents can be flagged as sensitive. Thus, users may be wrongly prevented from communicating non-sensitive documents.
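The keyword and regular-expression approach described above can be illustrated with a minimal sketch. The rule set, the function name `is_blocked`, and the sample documents are hypothetical and chosen only for illustration; they are not taken from any particular DLP product. The sketch shows how a non-sensitive document that merely mentions a keyword is blocked alongside a genuinely sensitive one.

```python
import re

# Hypothetical rule set; an actual deployment would use site-specific rules.
KEYWORDS = ["confidential", "internal only"]
PATTERNS = [re.compile(r"\b\d{3}-\d{2}-\d{4}\b")]  # number shaped like a U.S. SSN


def is_blocked(document: str) -> bool:
    """Return True if the outgoing document matches any keyword or pattern."""
    text = document.lower()
    if any(keyword in text for keyword in KEYWORDS):
        return True
    return any(pattern.search(document) for pattern in PATTERNS)


# A genuinely sensitive document: matches the SSN-like pattern.
sensitive = "Employee SSN: 123-45-6789"

# A non-sensitive document that only mentions a keyword: a false positive.
harmless = "Reminder: the confidential-data training session is on Friday."

# A non-sensitive document with no match: allowed to leave the network.
unrelated = "Lunch menu for Friday is attached."

for doc in (sensitive, harmless, unrelated):
    print("BLOCKED" if is_blocked(doc) else "allowed", "|", doc)
```

In this sketch the training reminder is blocked even though it contains nothing sensitive, because the filter tests only for the presence of a keyword and not for the content the keyword is meant to protect.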