The present invention relates to tuning Data Loss Prevention (DLP) signatures to improve the effectiveness of a DLP sensor that uses the signatures. In more specific embodiments, quantification iterations are used to fine-tune a set of signatures for use by the DLP sensor.
DLP systems typically examine packet and message flows within a network link. These links are typically at key points in the network, e.g., at the egress between the intranet and the internet. DLP rules describe what the systems look for in the flows. Today, these DLP systems generate an enormous number of false positive alerts, and the signatures are tuned or altered manually, so the effectiveness of the tuning depends on the particular skills of the tuner.
In some instances, there are hundreds or more rules with varying levels of complexity. For example, a rule may produce an alert if the packet contains “this_character_string”, “this_other_character_string”, “this_other_character_string_with_wild_card_characters”, or “any_strings_with_structure_of_a_national_ID_number_for_set_of_countries”, and the packet is not from a specified set of Internet Protocol (IP) addresses, the protocol is FTP or SMTP, and the destination is within a specified set of IP ranges.
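For illustration only, the example rule above can be read as a compound predicate: a content clause (any listed string or pattern matches) combined by logical AND with the source, protocol, and destination conditions. The following Python sketch assumes that grouping. The names Packet and CompoundRule, the sample addresses, the "secret_..._plan" wildcard pattern, and the SSN-style pattern standing in for a national ID format are all hypothetical and are not taken from any particular DLP product.

```python
import ipaddress
import re
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical, simplified view of one inspected flow."""
    payload: str
    src_ip: str
    dst_ip: str
    protocol: str


@dataclass
class CompoundRule:
    """Hypothetical DLP rule: content clause AND header conditions."""
    literals: list[str]
    patterns: list[re.Pattern]  # wildcard and structured-ID patterns
    excluded_sources: set[str]  # compared as plain strings in this sketch
    protocols: set[str]
    destination_ranges: list[ipaddress.IPv4Network]

    def content_matches(self, payload: str) -> bool:
        # Any literal substring or any regex hit satisfies the content clause.
        return (any(lit in payload for lit in self.literals)
                or any(p.search(payload) for p in self.patterns))

    def matches(self, pkt: Packet) -> bool:
        dst = ipaddress.ip_address(pkt.dst_ip)
        return (self.content_matches(pkt.payload)
                and pkt.src_ip not in self.excluded_sources
                and pkt.protocol.upper() in self.protocols
                and any(dst in net for net in self.destination_ranges))


# Illustrative values only.
rule = CompoundRule(
    literals=["this_character_string", "this_other_character_string"],
    patterns=[
        re.compile(r"secret_\w+_plan"),        # hypothetical wildcard string
        re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # national-ID-like structure
    ],
    excluded_sources={"10.0.0.5"},
    protocols={"FTP", "SMTP"},
    destination_ranges=[ipaddress.ip_network("203.0.113.0/24")],
)

leak = Packet("employee 123-45-6789", "10.0.0.9", "203.0.113.7", "smtp")
print(rule.matches(leak))  # True
```

Each additional literal or pattern widens the content clause, while each header condition narrows the rule; that interplay is one reason different rules covering the same data can differ sharply in their false positive rates.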
Because the rules are extremely flexible and numerous, several different rules could each detect a true alert for a given set of examined data, yet those same rules could produce different numbers of false positive alerts. Thus, one job of the tuner is to identify, from a set of rules, those that most often yield a true alert while minimizing false positives, and then to remove the other rules that produce more false positive alerts.
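The selection task described above can be sketched under simple assumptions. Given alerts that a reviewer has already labeled as true or false positives, count each rule's true and false positives, then for every true event keep the detecting rule with the highest precision (fewest false positives on ties). The Alert record, its labels, and this greedy heuristic are hypothetical illustrations and are not the quantification iterations of the present invention.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    rule_id: str
    event_id: str
    is_true_positive: bool  # hypothetical reviewer-assigned label


def rule_stats(alerts: list[Alert]) -> dict[str, tuple[int, int]]:
    """Count (true positives, false positives) per rule."""
    stats = defaultdict(lambda: [0, 0])  # rule_id -> [tp, fp]
    for a in alerts:
        stats[a.rule_id][0 if a.is_true_positive else 1] += 1
    return {rid: (tp, fp) for rid, (tp, fp) in stats.items()}


def select_rules(alerts: list[Alert]) -> tuple[set[str], set[str]]:
    """For each true event, keep the detecting rule with the best TP/FP profile."""
    stats = rule_stats(alerts)

    def quality(rid: str) -> tuple[float, int, str]:
        tp, fp = stats[rid]
        # Higher precision first, then fewer false positives, then id for determinism.
        return (-(tp / (tp + fp)), fp, rid)

    detectors = defaultdict(set)  # event_id -> rules that flagged it correctly
    for a in alerts:
        if a.is_true_positive:
            detectors[a.event_id].add(a.rule_id)

    keep = {min(rules, key=quality) for rules in detectors.values()}
    remove = set(stats) - keep  # removal candidates for the tuner to review
    return keep, remove


alerts = [
    Alert("R1", "e1", True), Alert("R2", "e1", True),
    Alert("R1", "e2", True), Alert("R2", "e2", True),
    Alert("R2", "e3", False), Alert("R2", "e4", False),
    Alert("R3", "e5", False),
]
keep, remove = select_rules(alerts)
print(sorted(keep), sorted(remove))  # ['R1'] ['R2', 'R3']
```

Here R1 and R2 both catch the true events e1 and e2, but R2 also raises two false positives, so R2 becomes a removal candidate, as does R3, which never fires on a true event. A greedy per-event choice like this is easy to audit but does not guarantee the smallest possible rule set.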