Content scanning in general is a relatively well-developed area. In most applications, content scanning is keyword-based; more advanced applications use regular expressions or statistical methods of pattern matching and document classification. These methods have been applied to many document classification problems. One successful application of statistical classifiers is spam filtering, where Bayesian classifiers have demonstrated 98% accuracy.
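To illustrate the statistical approach mentioned above, the following is a minimal sketch of a Bayesian (multinomial naive Bayes) text classifier with add-one smoothing. It is not taken from any particular spam filter; the class name `NaiveBayes`, the helper `tokenize`, and the four toy training messages are assumptions made purely for illustration, and a real filter would be trained on a much larger corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Hypothetical minimal multinomial naive Bayes classifier."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()           # all tokens seen during training

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior: fraction of training documents with this label.
            score = math.log(self.doc_counts[label] / total_docs)
            # Add-one (Laplace) smoothing keeps unseen tokens from zeroing the score.
            denom = sum(counts.values()) + len(self.vocab)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (assumed for illustration only).
clf = NaiveBayes()
clf.train("win a free prize now", "spam")
clf.train("free money click now", "spam")
clf.train("meeting agenda for monday", "ham")
clf.train("please review the attached report", "ham")

print(clf.classify("claim your free prize"))
print(clf.classify("agenda for the monday meeting"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, which is why the sketch sums logarithms rather than multiplying raw probabilities.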
The area of Digital Asset Protection (e.g., preventing information leaks through network channels) is rather new. Commercial systems so far borrow their approaches and tools from existing areas, concentrating on off-line analysis of data for the presence of keywords. The most developed segment of Digital Asset Protection consists of e-mail scanners, which work as add-ons to e-mail delivery and exchange software. Products in this area offer keyword-based and regular-expression-based filtering and focus on preventing offensive or otherwise improper e-mails from reaching the outside world, thereby protecting a company from possible litigation.
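The keyword-based and regular-expression-based filtering described above can be sketched roughly as follows. This is an illustrative example only, not the behavior of any specific product: the keyword list, the `card_like` pattern (sixteen digits with optional space or hyphen separators), and the function name `scan_message` are all hypothetical.

```python
import re

# Hypothetical policy: phrases that should not appear in outgoing mail.
BLOCKED_KEYWORDS = {"confidential", "internal only"}

# Hypothetical pattern: 16 digits, optionally separated by spaces or hyphens.
PATTERNS = {
    "card_like": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
}


def scan_message(body):
    """Return a list of (kind, name) findings for an outgoing message body."""
    findings = []
    lowered = body.lower()
    # Keyword check: case-insensitive substring match.
    for keyword in sorted(BLOCKED_KEYWORDS):
        if keyword in lowered:
            findings.append(("keyword", keyword))
    # Regular-expression check: structured patterns that keywords cannot express.
    for name, pattern in sorted(PATTERNS.items()):
        if pattern.search(body):
            findings.append(("regex", name))
    return findings


print(scan_message("CONFIDENTIAL: card 4111 1111 1111 1111"))
print(scan_message("Lunch at noon?"))
```

A scanner of this kind would typically hold or reject a message when `scan_message` returns a non-empty list. Note that plain substring and pattern matching of this sort is easy to evade and prone to false positives, which is the limitation of the borrowed approaches noted above.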
The Digital Asset Protection area has recently started to attract attention, especially because of U.S. government privacy initiatives such as the Gramm-Leach-Bliley Act (“GLBA”), targeted at financial institutions, and the Health Insurance Portability and Accountability Act (“HIPAA”), targeted at health care providers. Leaks of credit card numbers and medical records, for example, cost companies millions of dollars in liabilities. Accordingly, such leaks should be prevented.