The information and knowledge created and accumulated by organizations and businesses are among their most valuable assets. As such, managing and keeping the information and the knowledge inside the organization and restricting its distribution outside is of paramount importance for almost any organization, government entity or business, and provides significant leverage of its value. Most of the information in modern organizations and businesses is represented in a digital format. Digital content can be easily copied and distributed (e.g., via e-mail, instant messaging, peer-to-peer networks, FTP and web-sites), which greatly increases hazards such as business espionage and data leakage. It is therefore essential to monitor the information traffic in order to keep the information unavailable to unauthorized persons.
Various bills and regulations within the United States of America and other countries add another level of importance to the problem of confidential information management and control. Regulations within the United States of America, such as the Health Insurance Portability and Accountability Act (HIPAA), the Gramm-Leach-Bliley Act (GLBA) and the Sarbanes-Oxley Act (SOXA), imply that the information assets within organizations should be monitored and subjected to an information management policy, in order to protect clients' privacy and to mitigate the risks of potential misuse and fraud. In particular, the existence of covert channels of information, which can serve conspiracies to commit fraud or other illegal activities, poses a severe risk from both legal and business perspectives.
Another aspect of the information management problem is to make the information explicitly available to authorized persons whenever needed, so that it can be utilized in order to create value for the organization. This aspect also requires tracking the information along its life cycle.
Methods that attempt to track digital information and manage information and knowledge exist. One of the most prevalent methods is based on filtering of key-words and key-phrases: in this case, the system attempts to recognize a pre-defined set of previously stored information items, such as key-words, numbers and key-phrases, within the content, utilizing string comparison algorithms. Such methods are in wide usage, e.g., for email filtering utilizing string matching. However, the usage of such methods may become prohibitively slow when the number of stored information items is large.
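By way of illustration only, the following minimal Python sketch shows one possible form of such string-comparison filtering. It does not reproduce any particular existing system; the function name `find_stored_items`, the sample message and the sample list of stored items are all hypothetical and are introduced here solely for the example. Each stored item is searched for separately, so the amount of work grows with the number of stored items, which is the source of the slowdown noted above.

```python
def find_stored_items(content, stored_items):
    """Return the stored items that occur in `content`.

    Hypothetical helper: performs one case-insensitive substring
    search per stored item, so the total work grows in proportion
    to the number of stored items.
    """
    lowered = content.lower()
    hits = []
    for item in stored_items:
        # One full scan of the content for every stored item.
        if item.lower() in lowered:
            hits.append(item)
    return hits


# Illustrative data only; none of these values come from a real system.
message = "Please forward the Project Falcon budget to account 4417-2290."
watchlist = ["project falcon", "4417-2290", "merger terms"]

matches = find_stored_items(message, watchlist)
print(matches)
```

With the three-item list shown, the content is scanned three times; with a list of many thousands of stored items, every inspected message would be scanned that many times.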
There is thus a recognized need for, and it would be highly advantageous to have, a method and system that allow fast and efficient recognition of a large number of key-words and key-phrases within electronic traffic, and which overcome the drawbacks of current methods as described above.