1. Field of the Invention
The present invention relates generally to methods and apparatus for document or text string fingerprinting. The technology disclosed herein is applicable in various fields, including data leakage prevention, spam filtering, search engines, anti-plagiarism, data de-duplication, and so forth.
2. Description of the Background Art
One problem in the field of network security relates to data leakage prevention (DLP). DLP is needed to avoid loss of proprietary information, intellectual property, and other sensitive data. To protect sensitive data, enterprises need an effective DLP solution that monitors potential information leaks at the point of use. However, the explosion of messaging systems, wireless networking, and universal serial bus (USB) storage devices has made the protection of critical enterprise data difficult. As a result, enterprises are experiencing increased loss, and even theft, of data assets by employees, contractors, or even hackers (and malware) who leak data maliciously or accidentally.
Another problem in the field of network security relates to unsolicited messages in e-mail systems. Such unsolicited messages, also referred to as “spam,” are mass-mailed by spammers to e-mail accounts over the Internet. Various anti-spam software products have been developed to combat spam.
It is highly desirable to improve technologies that facilitate document or text string fingerprinting for data leakage prevention, spam filtering, and other applications.