Modern businesses and industries rely heavily on digital content as a primary means of communication and documentation. Digital content can be easily copied and distributed (e.g., via e-mail, instant messaging, peer-to-peer networks, FTP and web sites), which greatly increases hazards such as business espionage and data leakage. There is therefore great interest in methods that would mitigate the risks of digital espionage and unauthorized dissemination of proprietary information.
In general, methods for countering digital espionage can be divided into two categories: proactive methods, which increase the difficulty of unauthorized copying and distribution of digital documents, and reactive methods, which provide means for detecting and tracking breached content, both for forensic purposes and for the tracking and incrimination of suspects, and thereby provide an effective deterrent.
Current attempts to automatically mitigate espionage are focused on proactive methods. While these methods can be helpful in some cases, it is generally believed that any proactive method may eventually be circumvented, and there is a strong need to complement these methods with reactive means that provide forensic evidence and a means for incrimination of suspects. An effective forensic measure should make it possible to determine the exact source of a breached document.
In the context of secure distribution of multimedia content, some forensic methods require that unique, personalized digital watermarks, dubbed “fingerprints”, be embedded into each copy of the data before it is sent to the final user, allowing each copy to be bound to an authorized and accountable user. Numerous methods exist for personalized watermarking of multimedia files, such as video and audio content: in these cases, there is a high level of redundancy that allows watermarks to be embedded into the media in a manner that does not reduce the quality of the media and yet is robust to both malicious and non-malicious attacks. Some methods for embedding steganograms (hidden messages) inside a text also exist, and can be traced back to far antiquity. However, since the amount of redundancy in text is much smaller than the redundancy in audio or video, it is harder to embed such hidden messages in a text in a robust manner, in particular if the embedding process is to be done automatically. Current methods for automatic embedding of steganograms in text are usually based on altering the number of spaces at the end of each line, an approach that is highly vulnerable to format changes.
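The fragility of the end-of-line approach mentioned above can be illustrated with a minimal sketch. The encoding chosen here (one bit per line, a single trailing space for a 1 and none for a 0) and the function names `embed_bits`, `extract_bits` and `reformat` are assumptions made purely for illustration; they are not taken from any particular prior-art system.

```python
def embed_bits(text: str, bits: str) -> str:
    """Hide one bit per line: '1' appends one trailing space, '0' appends none.

    Illustrative encoding only (an assumption, not a specific prior-art scheme).
    """
    lines = text.split("\n")
    if len(bits) > len(lines):
        raise ValueError("not enough lines to carry the payload")
    marked = []
    for i, line in enumerate(lines):
        line = line.rstrip(" ")  # start from a clean line
        if i < len(bits) and bits[i] == "1":
            line += " "
        marked.append(line)
    return "\n".join(marked)


def extract_bits(text: str, n_bits: int) -> str:
    """Read the payload back from the first n_bits lines."""
    lines = text.split("\n")
    return "".join("1" if line.endswith(" ") else "0" for line in lines[:n_bits])


def reformat(text: str) -> str:
    """Simulate a format change that trims trailing whitespace from every line."""
    return "\n".join(line.rstrip() for line in text.split("\n"))


doc = "alpha\nbeta\ngamma\ndelta"
marked = embed_bits(doc, "1011")
recovered = extract_bits(marked, 4)
after_reformat = extract_bits(reformat(marked), 4)
```

In this sketch the payload is recovered intact from the marked copy, but any step that trims trailing whitespace, as many editors and format converters do, erases it completely while leaving the visible text unchanged.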
In many cases, documents are prepared by groups, where each member of the group introduces his or her own modifications into a document. An efficient document forensic system should take this into account, and embed modifications that are as robust as possible against casual editing while allowing for seamless group work on copies that contain somewhat different versions of the document.
Embedding steganograms into text is also important for copyright protection of digital books: illegal copying and distribution of digital books, also known as “e-books”, has been prevalent in recent years, especially over the Internet. This illegal copying and distribution is an infringement of copyright protection laws and causes financial damage to the rightful owners of the content. It is therefore of great interest to find methods that would stop or at least reduce illegal copying and/or distribution of digital texts without interfering with rightful usage. To date, no such method is in use.
Another important aspect of a forensic technique is robustness: a forensic method should be robust against consequential changes in the substance and preferably against deliberate attempts to remove the forensic marks. Current methods usually lack an adequate level of robustness.
Prior art regarding the use of forensic data for tracking breaches and detecting espionage includes manual insertion of small modifications in various copies of the document, insertion of identification data in the metadata of the binary file, and altering the number of spaces at the end of the lines of the text. Such methods do not provide an adequate solution to the problem of modern businesses: the rate at which copies of digital documents are produced renders the cost of manual insertion of modifications prohibitive, and the plurality of formats in which the information can be represented renders metadata-based methods ineffective, since file metadata is often altered when the format of the file is changed.
There is thus a recognized need for, and it would be highly advantageous to have, a method and system that allow personalized watermarking of text in digital documents, which will overcome the drawbacks of current methods as described above.