Digital watermarking is a well-researched area in the signal and image processing community. Watermarks may be visible or invisible, and may conceal or otherwise contain arbitrary data. Many techniques have been devised to hide information covertly in text and image documents. Hiding data in this way is commonly termed “steganography” in the cryptography community.
Existing techniques for steganography typically modify image pixels in an imperceptible manner. Steganography for text documents differs from image steganography, since modifying pixels in a text document can be more visually apparent than modifying pixels in an image. Also, text documents are often printed, photocopied, or both, and data hidden using conventional steganography may not be retrievable from a printout or photocopy. Therefore, existing steganography techniques for image documents are not easily applicable to text documents.
Conventional methods for data hiding in text documents include dot encoding, space modulation (line shift coding and word shift coding), luminance modulation, halftone quantization, component manipulations and syntactic methods.
Each of the conventional methods has its own advantages and disadvantages. For example, dot encoding has high data hiding capacity but is typically vulnerable to photocopying; a photocopied document contains noise which interferes with decoding of the dots. Further, these dots can also be intentionally disfigured or removed while leaving much of the text intact. On the other hand, syntactic methods are resilient to printing and photocopying but have low data capacity and are not typically self-verifiable.
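To make one of the listed conventional methods concrete, the following is a minimal sketch of line shift coding, a form of space modulation. It is an illustration under stated assumptions, not a description of any particular published scheme: the function names, the nominal spacing of 12 units, the shift of 1 unit, and the choice of every second line as a bit carrier are all hypothetical. Each carrier line is moved slightly down or up relative to its unshifted neighbours, and the decoder recovers a bit by comparing the gap above the line with the gap below it.

```python
from typing import List

# Hypothetical layout parameters, chosen only for illustration.
NOMINAL_SPACING = 12  # distance between consecutive baselines
SHIFT = 1             # magnitude of the vertical displacement


def encode_line_shift(num_lines: int, bits: List[int]) -> List[int]:
    """Return baseline positions with one bit hidden in every second line.

    Odd-indexed lines carry data; even-indexed lines stay in place and
    act as references for the decoder. Positions grow downward.
    """
    baselines = [i * NOMINAL_SPACING for i in range(num_lines)]
    carriers = range(1, num_lines - 1, 2)
    if len(bits) > len(carriers):
        raise ValueError("not enough lines to carry the payload")
    for bit, idx in zip(bits, carriers):
        # Bit 1 moves the line down, bit 0 moves it up.
        baselines[idx] += SHIFT if bit else -SHIFT
    return baselines


def decode_line_shift(baselines: List[int], num_bits: int) -> List[int]:
    """Recover bits by comparing the gaps above and below each carrier line."""
    carriers = list(range(1, len(baselines) - 1, 2))[:num_bits]
    bits = []
    for idx in carriers:
        gap_above = baselines[idx] - baselines[idx - 1]
        gap_below = baselines[idx + 1] - baselines[idx]
        bits.append(1 if gap_above > gap_below else 0)
    return bits


payload = [1, 0, 1, 1]
positions = encode_line_shift(9, payload)
recovered = decode_line_shift(positions, len(payload))
```

In this sketch only every second line carries a bit, so the payload is bounded by roughly half the number of lines on the page, which illustrates why capacity is a concern for methods of this kind.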
There is an increasing need to prevent unauthorized disclosure of important information in text documents, especially in this knowledge-based era. The leakage of sensitive information, in both soft copy and paper form, is a widespread security problem. One way to discourage improper disclosure is to insert a track-and-trace mechanism into a printed text document: traceability is a powerful security measure against document leakage, because it allows the originator of the document to be identified. A covert track-and-trace mechanism can be implemented effectively through the use of data hiding.
In general, there is a need for a high-capacity document data hiding method that is resilient to printing and scanning and day-to-day document handling, accommodates a wide range of text documents with few or no restrictions, and is self-verifiable.