Digital watermarking is a well researched area in the signal processing community. Many techniques been devised to hide information covertly in text and image documents. Hiding data is commonly termed “steganography” in the cryptography community. Steganography for text and image documents differs greatly since modifying pixels in an image has much less visual effect than modifying pixels in text. Therefore, existing steganography techniques for image documents are not directly applicable to text documents.
Conventional methods for data hiding in text documents include dot encoding, space modulation (line shift coding, word shift coding), luminance modulation, halftone quantization, component manipulations and syntactic methods.
Conventional methods each have their own advantages and disadvantages. For example, dot encoding has high data hiding capacity but is typically vulnerable to printing and scanning of the text document because noise is introduced and interferes with decoding the dots. On the other hand, syntactic methods are resilient to printing and scanning but have low data capacity and are not self-verifiable.
There is an increasing need to prevent unauthorized disclosure of important information in text documents, especially in this knowledge-based era. There is also a need to discourage improper information disclosure by putting a track and trace mechanism in a printed text document. In case of information leakage, the source of leakage (person who printed the document) can be identified. There is also a need for data hiding with high capacity that is resilient to printing and scanning, accommodates a wide range of text documents with little or no restrictions, and is self-verifiable.