Watermarks
Messages are often embedded in documents as watermarks. The embedded messages can be used, e.g., for security, privacy, and copyright protection.
Watermarking of paper “hard-copy” documents differs from watermarking of electronic “soft-copy” documents. For soft-copy documents, all operations, such as watermark insertion, document copying, document degradation, and watermark extraction, occur in the digital domain, e.g., in PDF or PostScript documents. In contrast, hard-copy documents are degraded in the physical domain. Watermarks in hard-copy documents can be degraded when the documents are copied, scanned, faxed, or otherwise manipulated. Hard-copy watermarks can also be physically damaged, e.g., when the document is crumpled or torn, intentionally or unintentionally.
Glyphs
A glyph, as defined herein, is a fundamental graphic object. The most common examples of glyphs are text characters or graphemes. Glyphs may also be ligatures, that is, compound characters, or diacritics. A glyph can also be a pictogram or ideogram. The term glyph can also be used for a non-character, or a multi-character pattern. As used herein, a glyph is some arbitrary graphic shape or object, which could be 2- or N-dimensional, where N is an integer larger than 2.
Message Embedding
There are a number of known methods for embedding hidden messages in media signals such as images, video, and audio. However, embedding hidden messages inside structured glyphs is a difficult problem. Even small changes to the structure, e.g., spacing and orientation, can be detected by the human visual system. Accordingly, changes to the glyphs, for the purpose of watermarking, must be very small.
This problem is even more difficult in the case of hard-copy watermarking. A hard-copy document can deteriorate physically when it changes hands or is torn or folded. A message that would be detectable in an electronic version of the document can be lost when the printed document is photocopied or scanned; e.g., subtle changes in gray level are lost after copying.
Conventional Message Embedding Methods
Some conventional message embedding methods treat a text document as an image and use image-based watermarking techniques. One disadvantage of these methods is that they do not work well with printers, which primarily operate on bitmapped representations of individual text characters or half-tone representations of colors and shades.
Another conventional method slightly alters the color of characters such that the difference is imperceptible to the eye, but can be sensed by a scanner. Because the embedded message is invisible, it is difficult to alter the watermark. However, the disadvantage of this method is that the small differences in color or gray-level are easily lost when the document is copied.
Another method modulates the distance between individual letters, between individual words, or between successive lines of text. At low embedding rates, this method is nearly invisible to the eye and survives copying. However, the disadvantage of this method is that, at high embedding rates, the non-uniform distances between characters, words, or lines become visible and annoying to a reader.
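A minimal sketch of inter-word spacing modulation, assuming a differential variant in which each bit shifts one word left or right by a hypothetical amount `delta`, widening one neighboring gap and narrowing the other. Keeping each pair's total width fixed leaves the line length unchanged, and the decoder only compares adjacent gaps instead of needing the original layout. Gap widths are plain numbers here (e.g., points); a real system would measure them from the rendered or scanned page, and the function names are illustrative only.

```python
def embed_in_gaps(gaps, bits, delta=0.5):
    """Encode one bit per pair of adjacent inter-word gaps (hypothetical scheme).

    Bit 1 widens the first gap of the pair and narrows the second; bit 0 does
    the opposite. The pair's total width, and thus the line length, is unchanged.
    """
    if 2 * len(bits) > len(gaps):
        raise ValueError("not enough gaps to carry the message")
    out = list(gaps)
    for i, bit in enumerate(bits):
        shift = delta if bit else -delta
        out[2 * i] += shift
        out[2 * i + 1] -= shift
    return out


def extract_from_gaps(gaps, n_bits):
    """Recover each bit by comparing the two gaps of its pair."""
    return [1 if gaps[2 * i] > gaps[2 * i + 1] else 0 for i in range(n_bits)]


original = [4.0] * 8  # eight equal inter-word gaps, in points
marked = embed_in_gaps(original, [1, 0, 1, 1])
recovered = extract_from_gaps(marked, 4)
```

A small `delta` keeps the change hard to see; the trade-off described above appears as soon as many consecutive gaps are modulated on the same line.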
Another method exploits dithering by placing a checkerboard-like pattern of black and white dots on the border of the entire character, making the character narrower or wider than normal. However, this method is not robust to photocopying because the individual dot patterns are too small to be retained after photocopying.
Another method embeds a pseudorandom pattern of dots in the background of the document, irrespective of the location of the text. The dots, although relatively unobtrusive, can still be easily removed. Further, the dots are small and may not survive more than one round of photocopying.
Dirty Paper Coding
Dirty Paper Coding (DPC), also referred to as “Writing on Dirty Paper,” is a method of encoding a message in the presence of side information that is known to the encoder but not to the decoder. The side information generally consists of an interfering signal known at the encoder. The encoder's task is to encode the desired message so that the decoder can recover it without any knowledge of the interfering signal. In other words, the decoder should be able to read a message from a “dirty” document without a priori knowledge of which portion constitutes the actual message and which portion is noise. Hence the name “Dirty Paper Coding.”
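Dirty paper coding is an information-theoretic result rather than a specific algorithm. One well-known practical scheme in its spirit is quantization index modulation (QIM), sketched below with NumPy purely as an illustration; it is a swapped-in technique, not the method of this document. The host samples play the role of the interference known only to the encoder: each sample is quantized onto one of two interleaved lattices selected by the message bit, and the decoder recovers the bit by finding the nearer lattice, without ever seeing the original host. The step size `step` and all function names are hypothetical.

```python
import numpy as np


def qim_embed(host, bits, step=8.0):
    """Quantize each host sample onto the lattice selected by its bit.

    Bit 0 uses the lattice step*k; bit 1 uses step*k + step/2.
    The encoder knows the host (the "dirt"); the decoder never sees it.
    """
    host = np.asarray(host, dtype=float)
    offset = np.asarray(bits, dtype=float) * (step / 2.0)
    return np.round((host - offset) / step) * step + offset


def _distance_to_lattice(x, offset, step):
    """Distance from each x to the nearest point of step*k + offset."""
    return np.abs(x - offset - np.round((x - offset) / step) * step)


def qim_decode(received, step=8.0):
    """Pick, per sample, whichever lattice is nearer; no host knowledge needed."""
    received = np.asarray(received, dtype=float)
    d0 = _distance_to_lattice(received, 0.0, step)
    d1 = _distance_to_lattice(received, step / 2.0, step)
    return (d1 < d0).astype(int)


# Usage: embed 4 bits, add noise smaller than step/4, then decode blindly.
host = np.array([3.2, 10.7, -5.1, 20.0])
bits = np.array([1, 0, 1, 0])
marked = qim_embed(host, bits)
noise = np.array([1.5, -1.9, 0.7, -1.0])
decoded = qim_decode(marked + noise)
```

With `step = 8`, any additive noise smaller than `step/4 = 2` in magnitude leaves every bit decodable, because the correct lattice is then strictly closer than the other one. A larger step buys robustness at the cost of larger, more visible distortion of the host.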
Distance Fields
The shape of an object, e.g., a glyph, can be represented in a memory of a computer system as a collection of sample points in an N-dimensional space. Associated with each sample point is a distance from the sample point to a boundary of the shape. The distances are positive or negative to indicate whether the sample point lies inside or outside the boundary, and zero when exactly on the boundary. The collection of sample points with the associated distance values is called a sampled distance field. Distance fields can also be represented as analytic procedures stored in the memory of the computer system.
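To make the representation concrete, the following Python/NumPy sketch shows both forms mentioned above: an analytic distance procedure, and a sampled distance field obtained by evaluating that procedure on a regular grid. A circle stands in for a glyph because its distance has a simple closed form; real glyph outlines need a more involved distance computation. The sign convention (positive inside) and all names are assumptions for illustration.

```python
import numpy as np


def circle_distance(x, y, cx=4.0, cy=4.0, radius=3.0):
    """Analytic distance procedure for a circle (a stand-in for a glyph).

    Convention assumed here: positive inside, negative outside,
    zero exactly on the boundary.
    """
    return radius - np.hypot(x - cx, y - cy)


def sample_distance_field(distance_fn, size):
    """Evaluate a distance procedure at the integer points of a size x size grid.

    Returns an array indexed as field[y, x].
    """
    ys, xs = np.mgrid[0:size, 0:size].astype(float)
    return distance_fn(xs, ys)


field = sample_distance_field(circle_distance, 9)
```

Here `field[4, 4]` is the circle's center (distance 3), `field[4, 7]` lies exactly on the boundary (distance 0), and the grid corners lie outside (negative distances).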
We use the general term distance field to refer to all types of distance fields, both sampled and non-sampled.
The distance field can be used to represent attributes other than the shape of the glyph, such as color, gray-level, and texture. More precisely, there is a mapping from the distance values of the distance field representing the glyph to density values representing other attributes of the glyph.
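A familiar example of such a mapping is antialiased rendering, where signed distance is converted into a gray-level (coverage) value. The sketch below assumes positive-inside distances and a simple linear ramp of hypothetical width `filter_width` centered on the boundary; other attributes could be derived with different mapping functions.

```python
import numpy as np


def distance_to_density(distance, filter_width=1.0):
    """Map signed distances (positive inside) to densities in [0, 1].

    Samples at least filter_width/2 inside map to 1, samples at least
    filter_width/2 outside map to 0, and the boundary (distance 0) maps to 0.5.
    """
    d = np.asarray(distance, dtype=float)
    return np.clip(0.5 + d / filter_width, 0.0, 1.0)


densities = distance_to_density([-2.0, -0.25, 0.0, 0.25, 2.0])
```

Widening `filter_width` softens the edge; narrowing it approaches a hard black-and-white threshold at distance 0.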
Adaptively Sampled Distance Fields (ADFs)
In an adaptively sampled distance field (ADF), the density of the sample points depends on the level of detail required to represent different parts of the shape. For example, complicated local variations require a higher density of sample points, while smooth regions can be represented with fewer samples. This adaptivity makes the ADF a compact representation that enables processing of arbitrary shapes, e.g., glyphs such as text characters, cartoons, and logos.
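A minimal two-dimensional sketch of adaptive sampling, assuming a quadtree of square cells, bilinear reconstruction from each cell's four corner distances, and a hypothetical error tolerance `tol`: a cell is subdivided only when its bilinear reconstruction misses the true distance by more than `tol` at the cell center or at an edge midpoint. The probe points, the circle used as the shape, and all names are assumptions for illustration, not a prescribed ADF construction.

```python
import math
from dataclasses import dataclass


@dataclass
class Cell:
    x: float
    y: float
    size: float
    corners: tuple  # distances at (x0, y0), (x1, y0), (x0, y1), (x1, y1)


def bilinear(cell, px, py):
    """Reconstruct the distance at (px, py) from the cell's corner samples."""
    u = (px - cell.x) / cell.size
    v = (py - cell.y) / cell.size
    d00, d10, d01, d11 = cell.corners
    return (d00 * (1 - u) * (1 - v) + d10 * u * (1 - v)
            + d01 * (1 - u) * v + d11 * u * v)


def build_adf(dist, x, y, size, tol=0.01, max_depth=6, depth=0):
    """Return the leaf cells of an adaptively subdivided quadtree."""
    corners = (dist(x, y), dist(x + size, y),
               dist(x, y + size), dist(x + size, y + size))
    cell = Cell(x, y, size, corners)
    h = size / 2
    # Probe the center and the four edge midpoints of the cell.
    probes = [(x + h, y + h), (x + h, y), (x, y + h),
              (x + size, y + h), (x + h, y + size)]
    err = max(abs(bilinear(cell, px, py) - dist(px, py)) for px, py in probes)
    if err <= tol or depth >= max_depth:
        return [cell]
    leaves = []
    for ox in (0.0, h):
        for oy in (0.0, h):
            leaves += build_adf(dist, x + ox, y + oy, h, tol, max_depth, depth + 1)
    return leaves


def circle(x, y):
    """Signed distance to a circle of radius 0.3 centered in the unit square."""
    return 0.3 - math.hypot(x - 0.5, y - 0.5)


leaves = build_adf(circle, 0.0, 0.0, 1.0)
```

Cells around the circle's center, where the distance field has a sharp peak, are split down to `max_depth`, while regions where the field is nearly linear stay coarse, so the leaf count stays well below the `4 ** max_depth` cells of a uniform grid at the finest resolution.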