Watermarks
Messages are often embedded in documents as watermarks. The embedded messages can be used for security, privacy, and copyright protection.
Watermarking for paper “hard-copy” documents differs from watermarking for electronic “soft-copy” documents. For soft-copy documents, all operations that involve the watermark, such as watermark insertion, document copying, document degradation, document compression, and watermark extraction, are performed in the digital domain. For hard-copy documents, watermark insertion may occur digitally, but operations such as printing, faxing, and photocopying involve the hard-copy document itself. Moreover, watermark extraction is performed on a scanned version of the hard-copy document. Thus, watermarks in hard-copy documents can be degraded when the documents are copied, scanned, faxed, or otherwise manipulated.
Glyphs
A glyph, as defined herein, is a fundamental graphic object. The most common examples of glyphs are text characters or graphemes. Glyphs may also be ligatures, that is, compound characters, or diacritics. A glyph can also be a pictogram or ideogram. The term glyph can also be used for a non-character or a multi-character pattern. As used herein, a glyph is some arbitrary graphic shape or object that is multi-dimensional.
Message Embedding
Methods for embedding messages in signals such as images, video, and audio are known. However, embedding messages unobtrusively inside graphical objects like glyphs is difficult. Even small changes to a glyph, e.g., to its spacing or orientation, can easily be detected by the human visual system. Accordingly, changes made to the glyphs for the purpose of hiding messages must be extremely small and, at the same time, detectable. These conflicting requirements make the problem challenging.
This problem is even more difficult in the case of hard-copy watermarking. A hard-copy document can undergo physical deterioration over time. A message that would have been detectable in an electronic version of the document can be lost when the printed document is photocopied or scanned, e.g., subtle changes in gray level are lost after copying.
Conventional Message Embedding Methods
Some conventional message embedding methods treat a text document as an image and use image-based watermarking techniques. However, those methods do not work well with printers, which primarily operate on bitmapped representations of individual characters or half-tone representations of colors and shades.
Another conventional method slightly alters the color of characters such that the difference is imperceptible to the eye, but can be sensed by a scanner. Because the embedded message is invisible, it is difficult to alter the watermark. However, this method is not robust to photocopying because small differences in color or gray-level are easily lost when the document is copied.
Another method modulates a distance between individual letters, words or successive lines of text. At low embedding rates, that method is nearly invisible to a reader, and survives copying. However, at high embedding rates, the non-uniform distances between the characters, words or lines are easily visible to an attacker and also annoying to a casual reader.
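The distance-modulation idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that every gap between words has the same nominal width and that the extractor knows that width; the function names and the shift amount delta are hypothetical and are not taken from any particular conventional method.

```python
def embed_bits(gaps, bits, delta=0.5):
    """Return a copy of gaps in which the i-th gap encodes the i-th bit.

    Assumed convention: a 1 bit widens the gap by delta, a 0 bit narrows it
    by delta. Gaps beyond len(bits) are left unchanged.
    """
    if len(bits) > len(gaps):
        raise ValueError("not enough gaps to carry the message")
    marked = list(gaps)
    for i, bit in enumerate(bits):
        marked[i] = gaps[i] + (delta if bit else -delta)
    return marked


def extract_bits(marked_gaps, nominal_gap, count):
    """Recover count bits by comparing each gap with the nominal width."""
    return [1 if gap > nominal_gap else 0 for gap in marked_gaps[:count]]


# Eight uniform gaps (arbitrary units) carrying a four-bit message.
nominal = 10.0
message = [1, 0, 1, 1]
marked = embed_bits([nominal] * 8, message)
recovered = extract_bits(marked, nominal, len(message))
```

In this sketch, a higher embedding rate means that more of the gaps are modified, which is consistent with the visibility problem noted above.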
Another method uses dithering to make the entire character narrower or wider than normal. However, documents produced by the method cannot easily be photocopied without destroying the message.
Another method embeds a pseudo-random pattern of dots in the background of the document irrespective of the location of the text. The dots, although relatively unobtrusive, can still be easily detected by a computer and removed. Further, the dots are small and may not survive more than one instance of photocopying.
Distance Fields
The shape of a graphical object, e.g., a glyph, can be represented in a memory of a computer system as a collection of sample points in an N-dimensional space. Associated with each sample point is a smallest distance from the sample to a nearest boundary of the shape. The distances are positive or negative to indicate whether the sample is inside or outside the object, and zero when the sample is on the boundary. The collection of samples with the associated distance values is called a sampled distance field. Distance fields can also be represented as analytic procedures stored in the memory of the computer system.
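A sampled distance field of this kind can be sketched as follows. The sketch assumes a two-dimensional shape (a circle standing in for a glyph), a regular sampling grid, and the sign convention that distances are positive inside the shape and negative outside; the function names are hypothetical.

```python
import math


def circle_distance(x, y, cx=0.0, cy=0.0, radius=1.0):
    """Signed distance from (x, y) to the boundary of a circle.

    Assumed sign convention: positive inside, negative outside,
    zero on the boundary.
    """
    return radius - math.hypot(x - cx, y - cy)


def sample_distance_field(dist_fn, xmin, xmax, ymin, ymax, nx, ny):
    """Return a list of (x, y, distance) samples on a regular nx-by-ny grid."""
    samples = []
    for j in range(ny):
        y = ymin + (ymax - ymin) * j / (ny - 1)
        for i in range(nx):
            x = xmin + (xmax - xmin) * i / (nx - 1)
            samples.append((x, y, dist_fn(x, y)))
    return samples


# A 5-by-5 grid over [-2, 2] x [-2, 2], stored row by row.
field = sample_distance_field(circle_distance, -2.0, 2.0, -2.0, 2.0, 5, 5)
```

Here circle_distance plays the role of an analytic procedure, and field is the corresponding sampled representation.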
As defined herein, a distance field refers to all types of distance fields, both sampled and non-sampled.
The distance field can be used to represent attributes other than the shape of the glyph, such as color, gray-level, density, and texture. More precisely, there is a mapping from the distance values of the distance field representing the glyph to values representing other attributes of the glyph.
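One possible mapping from distance values to gray-levels is sketched below. The linear ramp, the half_width parameter, and the convention that positive distances lie inside the glyph are assumptions made for illustration only; other mappings are equally possible.

```python
def distance_to_gray(d, half_width=0.5):
    """Map a signed distance to a gray-level in the range [0, 1].

    Assumed conventions: d is positive inside the glyph; 0.0 means white
    (background) and 1.0 means black (ink). Distances within half_width of
    the boundary are mapped linearly, which gives a soft edge.
    """
    t = (d + half_width) / (2.0 * half_width)
    return min(1.0, max(0.0, t))


# Gray-levels for samples well outside, on, and well inside the boundary.
levels = [distance_to_gray(d) for d in (-1.0, 0.0, 1.0)]
```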
Adaptively Sampled Distance Fields (ADFs)
In an adaptively sampled distance field (ADF), the density of the samples depends on a level of detail required to represent different parts of the shape. For example, complicated local variations may require a large number of samples. Thus, the ADF is a representation that enables processing of arbitrary shapes, e.g., glyphs such as text characters, cartoons, and logos.
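Adaptive sampling can be sketched with a quadtree, which is one possible way to organize the samples. In the sketch below, a square cell stores the distances at its four corners and is subdivided only when bilinear interpolation of those distances deviates from the true distance by more than a tolerance. The circle shape, the tolerance, the depth limit, the choice of test points, and all names are assumptions made for illustration.

```python
import math


def circle_distance(x, y, radius=1.0):
    """Signed distance to a circle centered at the origin (positive inside)."""
    return radius - math.hypot(x, y)


def bilinear(corners, u, v):
    """Interpolate corner distances at normalized cell coordinates (u, v)."""
    d00, d10, d01, d11 = corners
    return (d00 * (1 - u) * (1 - v) + d10 * u * (1 - v)
            + d01 * (1 - u) * v + d11 * u * v)


def build_adf(dist_fn, x0, y0, size, tol=0.01, max_depth=6, depth=0):
    """Build a quadtree cell covering [x0, x0+size] x [y0, y0+size]."""
    x1, y1 = x0 + size, y0 + size
    corners = [dist_fn(x0, y0), dist_fn(x1, y0),
               dist_fn(x0, y1), dist_fn(x1, y1)]
    cell = {"origin": (x0, y0), "size": size, "depth": depth,
            "corners": corners, "children": []}
    if depth >= max_depth:
        return cell
    # Compare interpolated and true distances at the cell center
    # and at the four edge midpoints.
    tests = [(0.5, 0.5), (0.5, 0.0), (0.0, 0.5), (1.0, 0.5), (0.5, 1.0)]
    error = max(abs(bilinear(corners, u, v)
                    - dist_fn(x0 + u * size, y0 + v * size))
                for u, v in tests)
    if error > tol:
        half = size / 2.0
        for dy in (0.0, half):
            for dx in (0.0, half):
                cell["children"].append(
                    build_adf(dist_fn, x0 + dx, y0 + dy, half,
                              tol, max_depth, depth + 1))
    return cell


def leaf_depths(cell):
    """Return the depths of all leaf cells below (and including) cell."""
    if not cell["children"]:
        return [cell["depth"]]
    depths = []
    for child in cell["children"]:
        depths.extend(leaf_depths(child))
    return depths


root = build_adf(circle_distance, -2.0, -2.0, 4.0)
depths = leaf_depths(root)
```

In this sketch, leaf cells end up at different depths: regions where the distance varies in a way that bilinear interpolation cannot capture receive more samples than regions where it varies smoothly.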