The present invention concerns embedding information in documents in a visually imperceptible way. It particularly concerns embedding such information in document photocopies. Since facilities for reproducing documents are widely available, it has become important in many situations to be able to track document reproduction. A commonly suggested way of doing so is for the copier to embed information that is not readily perceptible visually but can nonetheless be recovered by machine optical scanning. One proposed approach is to add a number of low-amplitude perturbations to the original image and then correlate those perturbations with images of suspected copies. If the correlations are as expected, then the suspected document is very probably a copy. But this approach tends to introduce an element of judgment, since it is based on varying degrees of correlation. Also, it does not lend itself well to embedding actual messages, such as copier serial numbers.
Another approach is to employ half-toning patterns. If the dither matrices employed to generate a half-toned output differ in different segments of an image, information can be gleaned from the dither-matrix selections in successive regions. But this approach is limited to documents generated by half-toning, and it works best for those produced through the use of so-called clustered-dot dither matrices, which are not always preferred.
Both of these approaches are best suited to documents, such as photographs, that consist mainly of continuous-tone images. In contrast, the vast majority of reproduced documents consist mainly of text, so workers in this field have proposed other techniques, which take advantage of such documents' textual nature. For example, one technique embeds information by making slight variations in inter-character spacing. Such approaches are desirable because they lend themselves to embedding of significant amounts of information with essentially no effect on document appearance.
But we have recognized that such approaches are not well suited to use by photocopiers, which do not receive the related word-processor output and thus may not be able to identify actual text characters reliably. And we have developed a technique that exhibits text-based approaches' advantages in a way that is more flexible than traditional approaches.
Specifically, we identify (typically, sub-character-sized) regions that consist mainly of pixels that meet certain criteria typical of text-character parts, and we embed the intended message by selectively darkening regions thus identified. Although this approach depends on the existence of such regions, it does not depend on reliably knowing that those regions actually do contain character parts. It can therefore be employed advantageously by photocopiers. Moreover, since the darkness variations in which the message is embedded occur in regions that are parts of text characters or, in any event, are very much like them, the variations can be significant from a machine point of view but do not affect the documents' appearance to a human reader.
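By way of illustration only, the general idea of identifying text-like regions and selectively darkening them can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the embodiment itself: the square block size, the darkness threshold `dark_max`, the qualifying fraction `min_fraction`, the darkening amount `delta`, the raster-order assignment of one bit per region, and the function names are all hypothetical choices made for the example. The text above does not specify the pixel criteria or how bits are assigned to regions.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows; 0 = black, 255 = white


def find_text_like_blocks(img: Image, block: int = 4, dark_max: int = 96,
                          min_fraction: float = 0.75) -> List[Tuple[int, int]]:
    """Return top-left corners of blocks that consist mainly of dark pixels.

    "Dark" (value <= dark_max) stands in for the unspecified criteria that
    are typical of text-character parts; this is an assumption.
    """
    height, width = len(img), len(img[0])
    found = []
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            dark = sum(1
                       for r in range(top, top + block)
                       for c in range(left, left + block)
                       if img[r][c] <= dark_max)
            if dark >= min_fraction * block * block:
                found.append((top, left))
    return found


def embed_bits(img: Image, bits: List[int], block: int = 4, dark_max: int = 96,
               min_fraction: float = 0.75, delta: int = 40) -> Image:
    """Return a copy of img with one bit embedded per qualifying block.

    A 1 bit darkens the block's dark pixels by delta; a 0 bit leaves the
    block unchanged. Blocks are used in raster order (an assumption).
    """
    out = [row[:] for row in img]
    sites = find_text_like_blocks(img, block, dark_max, min_fraction)
    if len(bits) > len(sites):
        raise ValueError("not enough qualifying regions for the message")
    for bit, (top, left) in zip(bits, sites):
        if bit:
            for r in range(top, top + block):
                for c in range(left, left + block):
                    if out[r][c] <= dark_max:
                        out[r][c] = max(0, out[r][c] - delta)
    return out
```

Note that the sketch never determines whether a qualifying block actually belongs to a character; it relies only on the pixel criterion, which is the property that makes the approach usable by a photocopier. Recovery of the message from a scanned copy is not shown.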