1. Field of the Invention
This invention relates to a method and apparatus for embedding data in text regions of a document in a visually imperceptible way, and to a method and apparatus for extracting such embedded data. The invention particularly concerns a data embedding technique in which stroke segments in text regions are used to embed bits of information in the document by modulating color or luminance in one region of the stroke with respect to another region. The invention also relates to programs of instructions for implementing various aspects of the embedding and extracting processes.
2. Description of the Related Art
Since facilities for reproducing documents are widely available, it has become important in many situations to be able to track document reproduction. A way that has commonly been suggested is for the copier to somehow embed information that is not readily perceptible visually but can nonetheless be recovered by machine optical scanning. One proposed approach is to add a number of low-amplitude perturbations to the original image and then correlate those perturbations with images of suspected copies. If the correlations are as expected, then the suspected document is very probably a copy. However, this approach tends to introduce an element of judgment, since it is based on varying degrees of correlation. Also, it does not lend itself well to embedding actual messages, such as copier serial numbers.
Another approach is to employ half-toning patterns. If the dither matrices employed to generate a half-toned output differ in different segments of an image, information can be gleaned from the dither-matrix selections in successive regions. But this approach is limited to documents generated by half-toning, and it works best for those produced through the use of so-called clustered-dot dither matrices, which are not always preferred.
Both of these approaches are best suited to documents, such as photographs, that consist mainly of continuous-tone images. In contrast, the vast majority of reproduced documents consist mainly of text, so workers in this field have proposed other techniques, which take advantage of such documents' textual nature. For example, one technique embeds information by making slight variations in inter-character spacing. Such approaches lend themselves to embedding of significant amounts of information with essentially no effect on document appearance. However, such approaches are not well suited for use by photocopiers, which do not receive the related word processor output and thus may not be able to identify actual text characters reliably.
Thus, what is needed is a data embedding technique that exhibits advantages of text-based approaches in a way that is more flexible and robust than traditional approaches.
Objects of the Invention
Therefore, it is an object of the present invention to provide a technique that effectively and robustly identifies and selects sites in text regions of a document and embeds data in such sites in a visually imperceptible way.
It is another object of this invention to provide such a technique which identifies stroke segments in text as candidate sites for embedding data by differential luminance/color modulation, and to further provide a technique for extracting data so embedded.
Summary
According to one aspect of this invention, a method for embedding a message in a text-containing document is provided. The message embedding method comprises the steps of (a) obtaining a pixel representation of the document; (b) identifying text pixels of the document; (c) identifying stroke segments in the text pixels of the document; and (d) embedding information in at least one identified stroke segment by changing a characteristic value, such as a luminance or color value, of pixels in a first region of that stroke segment with respect to the characteristic value of pixels in a second region of that stroke segment, where the first and second regions are non-overlapping.
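By way of illustration only, steps (a) through (d) of the embedding method might be sketched as follows. The sketch is not the implementation of the invention and rests on several assumptions that do not come from this description: the pixel representation is a list of rows of 8-bit luminance values, text pixels are found by a fixed threshold, a stroke segment is approximated as a horizontal run of text pixels of at least MIN_RUN pixels, the two regions are the two halves of the run, and the choice of which half is lightened encodes a 1 or a 0. The names TEXT_THRESHOLD, MIN_RUN, DELTA, find_stroke_segments and embed_bits are hypothetical.

```python
# Illustrative sketch only; all constants and names are assumptions.
TEXT_THRESHOLD = 128  # luminance below this is treated as a text pixel
MIN_RUN = 4           # shortest horizontal run accepted as a stroke segment
DELTA = 8             # small luminance offset applied to one region


def find_stroke_segments(image):
    """Return (row, start, end) for each horizontal run of text pixels.

    `end` is exclusive. Runs shorter than MIN_RUN are ignored.
    """
    segments = []
    for r, row in enumerate(image):
        c = 0
        while c < len(row):
            if row[c] < TEXT_THRESHOLD:
                start = c
                while c < len(row) and row[c] < TEXT_THRESHOLD:
                    c += 1
                if c - start >= MIN_RUN:
                    segments.append((r, start, c))
            else:
                c += 1
    return segments


def embed_bits(image, bits):
    """Return a copy of `image` with one bit embedded per stroke segment.

    Assumed encoding: for a 1 the first half of the segment is lightened
    by DELTA relative to the second half; for a 0 the second half is
    lightened relative to the first. The halves never overlap.
    """
    out = [list(row) for row in image]
    segments = find_stroke_segments(out)
    if len(bits) > len(segments):
        raise ValueError("not enough stroke segments for the message")
    for bit, (r, start, end) in zip(bits, segments):
        mid = (start + end) // 2
        lo, hi = (start, mid) if bit else (mid, end)
        for c in range(lo, hi):
            # Clamp so the modified pixel is still classified as text.
            out[r][c] = min(out[r][c] + DELTA, TEXT_THRESHOLD - 1)
    return out


page = [
    [255] * 10,
    [255, 0, 0, 0, 0, 0, 0, 255, 255, 255],
    [255] * 10,
]
marked = embed_bits(page, [1])
```

In this sketch the offset DELTA is kept small relative to the contrast between text and background, reflecting the stated aim that the modulation be visually imperceptible; a suitable magnitude in practice would depend on the output device.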
Another aspect of the invention involves a method for extracting a message embedded in stroke segments in text of a document. The message extracting method comprises the steps of (a) obtaining a pixel representation of the document; (b) identifying text pixels of the document; (c) identifying the stroke segments in the text pixels of the document; and (d) measuring a value, such as an average value, representative of a characteristic, such as luminance or color, of pixels in a first region of each identified stroke segment with respect to such a value representative of the characteristic of pixels in a second region of that stroke segment to determine the presence or absence of a bit embedded in that stroke segment, where the first and second regions are non-overlapping.
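By way of illustration only, the measuring step (d) of the extracting method might be sketched as follows. As before, this is a sketch under stated assumptions rather than the implementation of the invention: the pixel representation is a list of rows of 8-bit luminance values, a stroke segment is approximated as a horizontal run of thresholded text pixels, the two regions are the two halves of the run, and the sign of the difference between the region averages is taken to encode a 1 or a 0. The names TEXT_THRESHOLD, MIN_RUN, MIN_DIFF, stroke_segments and extract_bits are hypothetical.

```python
# Illustrative sketch only; all constants and names are assumptions.
TEXT_THRESHOLD = 128  # luminance below this is treated as a text pixel
MIN_RUN = 4           # shortest horizontal run accepted as a stroke segment
MIN_DIFF = 4          # smallest average difference treated as an embedded bit


def stroke_segments(image):
    """Return (row, start, end) for each horizontal run of text pixels."""
    segments = []
    for r, row in enumerate(image):
        c = 0
        while c < len(row):
            if row[c] < TEXT_THRESHOLD:
                start = c
                while c < len(row) and row[c] < TEXT_THRESHOLD:
                    c += 1
                if c - start >= MIN_RUN:
                    segments.append((r, start, c))
            else:
                c += 1
    return segments


def extract_bits(image):
    """Return one entry per stroke segment: 1, 0, or None if no bit is found.

    The average luminance of the first half of each segment is compared
    with that of the second half. A difference smaller in magnitude than
    MIN_DIFF is read as the absence of an embedded bit.
    """
    bits = []
    for r, start, end in stroke_segments(image):
        mid = (start + end) // 2
        first = image[r][start:mid]
        second = image[r][mid:end]
        diff = sum(first) / len(first) - sum(second) / len(second)
        if diff >= MIN_DIFF:
            bits.append(1)      # first region lighter than second
        elif diff <= -MIN_DIFF:
            bits.append(0)      # second region lighter than first
        else:
            bits.append(None)   # no measurable modulation
    return bits


scanned = [
    [255, 8, 8, 8, 0, 0, 0, 255],
    [255, 0, 0, 0, 8, 8, 8, 255],
    [255, 0, 0, 0, 0, 0, 0, 255],
]
recovered = extract_bits(scanned)
```

Comparing region averages rather than individual pixels is intended to make the decision less sensitive to scanner noise; the tolerance MIN_DIFF is an assumed parameter that would need to be chosen for the particular scanner and printer.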
In other aspects of the invention, apparatuses are provided for embedding a message in a text-containing document, and for extracting a message so embedded. The message embedding/extracting apparatuses include a scanner that outputs a pixel representation of the document, and circuitry that processes the output from the scanner. Such circuitry includes a text pixel identifying circuit, a stroke segment identifying circuit, and further includes an embedding or extracting circuit. These circuits may have their processing capability hardwired therein, or they may be software controlled. The message embedding/extracting apparatuses may be embodied in a photocopier or a computer system having an input device, such as a scanner, and a processor for performing the embedding and/or extracting operations in accordance with the invention.
In accordance with further aspects of the invention, each of the above-described methods or steps thereof may be embodied in a program of instructions (e.g., software) which may be stored on, or conveyed to, a computer or other processor-controlled device for execution. Alternatively, the method(s) may be implemented using hardware or a combination of software and hardware.
Other objects and attainments together with a fuller understanding of the invention will become apparent and appreciated by referring to the following description and claims taken in conjunction with the accompanying drawings.