The present invention relates to digital watermarking of digital media, and more particularly to a method and apparatus for watermark insertion and detection in a perceptually uniform transform domain.
Digital watermarks have been proposed as a means for copyright protection of digital media such as images, video, audio, and text. Digital watermarking embeds identification information directly into a digital media object by making small modifications to the object itself. A companion watermark detector can extract this "signature" from the watermarked media object. The extracted signature can be used to identify the rightful owner and/or the intended recipients, as well as to verify the authenticity of the object. The signature can also be used to embed some other useful information such as copy control information and parental control information.
For most applications, two basic desirable criteria for a watermarking scheme are perceptual invisibility and robustness to intentional/unintentional attacks. The watermark should be perceptually invisible, i.e., it should not noticeably interfere with the perceivable quality of the object being protected. The watermark should also be robust to common signal processing and intentional attacks. Particularly, the watermark should still be detectable even after common signal processing operations have been applied to the watermarked image.
The dual requirements of perceptual invisibility and robustness, unfortunately, conflict with each other. That is, the former suggests that the amount of watermark energy inserted into the object should be minimized, while the latter suggests the opposite. One of the fundamental issues in digital watermarking is thus to find the best trade-off between imperceptibility and robustness to signal processing.
One way to balance perceptual invisibility and robustness is by incorporating explicit human perceptual models in the watermarking system. The perceptual models provide an upper bound on the amount of modification one can make to the media content without incurring a perceptual difference. A watermarking system can operate just within this upper bound to provide the maximal robustness to intentional or unintentional attacks, given a desired perceived quality.
For example, when a watermark is applied to a visual object intended for human viewing, the watermarking system can exploit various properties of the human visual system (HVS). That is, some researchers have attempted to hide the watermark where it will least be noticed by a human viewer. In U.S. Pat. No. 5,930,369, entitled "Secure Spread Spectrum Watermarking for Multimedia Data", Cox et al. teach a method that operates in the frequency domain, distributing the watermark within the n largest low-frequency (but not DC) transform coefficients. Cox et al. teach that the amount of watermark signal inserted into a particular coefficient can also be made proportional to the value of the coefficient itself.
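By way of illustration only, the proportional embedding attributed to Cox et al. above can be sketched in Python as follows. The function name `cox_style_embed`, the scaling parameter `alpha`, the convention that index 0 of a flat coefficient list holds the DC term, and the selection of coefficients purely by magnitude are assumptions made for this sketch and are not taken from the cited patent.

```python
def cox_style_embed(coeffs, w, alpha=0.1):
    """Scale the len(w) largest-magnitude non-DC coefficients by (1 + alpha * w).

    Assumption: coeffs[0] is the DC term and is never modified.
    """
    n = len(w)
    # Candidate indices exclude the DC term; sort by magnitude, largest first.
    candidates = sorted(range(1, len(coeffs)),
                        key=lambda i: abs(coeffs[i]), reverse=True)[:n]
    out = list(coeffs)
    for i, s in zip(candidates, w):
        # The inserted amount alpha * s * coeffs[i] is proportional
        # to the value of the coefficient itself.
        out[i] = coeffs[i] * (1.0 + alpha * s)
    return out
```

Because the modification scales with the coefficient, large coefficients carry more watermark energy than small ones, which is the property the paragraph above describes.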
Other researchers have taught the use of explicit HVS models to vary watermark energy. Podilchuk and Zeng suggest such a system in "Image-adaptive watermarking using visual models," IEEE Journal on Selected Areas in Comm., special issue on Copyright and Privacy Protection, vol. 16, no. 4, pp. 525-39, May 1998. Podilchuk and Zeng make use of frequency sensitivity, luminance sensitivity and the self-masking effect of the HVS to adaptively control the amount of watermark energy to be embedded into different transform coefficients/areas of the image. They suggest incorporating perceptual models in the watermarking system by deriving a just-noticeable-difference (JND) threshold for each DCT/wavelet coefficient, and using this JND threshold to control the amount of watermark signal inserted into each coefficient.
FIG. 1 illustrates, in block diagram 20, the watermark insertion scheme proposed by Podilchuk and Zeng. A frequency-based transform (e.g., a block based discrete cosine transform (DCT) of an original image {xi,j}) produces a frequency-based representation of a digital media object {Xu,v}. JND calculator 24 uses frequency sensitivity, luminance sensitivity, and contrast masking models to compute a JND value Ju,v for each Xu,v. Watermark embedder 26 receives {Xu,v}, {Ju,v}, and a watermark sequence {wu,v}. For each component Xu,v, embedder 26 produces a corresponding output component X*u,v using the formulation:

$$X^{*}_{u,v} = \begin{cases} X_{u,v} + J_{u,v}\, w_{u,v} & \text{if } X_{u,v} > J_{u,v} \\ X_{u,v} & \text{otherwise} \end{cases}$$
Finally, frequency-based inverse transform 28 inverts X*u,v to produce the watermarked image x*i,j.
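The per-coefficient rule applied by embedder 26 can be sketched in Python as shown here. This is a minimal illustration, assuming the transform coefficients, JND values, and watermark values are already available as flat lists of equal length; the function name `embed_jnd` is hypothetical, and the forward and inverse transforms are omitted.

```python
def embed_jnd(X, J, w):
    """Apply the JND-gated embedding rule to each coefficient.

    A coefficient receives J * w only when it exceeds its own JND value;
    otherwise it passes through unchanged. The comparison is made on the
    coefficient value as the rule is stated, not on its magnitude.
    """
    return [x + j * s if x > j else x for x, j, s in zip(X, J, w)]
```

For example, with X = [10.0, 1.0], J = [2.0, 2.0], and w = [1.0, -1.0], only the first coefficient exceeds its JND, so only it is modified.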
FIG. 2 illustrates, in block diagram 30, the watermark detection scheme proposed by Podilchuk and Zeng. The original image xi,j is input to a frequency-based transform 22 and JND calculator 24 identical to those used in FIG. 1, producing Xu,v and Ju,v as described above. The potentially-watermarked image x*i,j is input to an identical frequency-based transform 32, producing a frequency-based representation of that image X*u,v. Adder 34 subtracts Xu,v from X*u,v, producing a difference sequence w*u,v that represents a potential watermark sequence (and/or noise). Correlator 36 correlates the original watermark sequence wu,v with the difference sequence scaled by the JNDs, w*u,v/Ju,v. Comparator 38 examines a resulting correlation figure, declaring the existence of the watermark if the correlation figure exceeds a selected threshold.
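The detection path of FIG. 2 can be sketched in Python along the following lines. The description above does not say how the correlation figure is normalized, so the normalized correlation used here, the default threshold of 0.5, and the function names are assumptions made for illustration.

```python
import math

def normalized_correlation(a, b):
    """Return the normalized correlation of two equal-length sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def detect_with_original(X_star, X, J, w, threshold=0.5):
    """Correlate w with the JND-scaled difference (X* - X) / J.

    Returns the correlation figure and whether it exceeds the threshold.
    """
    # Adder 34: difference between candidate and original coefficients,
    # then scaling by the JND of each coefficient.
    w_star = [(xs - x) / j for xs, x, j in zip(X_star, X, J)]
    # Correlator 36 and comparator 38.
    score = normalized_correlation(w, w_star)
    return score, score > threshold
```

Note that this detector needs both the original coefficients and their JND values, which is why it cannot be used when the original image is unavailable.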
In another work, the watermark embedding takes place in the spatial domain, but the perceptual model is used in the frequency domain to shape the resulting (watermarked) coefficients to make sure the modification to each coefficient does not exceed the perceptual threshold. M. Swanson et al., "Transparent robust image watermarking," Proc. Inter. Conf. Image Proc., vol. 3, pp. 211-14, September 1996.
The need for watermark detection without the assistance of an original data set exists in several circumstances. First, for a content provider, an automated search for watermarked content, e.g., over the Internet, may be practically limited to a search without the original, because the automated searcher may have no good way of determining the corresponding original for each file examined. Second, in some circumstances it may make sense to add the watermark at the time the media object is first captured or created, in which case no "un"-watermarked copy exists. Likewise, for security or storage efficiency, it may make sense to destroy the original copy. And when ownership is to be proven, use of an "original" object may be disallowed in order to avoid questions that can arise as to whether the "original" was possibly derived from the "watermarked" object.
In each prior art method, the watermark embedding and detection are implemented in either the spatial pixel domain or a linear transform domain. As a result, the amount of watermark energy to be embedded into each spatial pixel or linear transform coefficient varies from pixel to pixel, or from transform coefficient to transform coefficient. This, in general, makes optimal watermark detection difficult to design and implement in these domains. These problems are compounded when the original image is not available to assist in watermark detection.
Whereas the prior art has focused on ways to add the appropriate level of watermarking on a per-sample basis, the present invention takes a different approach to watermarking. The basic concept underlying this approach is watermarking/detection in a transform space that allows the same level of watermarking to be applied to all samples. For instance, in one embodiment, a watermarking system first nonlinearly transforms the original signal to a perceptually uniform domain, and then embeds the watermark in this domain without varying the statistical properties of the watermark at each sample. At the watermark detector, a candidate image is transformed to the same perceptually uniform domain, and then correlated with the original watermark sequence. Under such conditions, it is shown herein that an optimal watermark detector can be derived. This approach is particularly attractive when the original image is unavailable at the detector, as it effectively prevents the image content from biasing the watermark detection score.
In one aspect of the invention, a method of inserting a watermark into a digital media object is disclosed. Each feature from a set of features extracted from a digital data set is transformed to a corresponding perceptual domain feature. A set of pseudorandom numbers, derived from a selected watermark key, is also provided. A set of watermarked perceptual domain features is calculated, each watermarked feature based on a corresponding perceptual domain feature and pseudorandom number. Finally, the watermarked perceptual domain features are transformed out of the perceptual domain to produce a set of watermarked features.
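The insertion method of this aspect can be sketched in Python as follows. The summary above does not specify the perceptual transform, so a signed power law with exponent `ALPHA` is used purely as a stand-in; that transform, the additive combination with a fixed `strength`, the Gaussian pseudorandom numbers, and all function names are assumptions made for illustration.

```python
import math
import random

ALPHA = 0.5  # assumed exponent of the stand-in perceptual transform

def to_perceptual(x, alpha=ALPHA):
    """Stand-in perceptual transform: signed power law."""
    return math.copysign(abs(x) ** alpha, x)

def from_perceptual(y, alpha=ALPHA):
    """Inverse of the stand-in perceptual transform."""
    return math.copysign(abs(y) ** (1.0 / alpha), y)

def watermark_sequence(key, n):
    """Derive n pseudorandom numbers from the watermark key."""
    rng = random.Random(key)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def insert_watermark(features, key, strength=0.1):
    """Transform features, add the watermark uniformly, and transform back."""
    w = watermark_sequence(key, len(features))
    # The same strength is applied to every sample in the perceptual domain.
    marked = [to_perceptual(x) + strength * s for x, s in zip(features, w)]
    return [from_perceptual(y) for y in marked]
```

The point of the sketch is that `strength` does not vary from sample to sample; any sample-dependent shaping is carried by the transform pair rather than by the embedder.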
In another aspect of the invention, a second method of inserting a watermark into a digital media object is disclosed. In this method, a self-contrast masking figure is calculated for each feature from a set of features extracted from a digital data set. Also, a neighborhood masking figure is determined for each feature based on the amplitude of features in a selected neighborhood on the data set, the location of the neighborhood bearing a relationship to the location of the feature under consideration. A set of pseudorandom numbers, derived from a selected watermark key, is also provided. A set of watermarked features is calculated, each watermarked feature combining a feature from the digital data set with a corresponding pseudorandom number, with a relative weighting based on the self-contrast and neighborhood masking figures. This second method can be practiced using the first method, e.g., by using the self-contrast and neighborhood masking figures to perceptually transform the features from the digital data set, and then combining the transformed features with the pseudorandom numbers.
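A rough Python sketch of this second method is given here for a one-dimensional feature list. The summary does not give formulas for the masking figures, so the power-law self-contrast figure, the mean-amplitude neighborhood figure, the parameters `alpha`, `beta`, `radius`, and `gain`, and all function names are hypothetical choices made only to show how the two figures could jointly weight the pseudorandom number.

```python
import random

def self_masking(x, alpha=0.6):
    """Hypothetical self-contrast masking figure based on the feature's own amplitude."""
    return max(abs(x), 1.0) ** alpha

def neighborhood_masking(features, i, radius=2, beta=0.1):
    """Hypothetical neighborhood masking figure from amplitudes near position i."""
    lo, hi = max(0, i - radius), min(len(features), i + radius + 1)
    neighbors = [abs(features[k]) for k in range(lo, hi) if k != i]
    mean_amp = sum(neighbors) / len(neighbors) if neighbors else 0.0
    return 1.0 + beta * mean_amp

def embed_with_masking(features, key, gain=0.1):
    """Combine each feature with a pseudorandom number, weighted by both figures."""
    rng = random.Random(key)  # pseudorandom numbers derived from the key
    out = []
    for i, x in enumerate(features):
        w = rng.gauss(0.0, 1.0)
        weight = self_masking(x) * neighborhood_masking(features, i)
        out.append(x + gain * weight * w)
    return out
```

In this sketch a feature that is large, or that sits among large neighbors, receives a proportionally larger watermark contribution.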
In yet another aspect of the invention, a method of detecting a watermark in a digital media object is disclosed. Each feature from a set of features extracted from a digital data set is transformed to a corresponding perceptual domain feature. A set of pseudorandom numbers, derived from a selected watermark key, is also provided. A correlation figure is calculated by correlating the perceptual domain features with the pseudorandom number set.
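The detection method of this aspect can be sketched in Python as follows. As no particular perceptual transform is specified in this summary, a signed power law with exponent `ALPHA` again serves as a stand-in; that transform, the Gaussian pseudorandom numbers, the mean-product form of the correlation figure, and the function names are assumptions made for illustration.

```python
import math
import random

ALPHA = 0.5  # assumed exponent; must match the transform used at insertion

def to_perceptual(x, alpha=ALPHA):
    """Stand-in perceptual transform: signed power law."""
    return math.copysign(abs(x) ** alpha, x)

def watermark_sequence(key, n):
    """Derive n pseudorandom numbers from the watermark key."""
    rng = random.Random(key)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def correlation_figure(features, key):
    """Correlate perceptual domain features with the key-derived sequence."""
    y = [to_perceptual(x) for x in features]
    if not y:
        return 0.0
    w = watermark_sequence(key, len(y))
    # Mean of the element-wise products; no original data set is required.
    return sum(a * b for a, b in zip(y, w)) / len(y)
```

A detector built on this sketch would compare the returned figure against a threshold chosen for the application; the choice of threshold is outside the scope of the sketch.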
Apparatus for implementing each of the above methods is also disclosed. One preferred implementation is as an apparatus that comprises a computer-readable medium containing computer instructions for performing one of the methods using one or more processors.
One disclosed watermark detector comprises a perceptual transform and a correlator to correlate the output of the perceptual transform with a watermark sequence. The detector may optionally calculate the watermark sequence from a selected watermark key.
One disclosed watermarking system comprises both a perceptual transform and an inverse perceptual transform. A watermarker placed between the two transforms adds a watermark signature to a data set in the perceptual domain.