The proliferation of digitized media such as images, video and multimedia is creating a need for a security system that facilitates the identification of the source of the media. The expected introduction of digital video disks (DVD) in mass markets will further exacerbate the problem.
Content providers, i.e. owners of works in digital data form, need to embed signals into video, image and multimedia data, including audio data, such that the signals can subsequently be detected by software and/or hardware devices for the purposes of copyright authentication, control and management.
There are systems for inserting a watermark into digital media and for extracting a watermark from watermarked media. Typical systems are described, for instance, in U.S. patent application Ser. No. 08/534,894, filed Sep. 28, 1995, entitled "Secure Spread Spectrum Watermarking for Multimedia Data", which is incorporated herein by reference and which describes spread spectrum watermarking that embeds a watermark signal in perceptually significant regions of an image for the purpose of identifying the content owner and/or possessor. An article entitled "Secure Spread Watermarking for Multimedia" by Cox et al., available at http://www.neci.nj.nec.com/tr/index.html (Technical Report No. 95-110), describes spread spectrum watermarking that embeds a pseudo-random noise sequence into digital data for watermarking purposes.
The above watermark extraction methodology requires that the original image spectrum be subtracted from the watermarked image spectrum. This restricts its use to cases where the original image, or the original image spectrum, is available to the decoder.
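The insertion and the original-dependent extraction just described can be sketched as follows. This is a minimal pure-Python illustration, assuming the image has already been transformed into a flat list of DCT coefficients with the DC term at index 0. It uses the scaling rule v'_i = v_i(1 + αx_i) and the similarity measure sim(X, X*) = X*·X / sqrt(X*·X*) from the Cox et al. work, but the function names, the default α = 0.1 and the coefficient-selection rule (largest magnitudes, DC excluded) are assumptions of this sketch, not the cited implementation. Note that `extract` cannot run without the original coefficients, which is exactly the limitation stated above.

```python
import math
import random

def embed(coeffs, watermark, alpha=0.1):
    """Scale the len(watermark) largest-magnitude non-DC coefficients:
    v'_i = v_i * (1 + alpha * x_i). Index 0 is treated as DC (assumption)."""
    idx = sorted(range(1, len(coeffs)), key=lambda i: abs(coeffs[i]),
                 reverse=True)[:len(watermark)]
    marked = list(coeffs)
    for i, x in zip(idx, watermark):
        marked[i] = coeffs[i] * (1 + alpha * x)
    return marked, idx

def extract(marked, original, idx, alpha=0.1):
    """Non-blind extraction: x*_i = (v*_i - v_i) / (alpha * v_i) needs the ORIGINAL v_i."""
    return [(marked[i] - original[i]) / (alpha * original[i]) for i in idx]

def similarity(x, x_star):
    """sim(X, X*) = (X* . X) / sqrt(X* . X*)."""
    dot = sum(a * b for a, b in zip(x_star, x))
    return dot / math.sqrt(sum(b * b for b in x_star))

if __name__ == "__main__":
    rng = random.Random(1)
    host = [rng.gauss(0.0, 50.0) for _ in range(1024)]  # stand-in for DCT coefficients
    wm = [rng.gauss(0.0, 1.0) for _ in range(100)]       # X drawn from N(0, 1)
    marked, idx = embed(host, wm)
    x_star = extract(marked, host, idx)
    # With no distortion X* equals X, so the similarity equals the norm of wm (roughly 10).
    print(similarity(wm, x_star))
```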
In U.S. Pat. No. 5,319,735 by R. D. Preuss et al., entitled "Embedded Signaling", digital information is encoded to produce a sequence of code symbols. The sequence of code symbols is embedded in an audio signal by generating a corresponding sequence of spread spectrum code signals representing the sequence of code symbols. The frequency components of the code signal are essentially confined to a preselected signaling band lying within the bandwidth of the audio signal, and successive segments of the code signal correspond to successive code symbols in the sequence. The audio signal is continuously frequency analyzed over a frequency band encompassing the signaling band, and the code signal is dynamically filtered as a function of the analysis to provide a modified code signal whose frequency component levels are, at each time instant, essentially a preselected proportion of the levels of the audio signal frequency components in corresponding frequency ranges. The modified code signal and the audio signal are combined to provide a composite audio signal in which the digital information is embedded. This composite audio signal is then recorded on a recording medium or otherwise passed through a transmission channel. Two key elements of this process are the spectral shaping and the spectral equalization that occur at the insertion and extraction stages, respectively, which allow the embedded signal to be extracted without access to the unwatermarked original data.
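The spectral shaping step can be illustrated on a single short frame. This is a simplified sketch, not the patented circuit: instead of continuous frequency analysis and dynamic filtering, it assumes one frame is transformed with a naive DFT, the code's magnitude in each bin of a hypothetical signaling band `(k_lo, k_hi)` is set to a hypothetical proportion `p` of the audio magnitude in that bin, and the code's own phase is kept. All function names are hypothetical, and the equalization performed at the extraction stage is not shown.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform (adequate for a short frame)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def shape_code(audio_frame, code_frame, band, p):
    """Return a real code signal whose DFT magnitude in every bin of `band`
    is p times the audio's magnitude in that bin.

    band = (k_lo, k_hi) with 1 <= k_lo <= k_hi < len(frame) / 2.
    Bins outside the signaling band are zeroed; the code's phase is kept.
    """
    n = len(audio_frame)
    A, G = dft(audio_frame), dft(code_frame)
    S = [0j] * n
    for k in range(band[0], band[1] + 1):
        phase = G[k] / abs(G[k]) if abs(G[k]) > 1e-12 else 1 + 0j
        S[k] = p * abs(A[k]) * phase
        S[n - k] = S[k].conjugate()  # Hermitian symmetry keeps the time signal real
    return [s.real for s in idft(S)]

def embed_frame(audio_frame, code_frame, band, p):
    """Composite frame = audio + spectrally shaped code."""
    shaped = shape_code(audio_frame, code_frame, band, p)
    return [a + s for a, s in zip(audio_frame, shaped)]
```

Because the shaped code tracks the audio spectrum bin by bin, it is loudest where the audio is loud and quiet where the audio is quiet, which is the sense in which its levels are "a preselected proportion" of the audio levels.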
U.S. patent application Ser. No. 08/708,331, filed Sep. 4, 1996, entitled "A Spread Spectrum Watermark for Embedded Signaling" by Cox, now U.S. Pat. No. 5,845,155, which is incorporated herein by reference, describes a method for extracting a watermark or embedded data from watermarked images or video without using an original or unwatermarked version of the data.
This method of watermarking an image or image data requires computing the DCT (discrete cosine transform) of the entire image and its inverse. Fast algorithms compute the DCT in O(N log N) time, where N is the number of pixels in the image. However, for N = 512×512, the computational requirement is still high, particularly if the encoding and extraction processes must run at video rates, i.e. 30 frames per second. This method requires approximately 30 times the computation needed for MPEG-II decompression.
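For a sense of scale, the N log N figure can be worked through for a 512×512 frame. The sketch below treats N log₂ N as a rough operation count with constant factors ignored; this is an illustrative assumption, not a measured cost, and the function name is hypothetical.

```python
import math

def full_frame_dct_ops(width, height, fps=30):
    """Rough operation count for one fast full-frame DCT, taken as N log2 N.

    Constant factors are ignored, so only the order of magnitude is meaningful.
    Returns (per_frame, per_second).
    """
    n = width * height             # N = number of pixels
    per_frame = n * math.log2(n)   # one forward (or inverse) transform
    return per_frame, per_frame * fps

per_frame, per_second = full_frame_dct_ops(512, 512)
```

For N = 512×512 = 262,144 pixels, log₂ N = 18, so one transform costs on the order of 4.7 million operations per frame, or about 1.4 × 10^8 per second at 30 frames per second. Since the method needs both the DCT and its inverse, the total is roughly double that.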
One possible way to achieve real-time video watermarking is to watermark only every Nth frame. However, content owners wish to protect each and every video frame. Moreover, if it is known which frames contain embedded signals, those frames can simply be removed with no noticeable degradation of the video signal.
An alternative option is to insert the watermark into n×n blocks of the image (subimages), where n << N. If the block size is chosen to be 8×8, i.e. the same size as that used for MPEG image compression, then the watermark insertion and extraction procedures can be tightly coupled to those of the MPEG compression and decompression algorithms. Considerable computational savings can then be achieved, since the most expensive computations are the calculation of the DCT and its inverse, and these steps are already performed as part of the compression and decompression algorithms. The incremental cost of watermarking is then very small, typically less than five percent of the computational requirements associated with MPEG.
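The coupling idea can be sketched for a single 8×8 block. This is a minimal illustration, not the scheme of any cited application: the hook functions, the watermark pattern `wm` and the strength `alpha` are hypothetical, and quantization, zig-zag scanning and entropy coding are omitted. It shows that, once the encoder has computed the block DCT, watermark insertion and correlation each add only 64 multiply-adds per block.

```python
import math

N = 8  # MPEG/JPEG block size

# Orthonormal 8-point DCT-II basis: C[u][x] = a(u) * cos((2x + 1) * u * pi / 16).
C = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N)) for x in range(N)]
     for u in range(N)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)] for i in range(N)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def dct2(block):
    """Forward 2-D DCT of an 8x8 block: C @ block @ C^T."""
    return matmul(matmul(C, block), transpose(C))

def idct2(coef):
    """Inverse 2-D DCT: C^T @ coef @ C."""
    return matmul(matmul(transpose(C), coef), C)

def encoder_hook(pixels, wm, alpha):
    """Hypothetical encoder stage: reuse the DCT the encoder computes anyway
    and add alpha * wm to the coefficients before quantization."""
    coef = dct2(pixels)  # already part of MPEG/JPEG encoding
    return [[coef[u][v] + alpha * wm[u][v] for v in range(N)] for u in range(N)]

def decoder_hook(coef, wm):
    """Hypothetical decoder stage: correlate the dequantized coefficients
    with wm before they go on to the decoder's own IDCT."""
    return sum(coef[u][v] * wm[u][v] for u in range(N) for v in range(N))
```

The extra correlation contributed by the watermark is exactly `alpha` times the energy of `wm`; the host image contributes the rest, which is why practical detectors combine many blocks.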
U.S. patent application Ser. No. 08/715,953, filed Sep. 19, 1996, entitled "Watermarking of Image Data Using MPEG/JPEG Coefficients" which is incorporated herein by reference, advances this work by using MPEG/JPEG coefficients to encode the image data.
U.S. patent application Ser. No. 08/746,022, filed Nov. 5, 1996, entitled "Digital Watermarking", now U.S. Pat. No. 5,915,027, which is incorporated herein by reference, describes storing watermark information into subimages and extracting watermark information from subimages.
A review of watermarking is found in an article by Cox et al., entitled "A review of watermarking and the importance of perceptual modeling", in Proc. of EI'97, vol. 30-16, Feb. 9-14, 1997.
U.S. patent application Ser. No. 08/815,524, filed Mar. 12, 1997, entitled "Digital Watermarking", which is incorporated herein by reference, describes storing watermark information in the sum of all, or of a subset, of the subimages, and extracting the watermark information after the contributions of the individual subimages have been combined.
U.S. Provisional Patent Application Ser. No. 60/043,750, filed Apr. 9, 1997, which is incorporated herein by reference, further advances the art by providing simpler, more efficient and robust methods of inserting and detecting watermarks in digital data.
To allow computationally efficient detection of the watermark in both the spatial and DCT domains, the watermark is inserted into sums of groups of 8×8 blocks in the DCT domain. The advantage of this approach is that, if the image is available only in the spatial domain, the summation can also be performed in the spatial domain to produce a small number of summed 8×8 blocks, and only these blocks then need to be transformed into the DCT domain. This works because the DCT is linear: the sum of the DCTs of a set of blocks equals the DCT of the sum of the spatial blocks. Since the computational cost of detecting watermarks is now dominated by the cost of summation, detection costs approximately the same in the DCT and spatial domains.
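The linearity argument can be checked numerically. The sketch below compares the two routes for one group of blocks: transforming every block and summing, versus summing in the spatial domain and transforming once. It assumes an orthonormal 8×8 DCT-II; `group_correlation` is a hypothetical detector for a single group, not the cited method, and the watermark pattern `wm` is an assumption.

```python
import math

N = 8

# Orthonormal 8-point DCT-II basis: C[u][x] = a(u) * cos((2x + 1) * u * pi / 16).
C = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N)) for x in range(N)]
     for u in range(N)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)] for i in range(N)]

def dct2(block):
    """Orthonormal 2-D DCT-II of an 8x8 block: C @ block @ C^T."""
    ct = [list(row) for row in zip(*C)]
    return matmul(matmul(C, block), ct)

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(N)] for i in range(N)]

def zeros():
    return [[0.0] * N for _ in range(N)]

def sum_of_dcts(blocks):
    """DCT-domain route: one DCT per block, then sum (len(blocks) transforms)."""
    total = zeros()
    for blk in blocks:
        total = add(total, dct2(blk))
    return total

def dct_of_sum(blocks):
    """Spatial-domain route: sum the pixel blocks, then a single DCT."""
    total = zeros()
    for blk in blocks:
        total = add(total, blk)
    return dct2(total)

def group_correlation(spatial_blocks, wm):
    """Hypothetical detector for one group: correlate the summed block's DCT with wm."""
    s = dct_of_sum(spatial_blocks)
    return sum(s[u][v] * wm[u][v] for u in range(N) for v in range(N))
```

For a group of k blocks, the first route costs k transforms and the second costs one, while both produce the same 8×8 result up to rounding. The remaining work is the k block additions, which is the summation cost referred to above.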
Although these watermarking techniques are highly successful, one problem remains: even small geometric changes to an image or video frame significantly affect the DCT coefficients. For example, consider an image that is divided into 8×8 blocks and transformed into the frequency domain by computing the DCT of each block. If the image is now reduced in size by 1/8, each original 8×8 pixel block shrinks to 7×7 pixels. If the scaled image is again divided into 8×8 blocks and the DCT of each block is calculated, the new DCT coefficients are usually very different from those of the original unscaled image. As a result, watermark detection often fails.
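To see why the damage differs from block to block, one can track where each original block lands relative to the new 8-pixel grid. This is a one-axis sketch under the 7/8 scale factor of the example above; it treats each shrunken block as rigidly translated and so ignores the 8-to-7 pixel shrink within a block. The function name is hypothetical.

```python
from fractions import Fraction

def grid_misalignment(num_blocks, block=8, scale=Fraction(7, 8)):
    """Along one axis, for each original block k: how far past the start of the
    new 8-pixel grid cell does the scaled block now begin?

    Original block k starts at pixel k * block; after scaling it starts at
    k * block * scale. The result is that position modulo the block size.
    """
    return [(k * block * scale) % block for k in range(num_blocks)]
```

`grid_misalignment(9)` gives offsets 0, 7, 6, 5, 4, 3, 2, 1, 0: block k now starts at pixel 7k, so its misalignment changes by one pixel per block and the pattern repeats every eight blocks. The same holds independently along the vertical axis. A scale change therefore looks locally like a translation whose amount varies across the image, and blocks whose indices differ by a multiple of eight see the same offset.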
The present invention solves the problem of geometric scaling or affine distortion of images or video frames by approximating the scale change or distortion by a spatially varying translation of each 8×8 block. The blocks of the image are divided into disjoint sets, all blocks within a set experiencing the same translation, while different sets may experience different translations. The watermark is extracted for all translations of the blocks, and the maximum correlator output is tested for statistical significance in order to determine whether a watermark is present in the image or video frame.
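The detection strategy just described can be sketched as a brute-force search over block translations. This minimal pure-Python illustration makes several assumptions that are not stated in the text:

- Each set of blocks is modeled as its own square region that has been cyclically translated by an unknown (dy, dx).
- The same ±1 8×8 pattern is added to every block in the spatial domain. For an orthonormal DCT, correlating full 8×8 blocks in the spatial domain gives the same value as correlating their DCTs.
- The host is zero-mean.
- Significance is a simple z-score compared against a hypothetical threshold.

Only the 64 translations modulo the block size are tried, because shifting a cyclic region by a whole block leaves the block sums unchanged. All function names and parameters are hypothetical.

```python
import math
import random

B = 8  # block size, as in MPEG/JPEG

def make_region(size, sigma, wm, rng):
    """Hypothetical test data: zero-mean Gaussian host, plus wm tiled over every 8x8 block."""
    return [[rng.gauss(0.0, sigma) + (wm[y % B][x % B] if wm is not None else 0.0)
             for x in range(size)] for y in range(size)]

def roll(img, dy, dx):
    """Cyclically translate an image so content at (y, x) moves to (y + dy, x + dx)."""
    h, w = len(img), len(img[0])
    return [[img[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]

def summed_block(img, ty, tx):
    """Sum all 8x8 blocks of img on a block grid whose origin is shifted by (ty, tx)."""
    h, w = len(img), len(img[0])
    s = [[0.0] * B for _ in range(B)]
    for by in range(0, h, B):
        for bx in range(0, w, B):
            for i in range(B):
                row = img[(by + ty + i) % h]
                for j in range(B):
                    s[i][j] += row[(bx + tx + j) % w]
    return s

def best_translation(region, wm):
    """Correlate for all 64 candidate translations; return (max z-score, (ty, tx)).

    z divides the correlation by its standard deviation under a no-watermark,
    zero-mean, i.i.d.-pixel assumption (an assumption of this sketch).
    """
    h, w = len(region), len(region[0])
    k_blocks = (h // B) * (w // B)
    var_est = sum(p * p for row in region for p in row) / (h * w)
    wm_energy = sum(v * v for row in wm for v in row)
    scale = math.sqrt(k_blocks * var_est * wm_energy)
    best = None
    for ty in range(B):
        for tx in range(B):
            s = summed_block(region, ty, tx)
            z = sum(s[i][j] * wm[i][j] for i in range(B) for j in range(B)) / scale
            if best is None or z > best[0]:
                best = (z, (ty, tx))
    return best

def detect(regions, wm, threshold):
    """Each region is one set of blocks sharing a translation; test the largest z."""
    results = [best_translation(r, wm) for r in regions]
    return max(z for z, _ in results) >= threshold, results

if __name__ == "__main__":
    rng = random.Random(3)
    wm = [[rng.choice((-1.0, 1.0)) for _ in range(B)] for _ in range(B)]
    # Two sets of blocks, each translated by a different (unknown) amount.
    sets = [roll(make_region(64, 2.0, wm, rng), 3, 5),
            roll(make_region(64, 2.0, wm, rng), 6, 1)]
    present, per_set = detect(sets, wm, threshold=6.0)
    print(present, [shift for _, shift in per_set])
```

Because the maximum is taken over 64 candidate translations per set, the threshold must be higher than for a single correlation to keep the same false-alarm rate. The value 6.0 here is an arbitrary illustrative choice.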