The present invention relates to electronic watermarking of datastreams and, in particular, to an imperceptible watermark which is inserted in the compressed domain and can be detected without a reference.
Electronic distribution of multimedia content is an important byproduct of the confluence of recent technological advances. Increasing network bandwidth, compression algorithms that preserve audio and video quality while reducing bit rate dramatically, higher density storage devices, and network search engines, when taken together, support network services which are revolutionizing the distribution of music and video.
Content owners naturally wish to maintain control over the distribution of their wares. To effectively protect their intellectual property (IP), an integrated system design is necessary [J. Lacy, D. P. Maher, and J. H. Snyder, "Music on the Internet and the Intellectual Property Protection Problem", Proc. International Symposium on Industrial Electronics, Guimaraes, Portugal, July 1997]. A typical protection system consists of three major building blocks. First, compressed content is stored in a cryptographic container before distribution to users. Second, a flexible licensing mechanism is utilized to answer questions about the trustworthiness of those seeking access to the content. Third, watermarks are embedded in the content in an imperceptible fashion in order that the content can be identified if the cryptographic container has been breached. A secure system design integrates these three components.
An electronic watermark is a data stream inserted into multimedia content. It contains information relevant to the ownership or authorized use of the content. Watermarks typically serve one of three functions: identification of the origin of the content, tracing of illegally distributed copies of the content, and disabling of unauthorized access to the content. No single marking method is best suited to all three functions, both because of complexity and because different functions and marking algorithms are resistant to different kinds of attacks. Any single piece of music or video can therefore be expected to be marked with a variety of different methods.
For copyright identification, every copy of the content can be marked identically, so the watermark needs to be inserted only once prior to distribution. Ideally, detection should not require a reference, because a search engine has no a priori way to identify the work from which it must recover the mark. In particular, the watermark needs to be detectable inside an edited work in which the original content may be either shortened or abutted with other works. Not only must the watermark be short enough to be detected in a shortened version of the work, but some means must be provided to synchronize the detection process in order that the watermark can be located in the processed bitstream. Finally, a watermark used for copyright identification must be robust to further processing. Any attempt to remove it, including re-encoding the content, should lead to perceptible distortion.
Transaction identification requires a distinct mark for each transaction. The primary challenge of point-of-sale marking is to move the content through the watermarking engine quickly, meaning that the algorithm used must be of low complexity. One strategy that meets this requirement is to insert the watermark in the compressed domain. Ideally, mark insertion should increase the data rate very little. In contrast to copyright ownership marking, the transaction identification watermark must be robust to collusion attacks.
Disabling access to content is generally best performed by mechanisms other than watermarks. If a watermark is used to disable access to content, the watermark recovery mechanism should be of low complexity. It should not be used as a protection of last resort, however, as disabling access clearly indicates the location of the watermark to anyone who can reverse-engineer the access mechanism.
Watermarks used in conjunction with compression algorithms fall into one of three classes: cleartext (PCM) marking, bitstream marking, and marking integrated with the compression algorithm. Each type has advantages and disadvantages. The intended use of the watermark directly affects the choice of algorithm.
Cleartext marking relies on perceptual methods to imperceptibly embed a data stream in a signal. The model for many cleartext marking algorithms is one in which a signal is injected into a noisy communication channel, where the audio/video signal is the interfering noise [J. Smith, B. Comisky, "Modulation and Information Hiding in Images", Proc. First International Information Hiding Workshop, LNCS 1174, Springer-Verlag, Cambridge, U.K., May/June 1996, pp. 207-226]. Because the channel is so noisy and the mark signal must be imperceptible, the maximum bit rates that are achievable for audio are generally less than 100 bps.
A cleartext mark appears in all processed generations of the work, since by design the marking algorithm is both secure and robust in the face of typical processing. It is therefore well suited to identification of the work. There are two major disadvantages to cleartext marking. First, because such algorithms compute a perceptual model, they tend to be too complex for point-of-sale applications. Second, and potentially more significant, these algorithms are susceptible to advances in the perceptual models used in compression algorithms. Many cleartext marking algorithms have been reported [see, e.g., Proceedings of the Fourth International Conference on Image Processing, Santa Barbara, Calif., October 1997].
Retrieval mechanisms for cleartext watermarks fall into two classes: reference necessary and reference unnecessary. In either case, the mechanism for mark recovery is generally of high complexity. Furthermore, if means for detecting these watermarks are embedded in a player, an attacker, by reverse engineering the player, may be able to identify and remove the marks. Cleartext watermarks typically should not be used to gate access to content.
Bitstream marking algorithms manipulate the compressed digital bitstream without changing the semantics of the audio or video stream. For example, a data envelope in an MPEG-2 Advanced Audio Coding (AAC) [IS 13818-7 (MPEG-2 Advanced Audio Coding, AAC), M. Bosi, K. Brandenburg, S. Quackenbush, M. Dietz, J. Johnston, J. Herre, H. Fuchs, Y. Oikawa, K. Akagiri, M. Coleman, M. Iwadare, C. Lueck, U. Gbur, B. Teichmann] audio frame could contain a watermark, albeit one which could easily be removed. Bitstream marking is of low complexity, so it can be used to carry transaction information. However, these marks cannot survive D/A conversion and are generally not very robust against attack; for example, they are susceptible to collusion attacks. Because the mark signal is unrelated to the media signal, the bit rate that these techniques can support can be as high as the channel rate. This type of mark can be easily extracted by clients and is thus appropriate for gating access to content.
Integrating the marking algorithm with the compression algorithm avoids an 'arms race' between marking and compression. Since the perceptual model is available from the workings of the compression algorithm, integrated marking algorithms alter the semantics of the audio or video bitstream, thereby providing resistance to collusion attacks. Depending on the details of the marking algorithm, the mark may survive D/A conversion. An example of this approach is described by F. Hartung and B. Girod in "Digital Watermarking of MPEG-2 Coded Video in the Bitstream Domain", Proc. IEEE ICASSP, pp. 2621-4, April 1997. The method of Hartung and Girod does not use perceptual techniques.
A watermark which can be recovered without a priori knowledge of the identity of the content could be used by web search mechanisms to flag unauthorized distribution of the content. Since media are compressed on these sites, a mark detection algorithm that operates in the compressed domain is useful. Accordingly, it is a primary object of the present invention to provide a robust integrated watermark that is inserted into audio or video data in the compressed domain utilizing perceptual techniques.
This invention integrates watermarking with perceptual coding mechanisms. A first generation technique is described which inserts data, typically a watermark, into an audio or video bitstream cooperatively with the compression algorithm. The data may be recovered with a simple decoding process. It is robust to attacks which modify bitstream scale factors, in the sense that damaging the mark produces perceptible artifacts. The watermarking technique of the present invention can be detected in the compressed domain without a reference, thereby avoiding a complete decode. An overall watermarking system incorporating the invention combines source (cleartext), bitstream (non-semantic altering), and integrated (semantic altering) watermarking.
In a generic perceptual coder according to the invention, the audio or video data enters the filterbank, where it is processed into multiple separate coefficients. The perceptual model module computes noise threshold information for the coefficients. The rate/distortion control module uses this information, together with bit-count information received from a noiseless coding module, to compute the scale factors to be used. For audio data, the scale factors module multiplies the coefficients received from the filterbank by the scale factors received from rate/distortion control and sends the resulting quantities to the Quantizer. For video data, the scale factors are used by the Quantizer to quantize the coefficients. For both audio and video data, the quantized coefficients from the Quantizer are noiseless coded and then sent to the bitstream multiplexor. The coded data is then output from the bitstream multiplexor for further processing and transmission. The integrated marking technique of the present invention is implemented, in particular, by the perceptual modeling, rate/distortion control, quantization, and noiseless coding modules.
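The interaction between rate/distortion control, the scale factors module, the Quantizer, and the noiseless coder on the audio path can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the invention: a single scale factor per band, rounding to the nearest integer as the quantizer, a crude `bit_count` stand-in for the noiseless coder, and halving the scale factor as the control strategy. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def quantize(coeffs: List[float], scale: float) -> List[int]:
    """Audio path: multiply each coefficient by the scale factor, then round."""
    out = []
    for c in coeffs:
        v = c * scale
        # Round half away from zero so positive and negative values behave alike.
        out.append(int(math.copysign(math.floor(abs(v) + 0.5), v)))
    return out


def bit_count(quantized: List[int]) -> int:
    """Stand-in for the noiseless coder: 1 bit per zero, sign plus magnitude bits otherwise."""
    return sum(1 if q == 0 else 1 + abs(q).bit_length() for q in quantized)


def choose_scale(coeffs: List[float], threshold: float,
                 budget_bits: int) -> Tuple[float, List[int]]:
    """Toy rate/distortion control (assumed strategy, not the patent's).

    Start from the scale at which the rounding error (at most 0.5 / scale)
    equals the noise threshold, then halve the scale until the bit count
    reported by the noiseless-coder stand-in fits the budget.
    """
    scale = 0.5 / threshold
    while True:
        quantized = quantize(coeffs, scale)
        if bit_count(quantized) <= budget_bits or scale < 1e-6:
            return scale, quantized
        scale /= 2.0


if __name__ == "__main__":
    scale, q = choose_scale([10.0, -3.2, 0.4, 0.0], threshold=0.25, budget_bits=12)
    print(scale, q)
```

In this toy, halving the scale factor trades precision for bit rate; a real coder balances the two per scale factor band using the perceptual thresholds.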
In the methods of the present invention, $A = \{f_i, H_i, \{q_{ij}\}\}$ is the set of triples of scale factors $f_i$, Huffman tables $H_i$, and quantized coefficients $\{q_{ij}\}$. The present invention supports three different embodiments for inserting a mark into the bitstream imperceptibly. It is assumed in these embodiments that some set of scale factor bands (SFB) has been selected, into which mark data will be inserted. The specific method by which SFB are chosen for marking is not specified; however, the marking set will be dynamic. $M$ is the set of indices associated with the set of SFB chosen for marking.
In one embodiment, a set of multipliers $\{x_i = 2^{N_i} : i \in M\}$ is chosen. Each triple $\{f_i, H_i, \{q_{ij}\} : i \in M\}$ is modified by dividing the scale factor by $x_i$, multiplying the quantized values $\{q_{ij}\}$ by $x_i$, and adding mark data $\{m_{ij}\}$ to the non-zero modified quantized values. The Huffman table for the modified SFB is now the smallest codebook that accommodates the largest value $q_{ij} \times x_i + m_{ij}$. Finally, the integrally watermarked encoded source is output from the perceptual coder. Since the original scale factors were chosen perceptually, the resulting mark is imperceptible.
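The scale-factor embodiment just described can be sketched for a single scale factor band as follows. This is a minimal illustration under stated assumptions, not the invention's implementation: the scale factor is modelled as a linear step size, each mark value is restricted to the range 0 to x-1 and added to the magnitude of the coefficient so that it can be read back as a remainder, the detector is assumed to know the exponent N, and the codebook limits are hypothetical placeholders rather than the tables of any real codec.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical codebook limits: entry k is the largest absolute value
# that codebook k can encode. Not the tables of any real codec.
CODEBOOK_LIMITS = [1, 2, 4, 7, 12, 8191]


def smallest_codebook(values: Sequence[int]) -> int:
    """Index of the smallest codebook that accommodates the largest magnitude."""
    peak = max((abs(v) for v in values), default=0)
    for index, limit in enumerate(CODEBOOK_LIMITS):
        if peak <= limit:
            return index
    raise ValueError("value too large for any codebook")


@dataclass
class Band:
    scale: float        # scale factor, modelled here as a linear step size
    codebook: int       # index of the Huffman table
    coeffs: List[int]   # quantized coefficients


def embed(band: Band, n: int, mark: Sequence[int]) -> Band:
    """Divide the scale factor by x = 2**n, multiply coefficients by x, add mark data."""
    x = 1 << n
    remaining = iter(mark)
    marked = []
    for q in band.coeffs:
        if q == 0:
            marked.append(0)              # zero coefficients carry no mark data
            continue
        m = next(remaining, 0)            # pad with zeros if the mark runs out
        if not 0 <= m < x:
            raise ValueError("mark value must lie in [0, x)")
        magnitude = abs(q) * x + m
        marked.append(magnitude if q > 0 else -magnitude)
    return Band(band.scale / x, smallest_codebook(marked), marked)


def extract(band: Band, n: int) -> List[int]:
    """Recover mark values as remainders modulo x from the non-zero coefficients."""
    x = 1 << n
    return [abs(q) % x for q in band.coeffs if q != 0]


if __name__ == "__main__":
    coeffs = [3, 0, -2, 1]
    original = Band(8.0, smallest_codebook(coeffs), coeffs)
    marked = embed(original, n=1, mark=[1, 0, 1])
    print(marked, extract(marked, n=1))
```

Under the linear model assumed here, a reconstructed value changes by m times the new scale factor, which is always less than one original quantizer step. That is one way to read the statement that a mark built on perceptually chosen scale factors stays imperceptible.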
In an alternate embodiment, applicable only to audio, the watermark data is represented via two particular characteristics of the bitstream data. The indication that watermark data is present is that the Huffman table used to encode the SFB is not the table that would ordinarily be used. The watermark data bit is set according to any desired scheme, and the quantized coefficients are derived using the alternate Huffman table. Finally, the integrally watermarked encoded source is output from the perceptual coder.
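The presence indicator in this embodiment lends itself to a reference-free check: a detector can compute which table would ordinarily be used for a band and compare it with the table actually signalled. The sketch below illustrates only that check. The codebook limits are hypothetical, the choice of the next larger codebook as the alternate table is an assumption, and the coefficients are left unchanged for simplicity; how the watermark bit itself is carried is left open, as in the text.

```python
from typing import Sequence

# Hypothetical codebook limits: entry k is the largest absolute value
# that codebook k can encode. Not the tables of any real codec.
CODEBOOK_LIMITS = [1, 2, 4, 7, 12, 8191]


def ordinary_codebook(coeffs: Sequence[int]) -> int:
    """Codebook an unmarked encoder is assumed to pick: the smallest that fits."""
    peak = max((abs(q) for q in coeffs), default=0)
    for index, limit in enumerate(CODEBOOK_LIMITS):
        if peak <= limit:
            return index
    raise ValueError("value too large for any codebook")


def flag_band(coeffs: Sequence[int]) -> int:
    """Assumed convention: signal a marked band with the next larger codebook."""
    index = ordinary_codebook(coeffs) + 1
    if index >= len(CODEBOOK_LIMITS):
        raise ValueError("no larger codebook available")
    return index


def is_flagged(codebook: int, coeffs: Sequence[int]) -> bool:
    """Reference-free test: the signalled table differs from the ordinary one."""
    return codebook != ordinary_codebook(coeffs)


if __name__ == "__main__":
    coeffs = [3, 0, -2, 1]
    print(is_flagged(flag_band(coeffs), coeffs))
    print(is_flagged(ordinary_codebook(coeffs), coeffs))
```

The check needs only the band's own coefficients and table index, so it can run on the compressed bitstream without the original content.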
Another embodiment is a method for watermarking which is integrated with quantization. The watermark is therefore difficult to remove without perceptible effects. The fact that marking data is present is again indicated by characteristics of the bitstream data. The watermark bit(s) are set before quantization. The modification factors {xi} are all now close to unity. The resulting Huffman table for an SFB therefore will be the original Huffman table or the next larger codebook. Because the modification to the spectral coefficients occurs before quantization, the changes to the reconstructed coefficients will be below the perceptual threshold.