1. Field of the Invention
The present invention relates to the field of watermark processing or “watermarking”, and in particular to methods and devices for embedding watermark information in, or extracting watermark information from, an information signal containing audio and/or video information.
2. Description of the Related Art
Watermarks for encoded or non-encoded audio signals and video signals are by now well known. Using watermarks, additional data may be transmitted within an audio signal or video signal in a robust and inaudible or almost inaudible, or invisible, way, respectively. Depending on the format (non-encoded or encoded), different embedding methods exist, which may be implemented so that they work according to the same basic principle and are therefore compatible with each other. Here, a distinction is made between PCM watermarks (PCM=pulse code modulation), bit stream watermark processing, and a method in which the watermark embedding takes place in combination with the encoding. Different applications for watermarks may be found in the technical publication “Advanced Watermarking and its Applications”, C. Neubauer, J. Herre, 109th AES Convention, Los Angeles, September 2000, Preprint 5176.
In classic spread spectrum modulation, which is used for conventional watermark embedding concepts, typically a BPSK (BPSK=binary phase shift keying) or a QPSK (QPSK=quaternary phase shift keying) is used, i.e. a phase modulation. QPSK is rarely used in watermark methods because it requires complex-valued modulation. In typical spread spectrum concepts using BPSK modulation, one information bit is transmitted per spread sequence, or symbol, each spread sequence consisting of a number L of so-called chips. For further details, reference is made to German Patent DE 196 40 814 C1.
A watermark extractor is described in the following. At the output of a matched filter of the watermark extractor, correlation peaks occur at the symbol spacing, and their sign, i.e. their polarity, carries the watermark information. When the signal-to-noise ratio is sufficiently high, correlation peaks of differing polarity thus occur at regular intervals along the time axis at the correlator output, wherein a correlation peak with negative polarity indicates a logic state “0” of the information bit, while a correlation peak with positive polarity indicates a logic state “+1” of the information bit, or vice versa.
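The relationship between BPSK spreading and the polarity of the correlation peaks may be illustrated by the following minimal sketch. It is not taken from the cited references. The chip count of 64, the seeds, the noise level and all function names are assumptions chosen for illustration only, and the host signal is modeled simply as Gaussian noise.

```python
import random

def make_spread_sequence(length, seed):
    """Pseudo-random +1/-1 chip sequence (hypothetical generator)."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]

def bpsk_spread(bits, chips):
    """One spread sequence per bit: unchanged for bit 1, inverted for bit 0."""
    out = []
    for bit in bits:
        sign = 1 if bit == 1 else -1
        out.extend(sign * c for c in chips)
    return out

def matched_filter_peaks(received, chips):
    """Correlate each symbol-length block with the known chip sequence.

    Assumes the extractor is already aligned to the symbol spacing.
    """
    n = len(chips)
    return [sum(r * c for r, c in zip(received[i:i + n], chips))
            for i in range(0, len(received) - n + 1, n)]

def decide_bits(peaks):
    """Positive peak -> bit 1, negative peak -> bit 0 (the mapping could be reversed)."""
    return [1 if p > 0 else 0 for p in peaks]

chips = make_spread_sequence(64, seed=1)   # L = 64 chips (assumed value)
bits = [1, 0, 0, 1, 1, 0]
tx = bpsk_spread(bits, chips)

# Assumed disturbance: unit-variance Gaussian noise standing in for the host signal.
noise = random.Random(2)
rx = [x + noise.gauss(0.0, 1.0) for x in tx]

peaks = matched_filter_peaks(rx, chips)
recovered = decide_bits(peaks)
print(recovered)
```

In the noise-free case each peak equals plus or minus L, so a longer spread sequence raises the peak relative to the disturbance at the cost of fewer bits per unit of time.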
The embedding of the watermark may be performed in different ways. For audio signals, it is known to embed the watermark in uncompressed audio signals, i.e. time-domain audio signals in the form of time-discrete consecutive samples. It is noted here that the energy of the watermark information is shaped so that it lies below the psychoacoustic masking threshold, so that the watermark information is not perceptible. In this context, reference is made to the technical publication “Digital Watermarking and its Influence on Audio Quality”, C. Neubauer, J. Herre, 105th AES Convention, San Francisco 1998, Preprint 4823. The general procedure is that a spread sequence is first provided and is left in its original form when the information bit has a logic state of “+1”, or is inverted when the information bit has a logic state of “0”. This corresponds to a BPSK modulation. The spread sequence may then be transformed into the frequency domain and weighted using the psychoacoustic masking threshold, i.e. so that the spectral representation of the spread sequence has an energy distribution over frequency which corresponds to the psychoacoustic masking threshold or lies below it. The watermark spectrum weighted in this way is then transformed back into a time-domain representation in order to obtain a psychoacoustically weighted time-domain representation of the spread sequence. In a last step, the psychoacoustically weighted time-domain representation of the spread sequence is added to the time-discrete audio signal in order to obtain an audio signal with inaudibly embedded watermark information.
Alternatively, the audio signal may be transformed into the frequency domain, and the combination of the psychoacoustically weighted spread sequence with the audio signal may be performed in the frequency domain, in order to obtain the audio signal with an embedded watermark already in the frequency domain, which may then be processed further after an inverse transformation into the time domain.
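The time-domain embedding procedure described above (BPSK modulation, spectral weighting, inverse transformation, addition) may be sketched as follows. This is an illustrative sketch under stated assumptions, not the method of the cited publications: no psychoacoustic model is computed, and the per-bin threshold `mask` is a hypothetical placeholder that would in practice be derived from the audio signal. The frame length of 32, the test tone and all function names are likewise assumptions.

```python
import cmath
import math
import random

def dft(x):
    """Naive discrete Fourier transform (O(n^2), adequate for a short sketch)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(spectrum):
    """Inverse DFT keeping the real part (spectrum assumed conjugate-symmetric)."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def weighted_watermark(chips, bit, mask):
    """BPSK-modulate the chips, then scale each spectral bin to the mask value."""
    modulated = [c if bit == 1 else -c for c in chips]
    shaped = []
    for value, limit in zip(dft(modulated), mask):
        magnitude = abs(value)
        # Keep the phase of the bin, replace its magnitude by the threshold.
        shaped.append(value * (limit / magnitude) if magnitude > 1e-9 else 0j)
    return idft_real(shaped)

N = 32                                               # frame length (assumed)
rng = random.Random(3)
chips = [rng.choice((-1.0, 1.0)) for _ in range(N)]

# Hypothetical masking threshold per bin, symmetric so the result stays real.
mask = [0.5 / (1 + min(k, N - k)) for k in range(N)]

# Hypothetical host signal: one frame of a sine tone.
audio = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]

watermark = weighted_watermark(chips, 1, mask)
watermarked = [a + w for a, w in zip(audio, watermark)]
```

Scaling each bin exactly to the threshold uses the full masked energy. A more conservative variant would leave bins that already lie below the threshold unchanged.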
Alternatively, technologies exist for embedding watermarks in already compressed audio signals, as disclosed in the technical publication “Audio Watermarking of MPEG-2 AAC Bit Streams”, C. Neubauer, J. Herre, 108th AES Convention, Paris 2000, Preprint 5101. One advantage of such a bit stream watermark method is a low computational complexity, as the bit stream to be provided with a watermark need not be fully decoded, wherein in particular the application of analysis and synthesis filter banks to the audio signal in the watermark embedder may be omitted. A further advantage is that a high audio quality may be achieved, as the quantization noise and the watermark noise may be matched to each other precisely. Embedding in already compressed audio signals is also characterized by a high robustness, as the watermark is not “weakened” by a subsequent audio encoder. Finally, a suitable selection of the spread spectrum parameters enables compatibility with the above-described PCM watermark methods.
If the audio signal is provided with a watermark already during its encoding, a low computational complexity likewise results, as the combination of watermark embedding and encoding means that certain operations, e.g. the calculation of the masking model or the transformation of the audio signal into a spectral representation, need to be performed only once. In this case, too, a high audio quality may be achieved, as the quantization noise and the watermark noise may be matched to each other precisely. A high robustness also results here, as the watermark is not weakened by a subsequent encoder. Finally, a suitable selection of the spread spectrum parameters again allows compatibility with the PCM watermark method. In this connection, reference is made to the technical publication “Combined Compression-Watermarking for Audio Signals”, F. Siebenhaar, C. Neubauer, J. Herre, 110th AES Convention, Amsterdam, Preprint 5344.
A disadvantage of these different methods is that they allow only a relatively low data rate. This data rate is sufficient, for example, for simple author information, but may quickly become too low when applications go beyond the classic use of watermarks for author information. Even in classic applications, the data rate is sometimes too low, in particular in cases in which a very high robustness is to be achieved. Applications exist, however, in which both a high data rate and a high robustness are required simultaneously.
For increasing the data rate, the technical publication “Robust, Multi-Functional and High-Quality Audio Watermarking Technology”, M. van der Veen, et al., 110th AES Convention, May 12 to 15, 2001, Amsterdam, Convention Paper 5345, proposes no longer using the classical PCM embedding strategy. To this end, an audio signal is transformed into the frequency domain using a discrete Fourier transform. For the watermark embedding, a random sequence is used which is cyclically shifted depending on useful information, also referred to as payload. The cyclically shifted version of this sequence is used to implement a multi-bit payload with one particular random sequence, since every possible shift may be associated with a payload. The random sequence shifted depending on the payload is weighted in the frequency domain and is then transformed into the time domain in order to obtain a time-domain representation of the shifted random sequence.
The watermark representation present in the time domain is then also post-processed in the time domain and is finally added to the audio signal in order to obtain an audio signal having a watermark in the time domain. For watermark detection, a portion of the audio signal is segmented into frames. Each frame is transformed into the frequency domain and subjected to spectral shaping as preprocessing before the extraction. This procedure is performed for a plurality of segments, wherein the spectral values of the plurality of segments are accumulated. The content of the accumulator is then subjected to a cross-correlation with every possible shifted version of the random sequence, wherein for a certain shift a correlation peak results whose height is a measure of the detection reliability and whose shift relative to a zero point in time carries the payload information.
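The principle of carrying a payload in the cyclic shift of a random sequence, and of recovering it by accumulation and cross-correlation, may be sketched as follows. The sketch is deliberately simplified and is not the implementation of the cited publication: the frequency-domain weighting and spectral shaping are omitted, so that embedding and accumulation take place directly on time-domain samples. The sequence length of 128, the embedding strength, the Gaussian host signal and all function names are assumptions.

```python
import random

def random_sequence(n, seed):
    """Pseudo-random +1/-1 sequence known to embedder and detector (assumed)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def cyclic_shift(seq, shift):
    """Rotate the sequence to the right by `shift` positions."""
    n = len(seq)
    return [seq[(i - shift) % n] for i in range(n)]

def embed(host, seq, payload, strength):
    """Add the payload-shifted sequence, repeated frame by frame, to the host."""
    n = len(seq)
    mark = cyclic_shift(seq, payload)
    return [h + strength * mark[i % n] for i, h in enumerate(host)]

def detect(signal, seq):
    """Accumulate frames, then correlate with every cyclic shift of the sequence."""
    n = len(seq)
    acc = [0.0] * n
    for start in range(0, len(signal) - n + 1, n):
        for i in range(n):
            acc[i] += signal[start + i]
    corr = [sum(acc[(t + s) % n] * seq[t] for t in range(n)) for s in range(n)]
    best = max(range(n), key=lambda s: corr[s])
    return best, corr[best]          # shift (= payload) and peak height

N, FRAMES = 128, 8                   # assumed frame length and frame count
seq = random_sequence(N, seed=11)
host_rng = random.Random(12)
host = [host_rng.gauss(0.0, 1.0) for _ in range(N * FRAMES)]

payload = 37                         # one of N possible shifts
marked = embed(host, seq, payload, strength=1.0)
detected, peak = detect(marked, seq)
print(detected, peak)
```

A sequence of length N distinguishes N shifts, i.e. log2(N) bits per sequence instead of the single bit of the BPSK scheme. The detector here assumes that its frames start exactly on the embedder's frame boundaries.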
A problem with the described method is that the zero point in time relative to which the shift is determined is generally not known a priori. If, for example, the watermark extraction is started at some arbitrary point within the audio signal, it would be pure coincidence if the segmentation grid were met exactly. In addition, watermarks need to be robust against attackers who manipulate the audio signal having an embedded watermark, either to change copy information to their advantage or to remove the watermark illegally. If a watermark extractor is no longer able to determine the zero point in time on which the shift is based, i.e. if the extractor loses synchronization, then, due to the inherent characteristics of pulse phase modulation, it is no longer able to extract the watermark information without errors.
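The effect of an unknown zero point in time may be made concrete with a small noise-free sketch. It assumes a watermark-only stream consisting of repetitions of the shifted sequence and an extractor whose frame grid is displaced by a hypothetical offset of 10 samples. All names and values are illustrative assumptions.

```python
import random

N = 64                                        # sequence length (assumed)
rng = random.Random(7)
seq = [rng.choice((-1, 1)) for _ in range(N)]

def shifted(seq, shift):
    """Cyclic right rotation of the sequence by `shift` positions."""
    n = len(seq)
    return [seq[(i - shift) % n] for i in range(n)]

def estimate_shift(frame, seq):
    """Return the cyclic shift with the largest cross-correlation."""
    n = len(seq)
    corr = [sum(frame[(t + s) % n] * seq[t] for t in range(n)) for s in range(n)]
    return max(range(n), key=lambda s: corr[s])

payload = 37
stream = shifted(seq, payload) * 4            # four frames, watermark only

offset = 10                                   # extractor starts 10 samples late
aligned = estimate_shift(stream[0:N], seq)                  # equals payload
misaligned = estimate_shift(stream[offset:offset + N], seq) # equals (payload - offset) % N

print(aligned, misaligned)
```

The correlation peak is equally high in both cases, so the extractor cannot tell from the peak alone that the decoded value is wrong. The framing offset is simply subtracted from the payload, which is why a reliable time reference is needed.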