“Watermarking” involves the encoding and decoding of information (i.e., data bits) within an analog or digital signal, such as an audio signal containing speech, music, or other auditory stimuli. A watermark encoder or modulator accepts an audio signal and a stream of information bits as input and modifies the audio signal in a manner that embeds the information into the signal while leaving the original audio content intact. The watermark decoder or demodulator accepts an audio signal containing embedded information as input (i.e., an encoded signal), and extracts the stream of information bits from the audio signal.
Watermarking has been studied extensively. Many methods exist for encoding (i.e., embedding) digital data into an audio, video or other type to signal, and generally each encoding method has a corresponding decoding method to extract the digital data from the encoded signal. Most watermarking methods can be used with different types of signals, such as audio, images, and video, for example. However, many watermarking methods target a specific signal type so as to take advantage of certain limits in human perception, and, in effect, hide the data so that a human observer cannot see or hear the data. Regardless of the signal type, the function of the watermark encoder is to embed the information bits into the input signal such that they can be reliability decoded while minimizing the perceptibility of the changes made to the input signal as part of the encoding process. Similarly, the function of the watermark decoder is to reliably extract the information bits from the watermarked signal. In the case of the decoder, performance is based on the accuracy of the extracted data compared with the data embedded by the encoder and is usually measured in terms of bit error rate (BER), packet loss, and synchronization delay. In many practical applications, the watermarked signal may suffer from noise and other forms of distortion before it reaches the decoder, which may reduce the ability of the decoder to reliably extract the data. For audio signals, the watermark decoder must be robust to distortions introduced by compression techniques, such as MP3, AAC, and AC3, which are often encountered in broadcast and storage applications. Some watermark decoders require both the watermarked signal and the original signal in order to extract the embedded data, while others, which may be referred to as blind decoding systems, do not require the original signal to extract the data.
One common method for watermarking is related to the field of spread spectrum communications. In this approach a pseudo-random or other known sequence is modulated by the encoder with the data, and the result is added to the original signal. The decoder correlates the same modulating sequence with the watermarked signal (i.e., using matched filtering) and extracts the data from the result, with the information bits typically being contained in the sign (i.e., +/−) of the correlation. This approach is conceptually simple and can be applied to almost any signal type. However, it suffers from several limitations, one of which is that the modulating sequence is typically perceived as noise when added to the original signal, which means that the level of the modulating signal must be kept below the perceptible limit if the watermark is to remain undetected. However, if the level (which may be referred to as the marking level) is too low, then the cross correlation between the original signal and the modulating sequence (particularly when combined with other noise and distortion that are added during transmission or storage) can easily overwhelm the ability of the decoder to extract the embedded data. To balance these limitations, the marking level is often kept low and the modulating sequence is made very long, resulting in a very low bit rate.
Another known watermarking method adds delayed and modulated versions of the original signal to embed the data. This effectively results in small echoes being added to the signal. The decoder calculates the autocorrelation of the signal for the same delay value(s) used by the encoder and extracts the data from the result, with the information bits being contained in the sign (i.e., +/−) of the autocorrelation. For audio signals, small echoes can be difficult to perceive and hence this technique can embed data without significantly altering the perceptual content of the original signal. However, by using echoes, the embedded data is contained in the fine structure of short time spectral magnitude and this structure can be altered significantly when the audio is passed through low bit rate compression systems such as AAC at 32 kbps. In order to overcome this limitation, larger echoes must be used, which may cause perceptible distortion of the audio.
Other watermarking systems have attempted to embed information bits by directly modifying the signal spectra. In one technique, which is described in U.S. Pat. No. 6,621,881, an audio signal is segmented and transformed into the frequency domain and, for each segment, one or two reference frequencies are selected within a preferred frequency band of 4.8 to 6.0 kHz. The spectral amplitude at each reference frequency is modified to make the amplitude a local minima or maxima depending on the data to be embedded. In a related variation, which is also described in U.S. Pat. No. 6,621,881, the relative phase angle between the two reference frequencies is modified such that the two frequency components are either in-phase (0 degrees phase difference) or out-of-phase (180 degrees phase difference) depending on the data. In either case, only a small number of frequency components are used to embed the data, which limits the amount of information that can be conveyed without causing audible degradation to the signal.
Another phase-based watermarking system, which is described in “A Phase-Based Audio Watermarking System Robust to Acoustic Path Propagation” by Arnold et. al., modifies the phase over a broad range of frequencies (0.5-11 kHz) based on a set of reference phases computed from a pseudo-random sequence that depends on the data to be embedded. As large modifications to the phase can create significant audio degradation, limits are employed that reduce the degradation but also significantly lower the amount of data that can be embedded to around 3 bps.
Many watermarking systems can be improved, in a rate-distortion sense, by using the techniques described in “Quantization Index Modulation: A Class of Provably Good Methods for Digital Watermarking and Information Embedding” by Chen and Wornell. In this approach, a multi-level constellation of allowed quantization values are assigned to represent the signal parameter (e.g., time sample, spectral magnitude, and phase) into which the data is to be embedded. These quantization values are then subdivided into two or more subsets, each of which represents a particular value of the data. In the case of binary data, two subsets are used. For each data bit, the encoder selects the best quantization value (i.e., the value closest to the original value of the parameter) from the appropriate subset and modifies the original value of the parameter to be equal to the selected value. The decoder extracts the data by measuring the same parameter in the received signal and determining which subset contains the quantization value that is closest to the measured value. One advantage of this approach is that rate and distortion can be traded off by changing the size of the constellation (i.e., the number of allowed quantization values). However, this approach must be applied to an appropriate signal parameter that can carry a high rate of information while remaining imperceptible. In one method, which is described in “MP3 Resistant Oblivious Steganography” by Gang et. al., Quantization Index Modulation (QIM) is used to encode data within the spectral phase parameters.