Audio watermarking is the process of embedding information in an audio signal. To embed this information, the original audio may be modified or new components may be added to it. A watermark may carry information about the audio, such as its ownership, distribution method, transmission time, performer, producer, or legal status. The audio signal may be modified so that the embedded watermark is imperceptible, or nearly imperceptible, to the listener, yet can still be detected by an automated detection process.
Watermarking systems typically have two primary components: an encoder that embeds the watermark in a host audio signal, and a decoder that detects and reads the embedded watermark from an audio signal containing it. The encoder embeds a watermark by altering the host audio signal. Watermark symbols may be encoded in a single frequency band or, to enhance robustness, encoded redundantly in several different frequency bands. The decoder may extract the watermark from the audio signal and then recover the information that the watermark carries.
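As a rough illustration of the encoder/decoder split and of redundant multi-band encoding, the following Python sketch embeds one bit per frame by adding a low-level tone in each of several bands, choosing one of two frequencies per band according to the bit, and decodes by letting each band vote. This is a toy frequency-shift-keying scheme chosen for illustration, not the encoding described here: the sample rate, frame length, band plan, amplitude, and all function names (`embed_bit`, `decode_bit`, and so on) are hypothetical, the host audio is replaced by white noise, and perceptual masking is ignored.

```python
import math
import random

FS = 8000   # sample rate in Hz (assumed for illustration)
N = 800     # samples per symbol frame (0.1 s); DFT bin spacing FS / N = 10 Hz
# Hypothetical band plan: every band carries the same bit, using f0 for a 0
# and f1 for a 1. Frequencies are multiples of 10 Hz so each tone falls
# exactly on a DFT bin and the tones are orthogonal over one frame.
BANDS = [(600, 640), (1200, 1240), (1800, 1840), (2400, 2440), (3000, 3040)]
AMP = 0.3   # watermark tone amplitude relative to a unit-variance host

def embed_bit(host, bit):
    """Encoder: add one tone per band to a host frame (redundant encoding)."""
    out = list(host)
    for f0, f1 in BANDS:
        f = f1 if bit else f0
        for n in range(len(out)):
            out[n] += AMP * math.sin(2 * math.pi * f * n / FS)
    return out

def bin_magnitude(frame, freq):
    """Magnitude of the single DFT coefficient at `freq` (direct DFT sum)."""
    re = sum(x * math.cos(2 * math.pi * freq * n / FS) for n, x in enumerate(frame))
    im = sum(x * math.sin(2 * math.pi * freq * n / FS) for n, x in enumerate(frame))
    return math.hypot(re, im)

def decode_bit(frame):
    """Decoder: each band votes for its louder tone; the majority wins."""
    votes = sum(1 for f0, f1 in BANDS
                if bin_magnitude(frame, f1) > bin_magnitude(frame, f0))
    return 1 if votes > len(BANDS) // 2 else 0

def embed_message(bits, seed=0):
    """Watermark a sequence of bits, one frame per bit, into a noise 'host'."""
    rng = random.Random(seed)
    signal = []
    for b in bits:
        host = [rng.gauss(0.0, 1.0) for _ in range(N)]  # stand-in for host audio
        signal.extend(embed_bit(host, b))
    return signal

def decode_message(signal):
    """Split the signal into frames and decode one bit from each."""
    return [decode_bit(signal[i:i + N]) for i in range(0, len(signal) - N + 1, N)]
```

Because every band carries the same bit, the decoder still recovers it when a minority of bands is corrupted; with five bands in this sketch, up to two can fail. That is the robustness benefit of redundant encoding, paid for with extra watermark energy spread across the spectrum.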
The watermark encoding method may take advantage of perceptual masking by the host audio signal to hide the watermark. Perceptual masking refers to a process in which one sound is rendered inaudible by the presence of another sound. This enables the host audio signal to hide, or mask, the watermark signal while a loud tone is playing, for example. Masking is a well-known psychoacoustic property of the human auditory system, and it exists in both the time and frequency domains. In the time domain, a loud sound may mask softer sounds that occur shortly after it, so-called forward masking (on the order of 50 to 300 milliseconds), and softer sounds that occur shortly before it, so-called backward masking (on the order of 1 to 5 milliseconds). In the frequency domain, softer sounds somewhat higher or lower in frequency than a loud sound's spectrum are also masked when they occur at the same time as the loud sound, which is known as simultaneous masking. Depending on the frequency, spectral masking may cover several hundred hertz.
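The two masking effects can be sketched numerically. The Python below uses two textbook approximations that this description does not itself specify: Zwicker and Terhardt's formula for the Bark (critical-band) scale, and Schroeder's spreading function for simultaneous masking, whose argument is the probe's Bark value minus the masker's. Temporal masking is modeled, purely as an assumption, as a linear drop in dB across a 200 ms forward window and a 5 ms backward window; `MAX_DROP_DB` and all function names are hypothetical.

```python
import math

def bark(f_hz):
    """Zwicker-Terhardt approximation of the critical-band (Bark) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def spreading_db(dz):
    """Schroeder spreading function (dB); dz = probe Bark minus masker Bark.
    Falls off about 25 dB/Bark below the masker and 10 dB/Bark above it."""
    return 15.81 + 7.5 * (dz + 0.474) - 17.5 * math.sqrt(1.0 + (dz + 0.474) ** 2)

def spectral_mask_db(masker_hz, masker_db, probe_hz):
    """Masking level (dB) a simultaneous masker provides at another frequency."""
    return masker_db + spreading_db(bark(probe_hz) - bark(masker_hz))

FORWARD_MS = 200.0   # assumed window, inside the 50-300 ms range in the text
BACKWARD_MS = 5.0    # assumed window, inside the 1-5 ms range in the text
MAX_DROP_DB = 40.0   # hypothetical level drop across each masking window

def temporal_mask_db(masker_db, dt_ms):
    """Toy linear-in-dB temporal masking. dt_ms >= 0: probe follows the
    masker (forward masking); dt_ms < 0: probe precedes it (backward)."""
    window = FORWARD_MS if dt_ms >= 0 else BACKWARD_MS
    if abs(dt_ms) > window:
        return -math.inf  # outside the masking window: no masking at all
    return masker_db - MAX_DROP_DB * abs(dt_ms) / window
```

With these models, a 70 dB masker at 1 kHz lends far more masking to a probe at 1.1 kHz than to one at 3 kHz, masking spreads further upward in frequency than downward, and a probe 100 ms after the masker is still covered while one 100 ms before it is not.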
The watermark encoder may perform a masking analysis to measure how much watermark signal the host audio can hide. The encoder models both temporal and spectral masking to determine the maximum watermark energy that can be injected without becoming audible. However, the encoder can succeed only if the audio signal has sufficient energy to mask the watermark, and in some cases that masking energy is available only in certain temporal and spectral regions.
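A masking analysis of this kind can be sketched as a pass over a time-frequency grid of host levels. In the hypothetical Python below, each cell's threshold is the strongest masking contributed by any cell in the same or an earlier frame, after subtracting a spectral spreading penalty and a forward-masking decay. The watermark level is then set a fixed margin below that threshold, and cells whose result falls below a minimum usable level get no watermark at all, reflecting the point that the host must have enough energy to mask it. The slopes (25 and 10 dB per band, assuming bands one Bark wide), 20 ms frames, 200 ms forward window, margin, and floor are illustrative assumptions, and backward masking is ignored because it is shorter than one frame.

```python
import math

LOWER_SLOPE_DB = 25.0              # masking drop per band toward lower bands
UPPER_SLOPE_DB = 10.0              # masking drop per band toward higher bands
FORWARD_DECAY_DB_PER_FRAME = 10.0  # forward-masking decay per 20 ms frame
FORWARD_FRAMES = 10                # forward window: 10 x 20 ms = 200 ms
OFFSET_DB = 10.0                   # margin kept between watermark and threshold
FLOOR_DB = 20.0                    # hypothetical minimum level worth embedding

def masking_threshold(levels_db):
    """levels_db[t][b] is the host level (dB) in frame t, band b.
    Returns the combined spectral + forward-temporal threshold per cell."""
    frames, bands = len(levels_db), len(levels_db[0])
    thr = [[-math.inf] * bands for _ in range(frames)]
    for t in range(frames):
        for b in range(bands):
            # Only maskers in the current frame or within the forward window.
            for tm in range(max(0, t - FORWARD_FRAMES), t + 1):
                for bm in range(bands):
                    dz = b - bm  # probe band minus masker band
                    slope = UPPER_SLOPE_DB if dz >= 0 else LOWER_SLOPE_DB
                    level = (levels_db[tm][bm] - slope * abs(dz)
                             - FORWARD_DECAY_DB_PER_FRAME * (t - tm))
                    thr[t][b] = max(thr[t][b], level)
    return thr

def watermark_budget(levels_db):
    """Maximum watermark level (dB) per cell, or None where the host does
    not provide enough masking for the watermark to be worth embedding."""
    budget = []
    for row in masking_threshold(levels_db):
        budget.append([
            thr - OFFSET_DB if thr - OFFSET_DB >= FLOOR_DB else None
            for thr in row
        ])
    return budget
```

For example, after a loud first frame in the lowest band, the budget in that band shrinks over the following frames as forward masking decays, while a uniformly quiet grid yields `None` in every cell, so no watermark can be hidden there.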
Internet streaming, television audio, and broadcast radio are typical examples of audio that may benefit from watermarking. Although watermarking offers many possible benefits, it has frequently been deployed as part of audience ratings systems, because advertising revenue depends on the number of listeners exposed to a commercial message. As a result, designing a watermarking technology that is as accurate as possible has large commercial implications.
In the prior art of watermarking technology, designs assumed a generic definition of audio: essentially any signal intended to be heard by human listeners in the range of 20 Hz to 20 kHz. Because the designers of such watermarking systems made no assumptions about the properties of the audio signal to be watermarked, the prior art does not consider that each type of audio has its own trade-offs, which strongly influence the design of the system. For example, rapidly spoken speech has very different properties from easy jazz, which in turn is very different from a classical symphony. There are probably a dozen or more types of audio, each with very different properties, and these properties strongly affect the accuracy of the watermarking system. Ideally, one could design a separate watermarking architecture for each type of audio program.