Audio watermarking is the process of embedding information in audio signals. To embed this information, the original audio may be changed or new components may be added to the original audio. Watermarks may include information about the audio including information about its ownership, distribution method, transmission time, performer, producer, legal status, etc. The audio signal may be modified such that the embedded watermark is imperceptible or nearly imperceptible to the listener, yet may be detected through an automated detection process.
Watermarking systems typically have two primary components: an encoder that embeds the watermark in a host audio signal, and a decoder that detects and reads the embedded watermark from an audio signal containing the watermark. The encoder embeds a watermark by altering the host audio signal. Watermark symbols may be encoded in a single frequency band or, to enhance robustness, symbols may be encoded redundantly in multiple different frequency bands. The decoder may extract the watermark from the audio signal and the information from the extracted watermark.
The watermark encoding method may take advantage of perceptual masking of the host audio signal to hide the watermark. Perceptual masking refers to a process where one sound is rendered inaudible in the presence of another sound. This enables the host audio signal to hide or mask the watermark signal during the time of the presentation of a loud tone, for example. Perceptual masking exists in both the time and frequency domains. In the time domain, sound before and after a loud sound may mask a softer sound, so called forward masking (on the order of 50 to 300 ms) and backward masking (on the order of 1 to 5 ms). Masking is a well know psychoacoustic property of the human auditory system. In the frequency domain, small sounds somewhat higher or lower in frequency than a loud sound's spectrum are also masked even when occurring at the same time. Depending on the frequency, spectral masking may cover several 100 Hz.
The watermark encoder may perform a masking analysis to measure the masking capability of the audio signal to hide a watermark. The encoder models both the temporal and spectral masking to determine the maximum amount of watermarking energy that can be injected. However, the decoder can only be successful if the signal to noise ratio (S/N) is adequate, and the peak amplitude of the watermarking is only part of that ratio. One needs to consider the noise experienced by the decoder. There are multiple noise sources but there is one noise source that can dominate: the energy in the audio program that exists at the same time and frequency of the watermarking.
The audio program both creates the masking envelop and it exists at the same time and frequency of the injected watermark. The watermark peak is determined by the masking and the watermark's noise is determined by the residual audio program. These two parameters determine the S/N. The S/N may be insufficient for the decoder to successfully extract the information.