An audio watermark is a type of digital watermark: a marker embedded in an audio signal. Audio watermarking is the process of embedding information in audio signals. To embed this information, the original audio may be changed or new components may be added to it. Watermarking applications include embedding in audio samples digital information about their ownership, distribution method, transmission time, performer, producer, legal status, etc.
In order to embed the digital bits that make up the identification code, watermarking modifies the original audio by adding new content or changing existing audio components. The ideal audio watermarking system is 100% reliable in terms of embedding and extracting the watermarking data in all “typical” listener scenarios while remaining 100% inaudible for all “typical” program material. These goals underscore a paradox: 100% encoding reliability likely requires audible watermarks. Conversely, to achieve total inaudibility, watermarks cannot be present at all on some material, which clearly sacrifices reliability. Trade-offs must always be made in audio watermarking systems to balance audibility and reliability.
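The trade-off between audibility and reliability can be illustrated with a minimal sketch. It uses additive spread-spectrum embedding, a generic textbook technique chosen here only for illustration; it is not the method used by any particular commercial encoder. The function names, the noise-like stand-in for program audio, and the two strength values are all assumptions.

```python
import random

def make_chips(n, seed):
    """Pseudo-random +/-1 sequence shared by embedder and detector (hypothetical key)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def embed(host, chips, bit, strength):
    """Add the chip sequence to the host audio; its sign carries one bit."""
    sign = 1.0 if bit else -1.0
    return [h + strength * sign * c for h, c in zip(host, chips)]

def detect(signal, chips):
    """Correlate with the chip sequence; the sign of the statistic is the decoded bit."""
    stat = sum(s * c for s, c in zip(signal, chips)) / len(chips)
    return stat > 0.0, stat

rng = random.Random(1)
N = 20000
host = [rng.gauss(0.0, 1.0) for _ in range(N)]  # stand-in for program audio
chips = make_chips(N, seed=2)

# A weak watermark is less audible but its statistic is buried in host noise;
# a stronger one decodes reliably at the cost of a larger change to the audio.
for strength in (0.001, 0.05):
    for bit in (False, True):
        decoded, stat = detect(embed(host, chips, bit, strength), chips)
        print(f"strength={strength} bit={bit} decoded={decoded} stat={stat:+.4f}")
```

In this sketch the host audio acts as noise for the detector, so the only way to raise decoding reliability for a fixed length is to raise `strength`, which is exactly the change a listener might hear.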
The Portable People Meter™ (PPM™) system by The Arbitron Company is an example of a watermarking system. The Arbitron PPM system embeds watermarks with station identification codes into the audio program at the time of broadcast using an encoder in each individual radio station's transmission chain. Portable PPM decoders then identify which stations the wearers of the decoders or “people meters” are listening to.
A watermarking technology that is used to track listeners of radio programs, such as PPM, is more likely to need close to 100% reliability of data extraction, even if some audio is broadcast with modest perceptible degradation. The reason for requiring such high reliability is that failures are not uniformly spread across the broadcast population. For example, a system that is 99% reliable over all announcers, program types, and listening devices may have the 1% of failures concentrated in a particular radio announcer, a particular radio show, or a type of music from, for example, a particular cultural tradition. Listener ratings for that announcer, show, or type of music would drop, resulting in a loss of advertising revenue and the eventual cancellation of the affected programming. Clearly, large amounts of money are at stake on reliability.
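The effect of concentrated failures can be made concrete with a short worst-case calculation. The sketch below assumes, purely for illustration, that every decoding failure falls within a single program; the airtime shares are hypothetical figures, not measured data.

```python
def per_program_failure_rate(overall_reliability, airtime_share):
    """Worst case: all failures land in one program holding `airtime_share` of airtime."""
    if not 0.0 < airtime_share <= 1.0:
        raise ValueError("airtime_share must be in (0, 1]")
    overall_failures = 1.0 - overall_reliability
    # The program cannot lose more than all of its own listening.
    return min(1.0, overall_failures / airtime_share)

# Hypothetical airtime shares for the affected program.
for share in (1.0, 0.10, 0.02):
    rate = per_program_failure_rate(0.99, share)
    print(f"airtime share {share:.0%}: up to {rate:.0%} of that program's listening missed")
```

Under these assumptions, a system that is 99% reliable overall could miss about half of the listening for a program that accounts for 2% of airtime.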
It is therefore important to ensure that audio leaving the station is optimized for successful watermark encoding and decoding. There is a need for a system that individual radio broadcasters, the originators of the terrestrial signal, can utilize to control the trade-off between higher reliability of watermark decoding and higher audible degradation.
A first step towards more control of these trade-offs may be to extract the watermark signal from the output of the encoder such that analysis may be conducted to better understand the effects of watermarking and perhaps control them to the broadcaster's benefit.
One potential approach to extracting the watermark signal would be to attempt to simply subtract the input of the watermarking encoder from its output to obtain the watermark signal. This approach, however, is ineffective because the watermarking encoder introduces changes between the input and output signals that make simple subtraction inaccurate to the point that it is useless.
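A minimal sketch of why simple subtraction fails is given below. The encoder is modeled, as an assumption only, as a one-sample delay and a 2% gain change applied to the program plus a low-level additive tone standing in for the watermark; a real encoder's changes are not known and may be more complex.

```python
import math

RATE = 48000   # sample rate in Hz (assumed)
N = 4800       # 100 ms of audio

def tone(freq_hz, n, amp=1.0, delay=0):
    """Sine tone, optionally delayed by a whole number of samples."""
    return [amp * math.sin(2.0 * math.pi * freq_hz * (i - delay) / RATE)
            for i in range(n)]

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

program = tone(1000.0, N)               # encoder input
watermark = tone(3000.0, N, amp=0.01)   # hypothetical watermark, 40 dB below program

# Hypothetical encoder: slight gain change and one-sample delay, plus watermark.
passed_through = tone(1000.0, N, amp=0.98, delay=1)
encoded = [p + w for p, w in zip(passed_through, watermark)]

# Naive extraction: output minus input.
residual = [e - p for e, p in zip(encoded, program)]

# How far the naive residual is from the true watermark.
error = [r - w for r, w in zip(residual, watermark)]
print(f"watermark RMS: {rms(watermark):.4f}")
print(f"extraction error RMS: {rms(error):.4f}")
```

Even with these small mismatches, the leftover program audio in the residual is about an order of magnitude larger than the watermark itself, so the subtraction result is dominated by program content rather than by the watermark.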
An approach for compensating for the changes through the encoder to allow for accurate subtraction may be based on a class of technology called adaptive filters. This technology iteratively finds the coefficients of the optimum filter that minimizes the difference between a) the input to the encoder as compensated by the filter and b) the actual encoder output. This approach, however, is also ineffective for at least two reasons. First, the encoding process involves more than just a change in gain and delay because it also adds the watermarking signal, which is unknown and time-varying over a potentially large part of the spectrum. A filter cannot fully compensate for these changes. Second, the convergence of the adaptive filter to an optimum depends very strongly on the spectrum of the input signal, which is also unknown and rapidly changing. As a result, the optimization may leave only a small overall error between input and output, yet small components at some frequencies may be more important than larger components at other frequencies. Therefore, adaptive filters, which are well known in the art, would not solve the problem.
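For reference, the adaptive-filter idea can be sketched with a normalized least-mean-squares (NLMS) update, one common member of this class. The sketch assumes an idealized case: white-noise program audio, an encoder that applies only a fixed gain and delay, and a watermark that is independent of the program. The function name, tap count, and step size are hypothetical choices. It does not model the rapidly changing program spectrum or the unknown, time-varying watermark that make the approach ineffective in practice.

```python
import random

def nlms(x, d, taps=4, mu=0.5, eps=1e-8):
    """Adapt FIR weights so that the filtered x tracks d; return weights and error."""
    w = [0.0] * taps
    buf = [0.0] * taps          # most recent input samples, newest first
    err = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # compensated input
        e = dn - y                                   # difference from encoder output
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        err.append(e)
    return w, err

rng = random.Random(0)
n = 5000
program = [rng.gauss(0.0, 1.0) for _ in range(n)]            # idealized white input
watermark = [0.01 * rng.gauss(0.0, 1.0) for _ in range(n)]   # hypothetical watermark

# Hypothetical encoder: gain 0.9, delay of 2 samples, plus the watermark.
encoded = [(0.9 * program[i - 2] if i >= 2 else 0.0) + watermark[i]
           for i in range(n)]

weights, residual = nlms(program, encoded)
print("weights:", [round(v, 3) for v in weights])
```

In this idealized setting the weights settle near a gain of 0.9 at the two-sample tap, and the error signal approximates the watermark. The objections described above concern precisely the conditions this sketch leaves out.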
A more nuanced approach would be to understand and compensate for the internals of the watermarking encoder to account for the changes between the input and output signals. This approach, however, is impractical at least because a) the internals of the watermarking encoders are not well understood by people other than the manufacturers of the encoders and, perhaps more importantly, b) a watermark extracting system should ideally be able to extract the watermark independently of the internals of any particular implementation of watermarking by a particular encoder.
Even if the watermark could be successfully extracted, conventionally there was no way to control the trade-off between higher reliability of watermark decoding and higher audible degradation. Moreover, conventionally there was no way to account for degradation of the watermarked signal caused in the “real world” by the listener's environment when determining the proper trade-off.