Music is the world's universal form of communication, touching every person of every culture on the globe. Behind the melody is a growing, multi-billion-dollar-per-year industry. This industry, however, is constantly plagued by revenue lost to music piracy.
Piracy is not a new problem. However, as technologies change and improve, new challenges arise in protecting music content from illicit copying and theft. For instance, more producers are beginning to use the Internet to distribute music content. In this form of distribution, the content exists merely as a bit stream that, if left unprotected, can easily be copied and reproduced. At the end of 1997, the International Federation of the Phonographic Industry (IFPI), the British Phonographic Industry, and the Recording Industry Association of America (RIAA) undertook a project to survey the extent of unauthorized use of music on the Internet. The initial search indicated that at any one time there could be up to 80,000 infringing MP3 files on the Internet. The actual number of servers hosting infringing files was estimated at 2,000, located in over 30 countries around the world.
Consequently, techniques for identifying the copyright of digital audio content, and audio watermarking in particular, have received a great deal of attention in both industry and academia. One of the most promising audio watermarking techniques embeds a copyright watermark into the audio signal itself by altering the signal's frequency spectrum such that the perceptual characteristics of the original recording are preserved. Copy detection is performed by synchronously correlating the suspect audio clip with the content publisher's watermark. A pitfall common to all watermarking systems that use this type of data hiding is their intolerance to desynchronization attacks (e.g., sample cropping, insertion, and repetition; variable pitch-scale and time-scale modifications; audio restoration; and combinations of different attacks), together with the lack of adequate techniques for addressing this problem during detection.
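The embed-and-correlate idea, and why desynchronization defeats it, can be illustrated with a deliberately simplified model. Here the "signal" is just a list of spectral coefficients (in practice these would come from a transform of the audio), the watermark is a key-seeded sequence of +1/-1 chips added with a fixed strength, and detection is the average product of coefficients and chips. The helper names, the strength of 0.5, and the Gaussian stand-in for real spectra are assumptions for illustration only; no perceptual shaping is performed.

```python
import random

def watermark_chips(key, n):
    """Hypothetical helper: expand a secret key into a pseudo-random +1/-1 sequence."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(n)]

def embed(coeffs, chips, strength):
    """Nudge each coefficient up or down by `strength` according to its chip."""
    return [c + strength * w for c, w in zip(coeffs, chips)]

def correlate(coeffs, chips):
    """Synchronous correlation: coefficient i is paired with chip i, so alignment matters."""
    return sum(c * w for c, w in zip(coeffs, chips)) / len(chips)

rng = random.Random(2024)
host = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # stand-in for spectral coefficients
chips = watermark_chips(key=12345, n=len(host))
marked = embed(host, chips, strength=0.5)
shifted = marked[1:] + marked[:1]                      # desynchronize by one coefficient

print(f"unmarked: {correlate(host, chips):+.3f}")
print(f"marked:   {correlate(marked, chips):+.3f}")
print(f"shifted:  {correlate(shifted, chips):+.3f}")
```

With these parameters the marked signal correlates at roughly the embedding strength, while shifting it by a single position drops the score to about the level of an unmarked signal, which is the desynchronization weakness described above.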
The business model of companies that deliver products for audio copyright enforcement has focused on satisfying the minimal set of requirements in the IFPI's and RIAA's Request for Proposals (MUSE project) for technologies that inaudibly embed data in sound recordings. More recently, the RIAA started the Secure Digital Music Initiative (SDMI) Forum to establish a standard for managing audio content copyrights. The requirements in neither request accurately reflect common desynchronization (de-synch) attacks.
Existing techniques for watermarking discrete audio signals exploit the insensitivity of the human auditory system (HAS) to certain audio phenomena. It has been demonstrated that, in the temporal domain, the HAS is insensitive to small signal level changes and to peaks in the pre-echo and decaying-echo spectrum. Techniques developed to exploit the first phenomenon are typically not resilient to de-synch attacks. Because echo cancellation is a difficult problem, techniques that employ multiple decaying echoes to place a peak in the signal's cepstrum are hard to attack in real time, but fairly easy to attack using an off-line exhaustive search.
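The cepstral peak produced by echo hiding can be sketched with a single echo rather than multiple decaying ones. Adding a copy of the signal delayed by d samples with amplitude a multiplies its spectrum by 1 + a·e^(-jωd), and in the real cepstrum (the inverse transform of the log-magnitude spectrum) this appears as a peak of height about a/2 at quefrency d. The sketch below assumes a naive O(N²) DFT so that it needs only the standard library, a circular echo so that the spectral identity holds exactly, and white Gaussian noise as a stand-in host; the function names, the delay of 20, and the amplitude of 0.5 are illustrative choices, not parameters from any published scheme.

```python
import cmath
import math
import random

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; the inverse includes the 1/N factor."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    twiddle = [cmath.exp(sign * 2j * math.pi * k / n) for k in range(n)]
    out = [sum(x[t] * twiddle[(k * t) % n] for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out

def real_cepstrum(x):
    """Inverse DFT of the log-magnitude spectrum (real part)."""
    log_mag = [math.log(abs(v)) for v in dft(x)]
    return [v.real for v in dft(log_mag, inverse=True)]

def add_echo(x, delay, amplitude):
    """Circular echo y[n] = x[n] + a * x[n - d]; circular so the DFT identity is exact."""
    n = len(x)
    return [x[i] + amplitude * x[(i - delay) % n] for i in range(n)]

def detect_echo_delay(x, max_delay):
    """Exhaustively scan quefrencies 1..max_delay and return the largest cepstral peak."""
    ceps = real_cepstrum(x)
    return max(range(1, max_delay + 1), key=lambda q: ceps[q])

rng = random.Random(7)
host = [rng.gauss(0.0, 1.0) for _ in range(512)]
marked = add_echo(host, delay=20, amplitude=0.5)
print("detected delay:", detect_echo_delay(marked, max_delay=100))
```

The detector simply scans every candidate delay; an off-line attacker can run a similar scan to locate the echo before trying to remove it, which is why such schemes are considered vulnerable to exhaustive search.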
Watermarking techniques that embed secret data in the frequency domain of a signal exploit the insensitivity of the HAS to small magnitude and phase changes. In both cases, the publisher's secret key is encoded as a pseudo-random sequence that guides the modification of each magnitude or phase component in the frequency domain. The modifications are applied either directly or shaped according to the signal's envelope. In addition, a watermarking scheme has been developed that combines the advantages, but also inherits the disadvantages, of hiding data in both the time and frequency domains. All reported approaches perform watermark detection on both the audible and inaudible spectral components, thus enabling an attacker to reduce the correlation between the watermarked signal and its watermark by adding noise in the inaudible domain. Moreover, it has not been demonstrated whether these watermarking schemes would survive combinations of common attacks, such as de-synch in both the temporal and frequency domains and mosaic-like attacks.
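The weakness of detecting over the whole spectrum can be illustrated with a simplified coefficient model. A key-seeded +1/-1 sequence is added directly (without envelope shaping) to every coefficient, and an attacker then adds strong noise only to coefficients assumed to be inaudible. Which coefficients count as audible is decided here by a hypothetical fixed split (first half audible); a real system would derive this from a psychoacoustic model. The embedding strength of 0.4 and the noise level of 40 are likewise illustrative assumptions.

```python
import math
import random

def watermark_chips(key, n):
    """Hypothetical helper: expand the publisher's secret key into a +1/-1 sequence."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(n)]

def normalized_correlation(coeffs, chips, indices):
    """Correlation coefficient between coeffs and chips over the given indices.

    Chips are +1/-1, so the norm of the chip sequence is sqrt(len(indices)).
    """
    num = sum(coeffs[i] * chips[i] for i in indices)
    den = math.sqrt(sum(coeffs[i] ** 2 for i in indices)) * math.sqrt(len(indices))
    return num / den

N = 8000
audible = range(0, N // 2)      # assumption: first half of the coefficients is audible
inaudible = range(N // 2, N)
everything = range(N)

rng = random.Random(99)
host = [rng.gauss(0.0, 1.0) for _ in range(N)]
chips = watermark_chips(key=12345, n=N)
marked = [c + 0.4 * w for c, w in zip(host, chips)]   # direct, unshaped embedding

# Attack: heavy noise only in the components the listener is assumed not to hear.
attacked = list(marked)
for i in inaudible:
    attacked[i] += rng.gauss(0.0, 40.0)

for label, idx in (("all components", everything), ("audible only", audible)):
    before = normalized_correlation(marked, chips, idx)
    after = normalized_correlation(attacked, chips, idx)
    print(f"{label:15s} before={before:.3f} after={after:.3f}")
```

Restricting the detector to the audible components leaves its score untouched by this attack, because the added noise never enters the sum; with all components included, the same noise pushes the score close to that of an unmarked signal.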
Accordingly, there is a need for a new framework of protocols for hiding and detecting watermarks in digital audio signals that are effective against desynchronization attacks. The framework should possess several attributes, including perceptual invisibility (i.e., the embedded information should not induce audible changes in the quality of the resulting watermarked signal) and statistical invisibility (i.e., the embedded information should be quantitatively undetectable by any exhaustive, heuristic, or probabilistic attempt to detect or remove the watermark). Additionally, the framework should be tamperproof (i.e., an attempt to remove the watermark should damage the value of the music, introducing degradation well above the hearing threshold) and inexpensive to license and implement on both programmable and application-specific platforms. The framework should also allow audio content copyright to be proven both in situ and in court without using the original recording.
The framework should also be flexible enough to enable a spectrum of protection levels corresponding to different audio presentation and compression standards, yet resilient to common attacks enabled by powerful digital sound-editing tools. The standard set of plausible attacks is itemized in the IFPI's and RIAA's Request for Proposals and, among others, includes the following security requirements:

Two successive D/A and A/D conversions;
Data reduction coding techniques such as MP3;
Adaptive transform coding;
Adaptive subband coding;
Digital Audio Broadcasting (DAB);
Dolby AC2 and AC3 systems;
Applying additive or multiplicative noise;
Applying a second Embedded Signal, using the same system, to a single program fragment;
Frequency response distortion corresponding to normal analogue frequency response controls such as bass, mid and treble controls, with a maximum variation of 15 dB with respect to the original signal; and
Applying frequency notches with possible frequency hopping.
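To test a candidate scheme against items from this list, a few of the attacks can be simulated directly. The sketch below covers only three of them (additive noise, multiplicative noise, and a single fixed frequency notch without hopping). The notch uses the standard second-order notch biquad from Robert Bristow-Johnson's Audio EQ Cookbook; the sample rate, notch frequency, Q of 5, and noise levels are arbitrary assumptions, since the Request for Proposals text quoted above does not prescribe them.

```python
import math
import random

def add_noise(x, sigma, rng):
    """Additive attack: y[n] = x[n] + e[n], with e[n] Gaussian of std sigma."""
    return [v + rng.gauss(0.0, sigma) for v in x]

def multiply_noise(x, sigma, rng):
    """Multiplicative attack: y[n] = x[n] * (1 + e[n])."""
    return [v * (1.0 + rng.gauss(0.0, sigma)) for v in x]

def notch(x, f0, fs, q):
    """Second-order IIR notch at f0 Hz (Audio EQ Cookbook coefficients, normalized by a0)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b1, b2 = 1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for v in x:
        out = b0 * v + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, v      # shift input history
        y2, y1 = y1, out    # shift output history
        y.append(out)
    return y

def rms(x):
    """Root-mean-square level of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))

fs = 8000
tone_1k = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]
tone_3k = [math.sin(2 * math.pi * 3000 * n / fs) for n in range(fs)]
half = fs // 2  # measure after the filter's start-up transient has died out
print("1 kHz tone through 1 kHz notch, RMS:", rms(notch(tone_1k, 1000, fs, 5)[half:]))
print("3 kHz tone through 1 kHz notch, RMS:", rms(notch(tone_3k, 1000, fs, 5)[half:]))
```

A watermark detector would then be run on each attacked copy. Attacks such as MP3 coding or D/A and A/D conversion require real codecs or hardware and are not modeled here.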