Since the earliest days of human civilization, music has existed at the crossroads of creativity and technology. The urge to organize sound has been a constant part of human nature, while the tools to make and capture the resulting music have evolved in parallel with human mastery of science.
The ability to store and transmit audio (such as music) has evolved rapidly over the roughly 130 years since the earliest audio recordings. From Edison's foil cylinders to contemporary technologies (such as DVD-Audio, MP3, and the Internet), the constant evolution of prerecorded audio delivery has presented both opportunity and challenge.
Music is the world's universal form of communication, touching every person of every culture on the globe. Behind the music is a growing multi-billion dollar per year industry. This industry, however, is constantly plagued by lost revenues due to music piracy.
Protecting Rights
Piracy is not a new problem. However, as technologies change and improve, there are new challenges to protecting music content from illicit copying and theft. For instance, more producers are beginning to use the Internet to distribute music content. In this form of distribution, the content merely exists as a bit stream which, if left unprotected, can be easily copied and reproduced.
At the end of 1997, the International Federation of the Phonographic Industry (IFPI), the British Phonographic Industry, and the Recording Industry Association of America (RIAA) engaged in a project to survey the extent of unauthorized use of music on the Internet. The initial search indicated that at any one time there could be up to 80,000 infringing MP3 files on the Internet. The actual number of servers on the Internet hosting infringing files was estimated at 2,000, with locations in over 30 countries around the world.
Each day, the wall impeding the reproduction and distribution of infringing digital audio clips (e.g., music files) gets shorter and weaker. “Napster” is an example of an application that is weakening the wall of protection. It gives individuals access to one another's MP3 files by creating a unique file-sharing system via the Internet. Thus, it encourages illegal distribution of copies of copyrighted material.
As a result, these modern digital pirates effectively rob artists and authors of music recordings of their lawful compensation. Unless technology provides for those who create music to be compensated for it, both the creative community and the musical culture at large will be impoverished.
Identifying a Copyrighted Work
Unlike tape cassettes and CDs, a digital music file has no jewel case, label, sticker, or the like on which to place the copyright notification and the identification of the author. A digital music file is a set of binary data without a detectable and unmodifiable label.
Thus, musical artists and authors are unable to inform the public that a work is protected by affixing a copyright notice to the digital music file. Furthermore, such artists and authors are unable to inform the public of any additional information, such as the identity of the copyright holder or terms of a limited license.
Digital Tags
The music industry and trade groups were especially concerned by digital recording because there is no generation loss in digital transfers—a copy sounds the same as the original. Without limits on unauthorized copying, a digital audio recording format could easily encourage the pirating of master-quality recordings.
One solution is to append an associated digital “tag” to each audio file that identifies the copyright holder. To implement such a plan, all devices capable of such digital reproduction must faithfully reproduce the appended, associated tag.
With the passage of the Audio Home Recording Act of 1992, inclusion of serial copying technology became law in the United States. This legislation mandated the inclusion of serial copying technology, such as SCMS (Serial Copy Management System), in consumer digital recorders. SCMS recognizes a “copyright flag” encoded on a prerecorded original (such as a CD), and writes that flag into the subcode of digital copies (such as a transfer from a CD to a DAT tape). The presence of the flag prevents an SCMS-equipped recorder from digitally copying the copy, thus breaking the chain of perfect digital cloning.
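The copy-control behavior described above can be sketched in a few lines of Python. This is a deliberately simplified model, not the actual SCMS bit layout: the class name, the `is_copy` generation bit, and the exception are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DigitalRecording:
    """A digital recording plus the two subcode bits this sketch assumes."""
    title: str
    copyright_flag: bool   # set by the content owner on the prerecorded original
    is_copy: bool = False  # hypothetical generation bit written by the recorder


class CopyRefused(Exception):
    """Raised when the recorder declines to make a digital copy."""


def scms_digital_copy(source: DigitalRecording) -> DigitalRecording:
    """Return a digital copy of `source`, or refuse if it is a flagged copy."""
    if source.copyright_flag and source.is_copy:
        raise CopyRefused(f"'{source.title}' is already a copy of a protected original")
    # The copyright flag travels with the copy, and the copy is marked as such.
    return DigitalRecording(source.title, source.copyright_flag, is_copy=True)
```

In this model, copying a flagged CD to DAT succeeds, while a further digital copy of that DAT raises `CopyRefused`; an unflagged recording can be copied through any number of generations.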
However, subsequent developments—both technical and legal—have demonstrated the limited benefits of this legislation. While digital secure music delivery systems (such as SCMS) are designed to support the rights of content owners in the digital domain, the problem of analog copying requires a different approach. In the digital domain, information about the copy status of a given piece of music may be carried in the subcode, which is separate information that travels along with the audio data. In the analog domain, there is no subcode; the only place to put the extra information is to hide it within the audio signal itself.
Digital Watermarks
Techniques for identifying copyright information of digital audio content that address both analog and digital copying instances have received a great deal of attention in both the industrial community and the academic environment. One of the most promising “digital labeling” techniques is the embedding of a digital watermark into the audio signal itself by altering the signal's frequency spectrum such that the perceptual characteristics of the original recording are preserved.
In general, a “digital watermark” is a pattern of bits inserted into a digital image, audio, or video file that identifies the file's copyright information (author, rights, etc.). The name comes from the faintly visible watermarks imprinted on stationery that identify the manufacturer of the stationery. The purpose of digital watermarks is to provide copyright protection for intellectual property that is in digital format.
Unlike printed watermarks, which are intended to be somewhat visible, digital watermarks are designed to be completely invisible, or in the case of audio clips, inaudible. Moreover, the actual bits representing the watermark must be scattered throughout the file in such a way that they cannot be identified and manipulated. And finally, the digital watermark must be robust enough so that it can withstand normal changes to the file, such as reductions from lossy compression algorithms.
Satisfying all these requirements is no easy feat, but there are several competing technologies. All of them work by making the watermark appear as noise—that is, random data that exists in most digital files anyway. To view a watermark, you need a special program or device (i.e., a “detector”) that knows how to extract the watermark data.
Herein, such a digital watermark may be simply called a “watermark.” Generically, it may be called an “information pattern of discrete values.” The audio signal (or clip) in which a watermark is encoded is effectively “noise” in relation to the watermark.
Watermarking
Watermarking gives content owners a way to self-identify each track of music, thus providing proof of ownership and a way to track public performances of music for purposes of royalty distribution. It may also convey instructions, which can be used by a recording or playback device, to determine whether and how the music may be distributed. Because that data can be read even after the music has been converted from a digital to an analog signal, watermarking can be a powerful tool to defeat analog circumvention of copy protection.
The general concept of watermarking has been around for at least 30 years. It was used by companies (such as Muzak™) to audibly identify music delivered through their systems. Today, however, the emphasis in watermarking is on inaudible approaches. By varying signals embedded in analog audio programs, it is possible to create patterns that may be recognized by consumer electronics devices or audio circuitry in computers.
For general use in the record industry today, watermarking must be completely inaudible under all conditions. This guarantees the artistic integrity of the music. Moreover, it must be robust enough to survive all forms of attacks. To be effective, watermarks must endure processing, format conversion, and encode/detect cycles that today's music may encounter in a distribution environment that includes radio, the Web, music cassettes, and other non-linear media. In addition, it must endure malevolent attacks by digital pirates.
Watermark Encoding
Typically, existing techniques for encoding a watermark within discrete audio signals exploit the insensitivity of the human auditory system (HAS) to certain audio phenomena. It has been demonstrated that, in the temporal domain, the HAS is insensitive to small signal level changes and peaks in the pre-echo and the decaying echo spectrum.
The techniques developed to exploit the first phenomenon are typically not resilient to de-synchronization attacks. Due to the difficulty of the echo cancellation problem, techniques that employ multiple decaying echoes to place a peak in the signal's cepstrum can hardly be attacked in real time, but can be attacked fairly easily using an off-line exhaustive search. (The term “cepstrum” is the accepted terminology for the inverse Fourier transform of the logarithm of the power spectrum of a signal.)
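As an illustration of the cepstrum definition above, the following minimal Python sketch computes a power cepstrum and uses it to locate a single echo. The naive DFT, the synthetic noise signal, the circular echo, and all function names are assumptions made for the illustration; they are not taken from any particular watermarking scheme.

```python
import cmath
import math
import random


def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for a short demo)."""
    n = len(values)
    sign = 1 if inverse else -1
    twiddle = [cmath.exp(sign * 2j * math.pi * k / n) for k in range(n)]
    out = [sum(v * twiddle[(k * t) % n] for t, v in enumerate(values))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def power_cepstrum(signal):
    """Inverse Fourier transform of the log power spectrum of `signal`."""
    # A tiny constant guards against taking the logarithm of zero.
    log_power = [math.log(abs(c) ** 2 + 1e-12) for c in dft(signal)]
    return [c.real for c in dft(log_power, inverse=True)]


def estimate_echo_delay(signal):
    """Return the quefrency (in samples) of the largest cepstral peak."""
    ceps = power_cepstrum(signal)
    half = len(ceps) // 2
    return max(range(1, half), key=lambda q: ceps[q])


# Demo: white noise plus one circular echo, delayed 40 samples at half amplitude.
rng = random.Random(0)
n, delay, gain = 512, 40, 0.5
dry = [rng.gauss(0.0, 1.0) for _ in range(n)]
wet = [dry[i] + gain * dry[(i - delay) % n] for i in range(n)]
```

Calling `estimate_echo_delay(wet)` should recover the 40-sample delay, because an echo adds a peak to the cepstrum at the quefrency equal to its delay. The same property that lets a detector read an echo-based mark also lets an off-line search find it.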
Watermarking techniques that embed secret data in the frequency domain of a signal exploit the insensitivity of the HAS to small magnitude and phase changes. In both cases, a publisher's secret key is encoded as a pseudo-random sequence that is used to guide the modification of each magnitude or phase component of the frequency domain. The modifications are performed either directly or shaped according to the signal's envelope.
In addition, watermarking schemes have been developed that hide data in both the time and frequency domains; these gain the advantages but also suffer from the disadvantages of each. It has not been demonstrated whether spread-spectrum watermarking schemes would survive combinations of common attacks: de-synchronization in both the temporal and frequency domains, and mosaic-like attacks.
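A minimal sketch of key-guided magnitude modification follows, assuming the frequency-domain magnitudes are already available as a list of dB values. The function names, the `delta_db` step, and the use of Python's `random.Random` as the pseudo-random generator are all assumptions for illustration; a real system would use a cryptographically secure keyed generator.

```python
import random


def chip_sequence(secret_key, length):
    """Expand `secret_key` into a reproducible sequence of +1/-1 chips.

    Assumption: random.Random stands in for a proper keyed generator; it is
    not cryptographically secure.
    """
    rng = random.Random(secret_key)
    return [1 if rng.random() < 0.5 else -1 for _ in range(length)]


def embed(magnitudes_db, secret_key, delta_db=1.0, envelope=None):
    """Nudge each magnitude (in dB) up or down according to the key's chips.

    If `envelope` is given, each change is scaled by the matching envelope
    value, modelling "shaped" rather than direct modification.
    """
    chips = chip_sequence(secret_key, len(magnitudes_db))
    if envelope is None:
        envelope = [1.0] * len(magnitudes_db)
    return [m + delta_db * e * c
            for m, e, c in zip(magnitudes_db, envelope, chips)]
```

For example, `embed([10.0, 12.0, 8.0], secret_key=1234)` moves each magnitude by exactly one dB in a direction that only a holder of the key can reproduce.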
Watermark Detection
The copy detection process is performed by synchronously correlating the suspected audio clip with the watermark of the content publisher. A common pitfall for all watermarking systems that employ this type of data hiding is intolerance to de-synchronization attacks (e.g., sample cropping, insertion, repetition, variable pitch-scale and time-scale modifications, audio restoration, and arbitrary combinations of these attacks) and a lack of adequate techniques to address this problem during the detection process.
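The correlation test and its sensitivity to de-synchronization can be sketched as follows. The synthetic Gaussian "host" coefficients, the key value, the embedding strength `delta`, and the threshold rule are assumptions chosen only to make the effect visible; they do not describe any specific detector.

```python
import random


def chips_from_key(secret_key, length):
    """Reproducible +1/-1 sequence derived from the publisher's key (assumed PRNG)."""
    rng = random.Random(secret_key)
    return [1 if rng.random() < 0.5 else -1 for _ in range(length)]


def correlate(samples, chips):
    """Normalized correlation: mean of sample-by-chip products."""
    return sum(s * c for s, c in zip(samples, chips)) / len(chips)


def detect(samples, secret_key, delta, threshold_ratio=0.5):
    """Report the watermark as present if the correlation exceeds a fraction of delta."""
    chips = chips_from_key(secret_key, len(samples))
    return correlate(samples, chips) > threshold_ratio * delta


# Demo with synthetic data standing in for frequency-domain coefficients.
key, delta, n = 2024, 0.5, 4096
host_rng = random.Random(7)
host = [host_rng.gauss(0.0, 1.0) for _ in range(n)]
marked = [h + delta * c for h, c in zip(host, chips_from_key(key, n))]
shifted = marked[1:] + marked[:1]  # one-sample de-synchronization

print(detect(marked, key, delta))   # aligned, watermarked clip
print(detect(host, key, delta))     # unmarked clip
print(detect(shifted, key, delta))  # same clip after a one-sample shift
```

In this sketch the aligned clip correlates at roughly `delta`, while the unmarked clip and the shifted clip both correlate near zero. A shift of a single sample is therefore enough to hide the mark from a detector that assumes perfect synchronization.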
Desiderata of Watermarking Technology
Watermarking technology has several highly desirable goals (i.e., desiderata) to facilitate protection of the copyrights of audio content publishers. Several such goals are listed below.
Perceptual Invisibility. The embedded information should not induce audible changes in the audio quality of the resulting watermarked signal. The test of perceptual invisibility is often called the “golden ears” test.
Statistical Invisibility. The embedded information should be quantitatively imperceptible to any exhaustive, heuristic, or probabilistic attempt to detect or remove the watermark. The complexity of successfully launching such attacks should be well beyond the computation power of publicly available computer systems.
Tamperproofness. An attempt to remove the watermark should damage the value of the music well above the hearing threshold.
Cost. The system should be inexpensive to license and implement on both programmable and application-specific platforms.
Non-disclosure of the Original. The watermarking and detection protocols should be such that the process of proving audio content copyright, both in-situ and in-court, does not involve usage of the original recording.
Enforceability and Flexibility. The watermarking technique should provide strong and undeniable copyright proof. Similarly, it should enable a spectrum of protection levels, which correspond to variable audio presentation and compression standards.
Resilience to Common Attacks. Public availability of powerful digital sound editing tools requires that the watermarking and detection process be resilient to attacks spawned from such consoles. The standard set of plausible attacks is itemized in the Request for Proposals (RFP) of IFPI (International Federation of the Phonographic Industry) and RIAA (Recording Industry Association of America). The RFP encapsulates the following security requirements:
        two successive D/A and A/D conversions,
        data reduction coding techniques such as MP3,
        adaptive transform coding (ATRAC),
        adaptive subband coding,
        Digital Audio Broadcasting (DAB),
        Dolby AC2 and AC3 systems,
        applying additive or multiplicative noise,
        applying a second Embedded Signal, using the same system, to a single program fragment,
        frequency response distortion corresponding to normal analogue frequency response controls such as bass, mid and treble controls, with maximum variation of 15 dB with respect to the original signal, and
        applying frequency notches with possible frequency hopping.
Watermark Circumvention
If the encoding of a watermark can thwart a malicious attack, then it can also withstand the harm caused by unintentionally introduced noise. Therefore, any advancement in watermark technology that makes it more difficult for a malevolent attacker to assail the watermark also makes it more difficult for a watermark to be altered unintentionally.
In general, there are two common classes of malevolent attacks:
        1. De-synchronization of watermark in digital audio signals. These attacks alter audio signals in such a way to make it difficult for the detector to identify the location of the encoded watermark codes.
        2. Removing or altering the watermark. The attacker discovers the location of the watermark and intentionally alters the audio clip to remove or deteriorate a part of the watermark or its entirety.
Framework to Thwart Attacks
Accordingly, there is a need for a new framework of protocols for hiding and detecting watermarks in digital audio signals that are effective against malevolent attacks. The framework should possess several attributes that further the desiderata of watermark technology, described above. For example, such desiderata include “perceptual invisibility” and “statistical invisibility”. The framework should be tamperproof and inexpensive to license and implement on both programmable and application-specific platforms. The framework should be such that the process of proving audio content copyrights both in-situ and in-court does not involve usage of the original recording.
The framework should also be flexible to enable a spectrum of protection levels, which correspond to variable audio presentation and compression standards, and yet resilient to common attacks spawned by powerful digital sound editing tools.
In addition, the framework should facilitate the search for the “El Dorado” and the “Holy Grail” of watermarking technology.
The seemingly unattainable “El Dorado” of watermarking technology is an encoded watermark that is unalterable, irremovable, and cannot be de-synced without perceptually and noticeably affecting the audio quality.
Likewise, the seemingly unattainable “Holy Grail” of watermarking technology is an encoded watermark where a malevolent attacker may know how the watermark is encoded, but still cannot effectively attack it without perceptually and noticeably affecting the audio quality.