The rapid development of computer networks and the increased use of multimedia data via the Internet have resulted in the exchange of digital information becoming faster and more convenient. However, the open environment of the Internet creates consequential problems regarding copyright of artistic works, and in particular the unlawful distribution of digital multimedia works without authorisation of the owners. To dissuade and perhaps eliminate illegal copying, a need exists for strengthening and assisting in the enforcement of copyright protection of such works.
Digital watermarking is a technique that has been applied to address this problem in respect of multimedia data, including audio, image and video data. Watermarking directly embeds copyright information into the original media and seeks to maintain the presence of that information in the media, even after manipulations are applied to the watermarked data. With respect to digital audio data, a watermark should be inaudible and robust against attacks, including collusion, that attempt to defeat the watermarking. Furthermore, watermark detection should unambiguously identify the ownership and copyright. Still further, digital-watermarking technology is considered to be an integral part of several contributions to international standards, such as JPEG 2000 and MPEG-4.
Typically, watermarking is applied directly to the data samples themselves, whether these are still image data, video frames or audio segments. However, such systems fail to address audio coding systems in which the digital audio data itself is not available, but a representation of the audio data, for later reproduction according to a protocol, is. It is well known that tracks of digital audio data can require large amounts of storage and high data transfer rates, whereas synthesiser-architecture coding protocols such as the Musical Instrument Digital Interface (MIDI) have corresponding requirements that are several orders of magnitude lower for the same audio data. MIDI audio files are not files made entirely of sampled audio data (i.e., actual audio sounds), but instead contain synthesiser instructions, or MIDI messages, to reproduce the audio data. The synthesiser instructions require much smaller amounts of data than sampled audio. That is, a synthesiser generates the actual sounds from the instructions in a MIDI audio file. FIG. 7 is a block diagram of an example of a MIDI system 700 based on a personal computer 710. The computer 710 has a MIDI interface that can provide MIDI output 740 to a synthesiser 720. Alternatively, the synthesis may be performed using a sound card (not shown) installed in the computer 710, which may have a MIDI interface. In response to the MIDI output 740, the synthesiser 720 produces audio output that can be provided to speakers 730, for example.
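By way of illustration only, the difference in storage requirements between sampled audio and MIDI messages can be sketched as follows. The sketch assumes CD-quality sampling figures (44.1 kHz, 16-bit, stereo) and a hypothetical density of ten notes per second; it counts only three-byte note-on and note-off messages and ignores timing data and file headers, so the figures are rough estimates rather than measurements of any particular file.

```python
# Assumed, illustrative figures: uncompressed CD-quality PCM audio.
SAMPLE_RATE_HZ = 44_100
BYTES_PER_SAMPLE = 2   # 16-bit samples
CHANNELS = 2           # stereo


def pcm_bytes(seconds):
    """Bytes needed to store `seconds` of uncompressed sampled audio."""
    return int(seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


def note_on(channel, note, velocity):
    """Three-byte MIDI note-on message: status byte 0x9n, then note and velocity."""
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])


def note_off(channel, note):
    """Three-byte MIDI note-off message: status byte 0x8n, then note and velocity 0."""
    return bytes([0x80 | (channel & 0x0F), note & 0x7F, 0])


def midi_bytes(notes_per_second, seconds):
    """Rough message payload: one note-on plus one note-off per note.

    Hypothetical estimate; delta times and file headers are ignored.
    """
    per_note = len(note_on(0, 60, 100)) + len(note_off(0, 60))
    return int(notes_per_second * seconds) * per_note


if __name__ == "__main__":
    sampled = pcm_bytes(60)
    messages = midi_bytes(10, 60)
    print(f"sampled audio: {sampled} bytes, MIDI messages: {messages} bytes")
    print(f"ratio: about {sampled // messages} to 1")
```

Under these assumptions, one minute of sampled audio occupies roughly ten megabytes, whereas the corresponding note messages occupy a few kilobytes.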
Expanding upon MIDI, Downloadable Sound (DLS) is a synthesiser-architecture specification that requires a hardware or software synthesiser to support its components. DLS permits additional instruments to be defined and downloaded to a synthesiser, in addition to the 128 standard instruments provided by the MIDI system. The DLS file format stores both samples of digital sound data and articulation parameters to create at least one sound instrument. The articulation parameters include information about envelopes and loop points. For further information, reference is made to “Downloadable Sounds Level 1, Version 1.0”, The MIDI Manufacturers Association, CA, USA, 1997. Downloadable Sound is expected to become a new standard in the music industry because of its specific advantages. On the one hand, when compared with MIDI, DLS provides a common playback experience and an unlimited sound palette for both instruments and sound effects. On the other hand, when compared with sampled digital audio, it has true audio interactivity and, as noted hereinbefore, smaller storage requirements.
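The pairing of sample data with articulation parameters described above can be pictured with a minimal sketch. All class and field names below (`Envelope`, `LoopPoints`, `Articulation`, `Instrument`, `looped_samples`) are hypothetical and chosen for illustration; they do not reproduce the actual DLS chunk layout or any field names from the specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Envelope:
    """Hypothetical amplitude envelope (times in seconds, level in 0..1)."""
    attack_s: float
    decay_s: float
    sustain_level: float
    release_s: float


@dataclass
class LoopPoints:
    """Hypothetical loop region, expressed in sample indices."""
    start: int
    length: int


@dataclass
class Articulation:
    """Articulation parameters: an envelope and optional loop points."""
    envelope: Envelope
    loop: Optional[LoopPoints] = None


@dataclass
class Instrument:
    """One sound instrument: sampled sound data plus articulation parameters."""
    name: str
    samples: List[int]
    articulation: Articulation


def looped_samples(instrument, count):
    """Read up to `count` samples, jumping back to the loop start at the loop end."""
    out = []
    pos = 0
    loop = instrument.articulation.loop
    while len(out) < count and pos < len(instrument.samples):
        out.append(instrument.samples[pos])
        pos += 1
        if loop is not None and pos == loop.start + loop.length:
            pos = loop.start  # sustain the note by replaying the loop region
    return out


if __name__ == "__main__":
    env = Envelope(attack_s=0.01, decay_s=0.1, sustain_level=0.7, release_s=0.2)
    inst = Instrument("demo", [10, 20, 30, 40, 50],
                      Articulation(env, LoopPoints(start=1, length=3)))
    print(looped_samples(inst, 8))
```

The sketch shows why a short stored sample can sustain a note of arbitrary duration: the loop points let a player replay part of the sample, while the envelope (not applied here) would shape its amplitude.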
When compared with digital video and image watermarking techniques, digital audio watermarking techniques present a special challenge because the human auditory system (HAS) is much more sensitive than the human visual system (HVS). An ideal watermark is inaudible and robust. By inaudibility is meant that the watermark makes no perceptible difference to the digital audio signal in listening tests. By robustness is meant that the watermark is difficult, and ideally impossible, to remove without destroying the host audio signal. There is, however, always a conflict between inaudibility on the one hand and robustness on the other in existing audio watermarking techniques. This is further complicated by the special circumstances created by wavetable (WT) audio formats such as DLS, which are not complete digital audio samples, but instead contain instructions to create audio data.
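The conflict between inaudibility and robustness can be illustrated with a generic additive watermark. This is a textbook-style sketch and not a technique taken from this document. It assumes a pseudo-random ±1 watermark sequence, a hypothetical embedding strength `alpha`, signal-to-noise ratio as a crude stand-in for inaudibility, and a simple correlation detector as a stand-in for robustness.

```python
import math
import random


def embed(host, mark, alpha):
    """Add the watermark sequence to the host samples, scaled by alpha."""
    return [x + alpha * w for x, w in zip(host, mark)]


def snr_db(host, marked):
    """Signal-to-noise ratio in dB; higher means the change is less perceptible."""
    signal = sum(x * x for x in host)
    noise = sum((y - x) ** 2 for x, y in zip(host, marked))
    return 10 * math.log10(signal / noise)


def detect(marked, mark):
    """Mean correlation with the known watermark; higher means easier detection."""
    return sum(y * w for y, w in zip(marked, mark)) / len(mark)


# Assumed test signal: a 440 Hz tone sampled at 44.1 kHz.
rng = random.Random(0)
host = [math.sin(2 * math.pi * 440 * n / 44_100) for n in range(4096)]
mark = [rng.choice((-1.0, 1.0)) for _ in host]

if __name__ == "__main__":
    for alpha in (0.001, 0.01, 0.1):
        marked = embed(host, mark, alpha)
        print(f"alpha={alpha}: SNR={snr_db(host, marked):.1f} dB, "
              f"correlation={detect(marked, mark):.4f}")
```

In this sketch, increasing `alpha` raises the detector output but lowers the signal-to-noise ratio by 20 dB for each tenfold increase, which is the trade-off described above in its simplest form.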
Thus, a need clearly exists for improved watermark embedding and extracting systems for WT audio formats like DLS, which also effectively address the conflict between inaudibility and robustness of watermarks.