The process of embedding data in digitised media (audio, video or images) is often referred to as digital watermarking. Unlike the paper watermark after which it is named, a digital watermark is normally required to be completely imperceptible. Other requirements depend on the application:
A fragile watermark is used to show that the media has not been tampered with in any way, and should be affected whenever anything is done to the media, in particular editing of any kind.
A robust watermark is mainly used to prove ownership or copyright, and should not be removable no matter what is done to the media, including compression, writing to tape, editing or any other manipulation which retains the main value of the media.
Robust watermarking uses a combination of error correction coding (as discussed, for example, by P. Sweeney, “Error Control Coding (An Introduction)”, Prentice-Hall International Ltd., Englewood Cliffs, N.J. (1991)), spread-spectrum modulation (see, for example, R. Preuss, S. Roukos, A. Higgins, H. Gish, M. Bergamo, P. Peterson, “Embedded Signalling”, U.S. Pat. No. 5,319,735, 1994) and perceptual modelling (e.g. M. Swanson, B. Zhu, A. Tewfik, L. Boney, “Robust Audio Watermarking Using Perceptual Masking”, Signal Processing, vol. 66, no. 3, May 1998, pp. 337–355) to hide the watermark data in a way that is least perceptible but still recoverable.
A problem with perceptual modelling is that compression schemes use the same model to decide which parts of the signal do not need to be reproduced in the decoded audio. Thus the very part of the signal where the data is hidden is the same part likely to be removed by compression. However, even after compression, some of the watermark tends to remain, and the robustness introduced through spread-spectrum modulation and error correction coding allows it to be recovered as long as the embedded data bit-rate is low.
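The interplay of spreading and error coding described above can be illustrated with a minimal sketch. It is not taken from any of the cited references: the function names (`pn_chips`, `embed`, `extract`), the parameter values and the choice of a simple repetition code with majority voting are all assumptions made for illustration, and perceptual shaping of the watermark is omitted entirely. Each payload bit is repeated, each repeated bit is spread over many samples of a keyed pseudo-random ±1 sequence, and the receiver recovers the bits by correlating against the same sequence.

```python
import random

def pn_chips(key, length):
    """Keyed pseudo-random +/-1 chip sequence (hypothetical helper)."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]

def embed(host, bits, key, chips_per_bit=256, repeat=3, strength=0.05):
    """Add a low-level spread-spectrum watermark to a list of samples."""
    # Repetition code: a stand-in for a real error correction code.
    coded = [b for b in bits for _ in range(repeat)]
    n = len(coded) * chips_per_bit
    if len(host) < n:
        raise ValueError("host signal too short for this payload")
    chips = pn_chips(key, n)
    out = list(host)
    for i, b in enumerate(coded):
        sign = 1.0 if b else -1.0
        for k in range(i * chips_per_bit, (i + 1) * chips_per_bit):
            out[k] += strength * sign * chips[k]
    return out

def extract(signal, n_bits, key, chips_per_bit=256, repeat=3):
    """Recover the payload by correlation, then majority-vote the repeats."""
    n_coded = n_bits * repeat
    chips = pn_chips(key, n_coded * chips_per_bit)
    coded = []
    for i in range(n_coded):
        span = range(i * chips_per_bit, (i + 1) * chips_per_bit)
        corr = sum(signal[k] * chips[k] for k in span)
        coded.append(1 if corr > 0 else 0)
    return [
        1 if 2 * sum(coded[i * repeat:(i + 1) * repeat]) > repeat else 0
        for i in range(n_bits)
    ]
```

With these assumed settings each payload bit occupies 768 samples, which reflects the point made above: the correlation gain that lets the watermark survive distortion is bought by keeping the embedded bit-rate low.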
Some known watermarking schemes substitute part of an audio signal with a watermark signal. Examples of such schemes are given in U.S. Pat. No. 5,774,452 and by J F Tilki and A A Beex in “Encoding a Hidden Digital Signature onto an Audio Signal using Psychoacoustic Masking” (in Proc 1996, 7th Int Conf. on Sig. Proc. Apps. and Tech., pp 476–480). Because the substituted signal is quite different from the original, these schemes rely on psychoacoustic masking to minimise the perceptual effect of the substitution. If it were possible to substitute a signal which is perceptually equivalent to the original audio, there would be no need to rely on psychoacoustic masking, and the signal would not be in danger of being removed by compression schemes like MP3 (MPEG Audio Layer 3, as set out in “Information technology-coding of moving pictures and associated audio for digital storage media up to about 1.5 Mbit/s—Part 3. Audio”, ISO/IEC 11172-3: 1993). W Bender, D Gruhl, N Morimoto and A Lu in “Techniques for data hiding”, IBM Systems Journal, Vol. 35, Nos. 3 & 4, pp 313–336, propose just such an idea for image watermarking, a technique known as Texture Block Encoding. A human selects two areas of an image where the texture is similar, and a small amount of the first area is then copied into the second area. The shape of this copied data defines the watermark and, in the above-referenced paper by Bender et al., is a few letters of text. The technique requires a human both to select the areas and to assess the visual impact after watermarking, and so is not suitable for automated watermarking.
A number of recent audio compression techniques search for parts of the signal that can be characterised by random noise, and substitute pseudo-random noise for these parts of the signal when decoding. R C F Tucker in “Low Bit-Rate Frequency Extension Coding” (Audio and music technology: the challenge of creative DSP, IEE Colloquium, 18 Nov. 1998, pp 3/1–3/5) observes that the high frequency parts of an audio signal can successfully be replaced by spectrally-shaped noise for medium-quality compression. Scott Levine and Julius O Smith III in “A Sines+Transients+Noise Audio Representation for Data Compression and Time/Pitch-Scale Modifications” (105th Audio Engineering Society Convention, San Francisco 1998) use noise more carefully, separating out the transients from the steady-state noise and using transform coding on the transients. A more general scheme proposed by D Schultz in “Improving Audio Codecs by Noise Substitution” (J. Audio Eng. Soc., Vol 44, No 178, July/August 1998, pp 593–596), the contents of which are hereby incorporated by reference, searches all time-frequency segments above 5 kHz and uses synthetic noise to reproduce only those segments which have strong noise-like properties.
We have realised that a signal portion which has an attribute which is perceived to be non-information carrying, for example noise in an audio signal, can be replaced by a signal portion which has an attribute which is also perceived as being non-information carrying but which is modulated with watermark data. In particular, we have realised that it would be advantageous to replace a portion of a signal having a substantially random attribute with a replacement signal portion which also has a substantially random attribute but which has been modulated with watermark data. In one embodiment of the present invention, the compression scheme suggested by D Schultz is utilised by modulating the synthetic noise with watermark data.
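By way of illustration only, the substitution idea can be sketched as follows. The text above does not specify how the synthetic noise is modulated, so the sketch assumes one simple possibility: the polarity of a keyed pseudo-random noise sequence carries one watermark bit per segment, and the replacement is scaled to the energy of the segment it replaces. The helper names (`keyed_noise`, `substitute`, `detect`) and the seeding rule are hypothetical, and the step of deciding which segments are noise-like is assumed to have been done already.

```python
import math
import random

def keyed_noise(key, index, length):
    """Gaussian noise reproducible from a secret key and a segment index."""
    rng = random.Random(key * 1_000_003 + index)  # assumed seeding rule
    return [rng.gauss(0.0, 1.0) for _ in range(length)]

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))

def substitute(segment, bit, key, index):
    """Replace a noise-like segment with keyed noise of equal RMS level.

    The watermark bit selects the polarity of the replacement noise.
    """
    noise = keyed_noise(key, index, len(segment))
    gain = rms(segment) / rms(noise)
    sign = 1.0 if bit else -1.0
    return [sign * gain * v for v in noise]

def detect(segment, key, index):
    """Recover the bit from the sign of the correlation with the keyed noise."""
    noise = keyed_noise(key, index, len(segment))
    corr = sum(a * b for a, b in zip(segment, noise))
    return 1 if corr > 0 else 0
```

Because the original noise-like portion is discarded rather than added to, the detector in this sketch correlates against the whole substituted signal and sees no interference from the host within that segment; only distortion applied afterwards reduces the correlation.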