1. Field of the Invention
The present invention relates to a digital watermark system, which comprises a digital watermark embedding apparatus for embedding digital watermark information in an original audio signal, and a digital watermark detection apparatus for detecting the digital watermark embedded in the original audio signal.
2. Description of the Related Art
In recent years, end users have been able to easily make digital recordings of digital audio information (audio contents) provided on commercially available CDs (Compact Discs) and DVDs (Digital Versatile Discs), as well as via communication media such as digital TV broadcasts and the Internet, and to make copies from the digitally recorded contents. Because digital recording allows copies to be made without any deterioration in quality, infringement of copyrights has become a serious problem.
As a scheme for monitoring such pirated copies, it has been proposed that a provider of audio contents embed, in the audio contents, digital watermark information which has no effect on audio quality and represents, e.g., a production number or the like.
Various schemes have been proposed as techniques for embedding digital watermark information in a digital audio signal. As typical schemes, (a) a single echo scheme and (b) a PN (pseudo random noise) sequence scheme are available. The basic operations of these schemes will be explained below.
(a) Single Echo Scheme
In this single echo scheme, as shown in FIGS. 1 and 2, an echo signal 2 is inserted in an original audio signal a at a time delayed by a time period (delay time period) Δ1 or Δ2, corresponding to [1] or [0] of digital watermark information b, with respect to each tone signal 1 which forms this original audio signal a. Note that the actual time periods Δ1 and Δ2 are as short as several ms (milliseconds).
More specifically, as shown in the digital watermark embedding apparatus in FIG. 2, a time masking unit 3 detects the output time t0 of each tone signal 1 of the input original audio signal a. The detected output time t0 is supplied to an impulse response signal generator 4. The impulse response signal generator 4 outputs an impulse response signal c as the echo signal 2 to a convolution unit 5 at a time which is delayed by the time period Δ1 or Δ2, corresponding to [1] or [0] of digital watermark information b, with respect to that output time t0.
The convolution unit 5 executes a convolution process of the input original audio signal a and impulse response signal c, and outputs the convolution process result as a watermarked audio signal d shown in FIG. 1.
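The convolution step can be illustrated with a minimal sketch. It assumes, for illustration only, that the impulse response consists of a unit impulse (which passes the original signal through) plus one scaled impulse at the chosen delay. The function names, the delays expressed in samples, and the echo gain are all hypothetical and are not taken from FIG. 2.

```python
def echo_kernel(delay, gain):
    """Impulse response: unit impulse at 0 plus an echo impulse at `delay` samples."""
    h = [0.0] * (delay + 1)
    h[0] = 1.0          # passes the original signal through unchanged
    h[delay] += gain    # the echo, delayed and attenuated
    return h

def convolve(x, h):
    """Direct-form linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            if hj != 0.0:
                y[i + j] += xi * hj
    return y

def embed_bit(audio, bit, delay_one=3, delay_zero=5, gain=0.5):
    """Embed one watermark bit by choosing the echo delay (assumed values)."""
    delay = delay_one if bit == 1 else delay_zero
    return convolve(audio, echo_kernel(delay, gain))

tone = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # a single impulse standing in for a tone signal
print(embed_bit(tone, 1))
# → [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
```

In this toy case the echo appears three samples after the tone for bit [1]; for bit [0] it would appear five samples after it.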
A digital watermark detection apparatus for detecting the digital watermark information b from the watermarked audio signal d generated by this digital watermark embedding apparatus is not shown. However, if such a detection apparatus calculates the autocorrelation of the watermarked audio signal d, a peak appears at the time Δ1 or Δ2 corresponding to [1] or [0] of digital watermark information b, so that the digital watermark information b embedded in the watermarked audio signal d can be detected.
When the original audio signal a is a signal which continues for a given period of time, such as music or the like, the time masking unit 3 is not always required, provided that an impulse response signal c, which approximates a state in which the entire original audio signal a is delayed by the time period Δ1 or Δ2 corresponding to [1] or [0] of digital watermark information b, is output continuously.
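The autocorrelation-based detection described above can be sketched as follows. The sketch assumes a white-noise stand-in for the audio signal (real music is far more self-correlated, which makes detection harder), and the delays of 40 and 60 samples, the gain, and all function names are hypothetical.

```python
import random

def add_echo(x, delay, gain):
    """Return x plus a copy of itself delayed by `delay` samples and scaled by `gain`."""
    y = list(x) + [0.0] * delay
    for i, xi in enumerate(x):
        y[i + delay] += gain * xi
    return y

def autocorr(y, lag):
    """Unnormalized autocorrelation of y at a single lag."""
    return sum(y[i] * y[i + lag] for i in range(len(y) - lag))

def detect_bit(y, delay_one, delay_zero):
    """Decide the bit by comparing the autocorrelation at the two candidate delays."""
    return 1 if autocorr(y, delay_one) > autocorr(y, delay_zero) else 0

rng = random.Random(0)
audio = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # white-noise stand-in for audio

marked_one = add_echo(audio, delay=40, gain=0.5)     # carries bit [1]
marked_zero = add_echo(audio, delay=60, gain=0.5)    # carries bit [0]
print(detect_bit(marked_one, 40, 60), detect_bit(marked_zero, 40, 60))
```

Note that the detector needs nothing but the two candidate delays, so anyone who knows or guesses them can run the same computation.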
(b) PN (Pseudo Random Noise) Sequence Scheme
In this PN sequence scheme, as shown in FIG. 5, a PN sequence signal e [PN1 or PN0] corresponding to [1] or [0] of digital watermark information b is inserted in each tone signal 1 which forms an original audio signal a on the frequency axis.
More specifically, as shown in the digital watermark embedding apparatus in FIG. 3, a Fourier transformer 6 Fourier-transforms the input original audio signal a into a signal in the frequency axis domain, and supplies the transformed signal to a frequency masking unit 7 and an adder 10. A PN sequence generator 9 outputs a PN sequence signal e [PN1 or PN0] corresponding to [1] or [0] of digital watermark information b to a multiplier 8. That is, the 2^m − 1 (m: a positive integer) bit values which form a PN sequence [PN1 or PN0] are respectively added to sample values at all frequencies or at frequencies ω1, ω2, ω3, . . . , ωM over a broad range.
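A PN sequence of length 2^m − 1 is commonly produced by a linear feedback shift register. The following minimal sketch assumes m = 4 and the recurrence a[n] = a[n−1] XOR a[n−4]; the source does not specify how the PN sequence generator 9 is built, so the function name, taps, and seed are illustrative assumptions.

```python
def pn_sequence(m=4, taps=(1, 4), seed=None):
    """Generate one period (2**m - 1 bits) of a shift-register sequence.

    Each new bit is the XOR of the bits `t` positions back, for t in `taps`.
    The default taps are an assumed choice that yields a maximal-length
    sequence for m = 4.
    """
    bits = list(seed) if seed else [1] + [0] * (m - 1)
    length = 2 ** m - 1
    while len(bits) < length:
        nxt = 0
        for t in taps:
            nxt ^= bits[-t]
        bits.append(nxt)
    return bits

pn1 = pn_sequence()
print(pn1)
# → [1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0]

# Map bits to +1/-1 values before adding them to frequency samples.
chips = [2 * b - 1 for b in pn1]
```

The sequence has 8 ones and 7 zeros, which is the near balance expected of a maximal-length sequence with 15 elements.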
The frequency masking unit 7 outputs frequency weighting characteristics for weighting respective frequency components of the PN sequence signal e [PN1 or PN0] to the multiplier 8 on the basis of frequency masking characteristics obtained from, e.g., the frequency distribution of an input signal in consideration of human auditory masking characteristics.
The multiplier 8 weights the PN sequence signal e [PN1 or PN0] using the frequency weighting characteristics, and outputs the weighted signal to the adder 10.
The adder 10 adds the frequency-weighted PN sequence signal e [PN1 or PN0] output from the multiplier 8 to the Fourier-transformed original audio signal a. The Fourier-transformed original audio signal a added with the PN sequence signal e [PN1 or PN0] is inversely Fourier-transformed into a time axis domain by an inverse Fourier transformer 11, and is output as a watermarked audio signal d1 shown in FIG. 5.
In a digital watermark detection apparatus, as shown in FIG. 4, the input watermarked audio signal d1 is Fourier-transformed into a signal in the frequency axis domain by a Fourier transformer 12, and the Fourier-transformed signal is input to a correlation calculation unit 13. The correlation calculation unit 13 performs a correlation operation between the Fourier-transformed watermarked audio signal d1 and a PN sequence signal e [PN1 or PN0], which is output from a PN sequence generator 14 and is identical to the PN sequence signal e used in embedding. The correlation calculation unit 13 outputs the correlation operation result as a correlation signal to a binarization unit 15. The binarization unit 15 binarizes the correlation signal to “1” or “0”, and outputs the binary value as digital watermark information b.
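The embedding and detection flow of FIGS. 3 and 4 can be sketched end to end under several loudly stated assumptions: PN0 is taken to be the negative of PN1, the PN values come from a keyed pseudo-random generator rather than a shift register, the frequency weighting is simply proportional to the magnitude of each frequency sample (a crude stand-in for a real masking model), and the embedding strength is exaggerated so that one short frame suffices. All names are hypothetical.

```python
import cmath
import random

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def pn_chips(key, count):
    """Keyed +1/-1 sequence; embedder and detector must share `key`."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(count)]

def embed(audio, bit, key, strength=0.5):
    """Add a magnitude-weighted PN sequence to the spectrum, then invert."""
    n = len(audio)
    spec = dft(audio)
    chips = pn_chips(key, n // 2 - 1)
    sign = 1.0 if bit == 1 else -1.0          # assumption: PN0 = -PN1
    for k, chip in enumerate(chips, start=1):
        delta = strength * abs(spec[k]) * sign * chip
        spec[k] += delta                      # positive-frequency sample
        spec[n - k] += delta                  # mirror sample keeps the output real
    return [v.real for v in dft(spec, inverse=True)]

def detect(marked, key):
    """Correlate the spectrum with the same PN sequence and binarize the result."""
    n = len(marked)
    spec = dft(marked)
    chips = pn_chips(key, n // 2 - 1)
    corr = sum(chip * spec[k].real for k, chip in enumerate(chips, start=1))
    return 1 if corr > 0 else 0

rng = random.Random(1)
audio = [rng.gauss(0.0, 1.0) for _ in range(256)]   # noise stand-in for one audio frame
for bit in (0, 1):
    print(bit, detect(embed(audio, bit, key=42), key=42))
```

A practical system would use a much smaller strength and accumulate the correlation over many frames; the values here only keep the demonstration short.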
However, even in the aforementioned digital watermarking methods, the following problems remain unsolved.
That is, in (a) the single echo scheme, the digital watermark information b to be embedded in the original audio signal a is indicated by the time periods Δ1 and Δ2 between each tone signal 1 and echo signals 2 (impulse response signals c) inserted at temporal neighbors of the tone signal 1, as shown in FIG. 1. Therefore, it is easy for a third party to decode the digital watermark information b from the watermarked audio signal d using, e.g., an autocorrelation calculation method.
That is, neither the fact that digital watermark information b is embedded nor the content of the embedded watermark information b can be kept secret, so a malevolent third party may exploit such information.
Furthermore, since echo signals 2 (impulse response signals c) with a relatively large level must be inserted in order to improve the detection performance of digital watermark information b, signal quality such as the S/N ratio of the watermarked audio signal d may be impaired.
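The trade-off can be made concrete with a small calculation. It assumes that the echo is treated entirely as noise and that the echo power equals the square of the echo gain times the signal power; the function name and the gain values are illustrative only.

```python
import math

def echo_snr_db(gain):
    """S/N ratio in dB when an echo of relative amplitude `gain` counts as noise."""
    return -20.0 * math.log10(gain)

for gain in (0.1, 0.25, 0.5):
    print(gain, round(echo_snr_db(gain), 2))
```

Under these assumptions, raising the echo gain from 0.1 to 0.5 to obtain a stronger autocorrelation peak lowers the S/N ratio from 20 dB to about 6 dB.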
In (b) the PN (pseudo random noise) sequence scheme, since digital watermark information b of [1] or [0] is embedded as the PN sequence signal e [PN1 or PN0] in the Fourier-transformed original audio signal a, secrecy of the embedded digital watermark information b can be assured. Also, since the PN sequence signal e [PN1 or PN0] is distributed over a broad range, its signal level can be lowered.
In this case, the PN sequence signal e [PN1 or PN0] is consequently distributed over the entire frequency range. However, an audio signal such as music or speech does not occupy the entire human audible frequency range, nor is it present at all times.
Therefore, in a frequency or time range in which the original audio signal a has a low level, the embedded digital watermark information b may be heard as a slight noise in the watermarked audio signal d1. Hence, the fact that the digital watermark information b is embedded is perceivable to a listener.