Digital watermark technology, which embeds specific information in audio contents, is now widely utilized as a means for preventing secondary use of audio contents converted into digital data, such as illicit copying and modification thereof.
In the digital watermark technology, the same information (watermark information) is repeatedly embedded at a plurality of spots in a piece of audio content. When the watermark information is detected, the values detected from the respective spots in which it is embedded are accumulated in a buffer so that they intensify one another, and are then subjected to processing such as error correction. Thereafter, a detection result is output.
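The accumulation step described above can be illustrated with a minimal sketch. It assumes that each embedding spot yields one detected value per message bit, equal to a weak copy of the bit (+1 or -1) plus independent Gaussian noise. The names `detect_spot` and `accumulate_and_decode`, the embedding strength, and the noise level are hypothetical and chosen only for illustration; error correction is omitted.

```python
import random

def detect_spot(message, strength, noise_sigma, rng):
    """Simulate the detected values from one embedding spot (assumed model):
    a weak copy of each +/-1 message bit plus independent Gaussian noise."""
    return [strength * bit + rng.gauss(0.0, noise_sigma) for bit in message]

def accumulate_and_decode(spots):
    """Sum the detected values of all spots per bit position (the buffer),
    then decode each bit from the sign of its accumulated value."""
    buffer = [0.0] * len(spots[0])
    for values in spots:
        for i, v in enumerate(values):
            buffer[i] += v
    return [1 if total >= 0 else -1 for total in buffer]

rng = random.Random(0)
message = [1, -1, -1, 1, 1, -1, 1, -1]
# 200 spots, each too noisy to decode reliably on its own (assumed values).
spots = [detect_spot(message, 0.5, 1.0, rng) for _ in range(200)]
decoded = accumulate_and_decode(spots)
```

Under this assumed model the message component grows in proportion to the number of spots, while the noise grows only with its square root, which is why accumulation intensifies the watermark.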
A general technique for embedding watermark information in audio content is as follows. A pseudorandom number sequence is generated by use of data called a key, a frequency component in the data of the audio content is processed by use of this pseudorandom number sequence to create a signal (watermark signal) containing the desired watermark information, and the signal is added to the data of the original audio content. To detect the watermark information, the frequency component of the data of the audio content is processed by use of a pseudorandom number sequence generated from the same key, the detected values resulting from this processing are accumulated in the buffer, the watermark signal is extracted from the accumulated values, and the embedded message (watermark information) is decoded.
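The key-based embedding and detection can be sketched as follows. This is a simplified illustration, not the method of any particular system: the frequency components are represented by a plain list of numbers (the transform that produces them is omitted), a single message bit is embedded, and the functions `pn_sequence`, `embed`, and `detect`, the strength `alpha`, and the key values are all assumptions made for the example.

```python
import random

def pn_sequence(key, length):
    """Generate a +/-1 pseudorandom sequence determined entirely by the key."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(length)]

def embed(components, key, bit, alpha):
    """Add the watermark signal alpha * bit * pn[i] to each frequency component."""
    pn = pn_sequence(key, len(components))
    return [c + alpha * bit * p for c, p in zip(components, pn)]

def detect(components, key):
    """Correlate the components with the sequence regenerated from the key."""
    pn = pn_sequence(key, len(components))
    return sum(c * p for c, p in zip(components, pn)) / len(components)

# Stand-in for the frequency components of the host audio (assumed Gaussian).
host_rng = random.Random(12345)
host = [host_rng.gauss(0.0, 1.0) for _ in range(4096)]

marked = embed(host, key=42, bit=1, alpha=0.2)
right_key = detect(marked, key=42)   # close to alpha * bit
wrong_key = detect(marked, key=43)   # close to zero
```

Because the detector regenerates the sequence from the key, only a detector holding the same key obtains a detected value near the embedded strength; with any other key the correlation averages out toward zero.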
The following documents are considered:
[Patent Document 1] Japanese Patent Laid-Open No. H11 (1999)-341452
[Patent Document 2] Japanese Patent Laid-Open No. 2002-320085
The length (time) over which the detected values are accumulated when detecting the watermark information is usually a single fixed length. For example, a detection apparatus is designed such that the accumulation cycle is set to 30 seconds and the detection result for the watermark signal is output every 30 seconds. In digital watermark technology for motion picture contents, a technique of varying the length of the accumulation of detected values has also been proposed (for example, refer to Patent Document 1). In this technique, watermark signals are embedded weakly so as not to degrade the quality of the motion picture, and at the time of detection the detected values are accumulated in a buffer until they reach an intensity sufficient for detecting the watermark information.
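The difference between a fixed accumulation cycle and accumulation until sufficient intensity can be sketched as below. The stopping rule used here, the normalized intensity |sum| / sqrt(n) compared with a threshold, is an assumption chosen for illustration and is not taken from Patent Document 1; the function names and the noise-free toy inputs are likewise hypothetical.

```python
import math

def fixed_length_detect(values, cycle):
    """Output one accumulated value per fixed cycle of `cycle` detected values."""
    return [sum(values[i:i + cycle])
            for i in range(0, len(values) - cycle + 1, cycle)]

def accumulate_until_strong(values, threshold):
    """Accumulate detected values until the normalized intensity
    |sum| / sqrt(n) reaches the threshold; return the count n, or None
    if the threshold is never reached."""
    total = 0.0
    for n, v in enumerate(values, start=1):
        total += v
        if abs(total) / math.sqrt(n) >= threshold:
            return n
    return None

strong = [0.5] * 200    # strongly embedded watermark (noise-free toy input)
weak = [0.25] * 200     # weakly embedded watermark (noise-free toy input)

n_strong = accumulate_until_strong(strong, threshold=3.0)
n_weak = accumulate_until_strong(weak, threshold=3.0)
```

In this toy setting the weaker watermark needs a longer accumulation before the same threshold is reached, whereas the fixed-length detector always outputs after the same number of values regardless of the embedding strength.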
Moreover, some audio contents are composed of a plurality of channels, such as those recorded in stereo. When a digital watermark is embedded in such audio contents, in general, one pseudorandom number sequence is generated by use of one key, and the audio data in the respective channels is processed by use of this one sequence to perform the embedding. That is, the same watermark signal is embedded in the audio data of each channel. In this case, when the digital watermark is detected, the watermark signals are detected from the audio data in the respective channels and synthesized, and the embedded message (watermark information) is decoded. Because the same digital watermark is embedded in the respective channels, the detected values from the respective channels are highly correlated with one another; accordingly, the message component in the detected values is intensified, which makes the message easier to restore. Furthermore, for cases in which the digital watermark technology is used for the purpose of ensuring security, a technique has been proposed in which a plurality of digital watermarks are created by use of different keys depending on features of the contents and the passage of time, and are embedded in the signals to be processed in order to enhance maintainability (for example, refer to Patent Document 2).
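The synthesis of detected values from the respective channels can be sketched as a position-by-position sum followed by sign decoding. The detected values below are made-up numbers used only to show the effect, and `synthesize` and `decode` are hypothetical helper names.

```python
def synthesize(channels):
    """Sum the detected values of all channels position by position."""
    return [sum(values) for values in zip(*channels)]

def decode(values):
    """Decode each bit from the sign of its (synthesized) detected value."""
    return [1 if v >= 0 else -1 for v in values]

message = [1, -1, 1, -1]
# Hypothetical detected values; each channel alone has one bit pushed
# to the wrong sign by noise.
left = [0.4, 0.1, 0.3, -0.5]
right = [0.2, -0.6, -0.1, -0.3]

decoded = decode(synthesize([left, right]))
```

Here neither channel decodes correctly on its own, but because both carry the same watermark signal, the message components add while the differing noise partly cancels, and the synthesized values decode to the embedded message.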
Meanwhile, audio contents converted into digital data are themselves delivered by broadcast or over a network, or distributed on a variety of recording media. In addition, audio contents are provided after being processed in various ways, for example by being used as BGM (background music) for other contents or as a jingle for a program. Hence, there are also audio contents that are extremely short (for example, approximately two seconds), audio contents that are degraded by another sound superimposed on them, and the like.
Considering the existence of such short audio contents, it is preferable that a digital watermark also be embedded within a short time span of the audio contents. On the other hand, in order to detect a digital watermark from audio contents that have been degraded by the superposition of another sound, for example through use as BGM, it is necessary to accumulate detected values from the audio contents over a somewhat long time (for example, approximately 30 seconds), that is, to increase the number of samples of the detected values, so that the watermark signal is intensified before it is extracted.
However, when the accumulation cycle of the detected values is prolonged, a digital watermark embedded in short audio contents cannot be detected. For example, when an attempt is made to detect a digital watermark from audio contents of approximately two seconds with the accumulation cycle set to 30 seconds, the accumulated detected values include approximately 28 (=30−2) seconds of detected values that come from sounds other than the intended audio contents. Accordingly, the message (watermark information) embedded in the audio contents cannot be correctly detected.
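The effect of the mismatch between a two-second content and a 30-second accumulation cycle can be estimated with a rough calculation. The model is an assumption made only for this illustration: one detected value per second, each watermarked second contributing a fixed amplitude, and every second in the cycle contributing independent noise of equal standard deviation. The function name and parameters are hypothetical.

```python
import math

def accumulated_snr(content_seconds, cycle_seconds, amplitude=1.0, noise_sigma=1.0):
    """Amplitude SNR of the accumulated value, assuming one detected value per
    second: watermarked seconds add `amplitude` each, and every second in the
    cycle adds independent noise of standard deviation `noise_sigma`."""
    covered = min(content_seconds, cycle_seconds)
    return (amplitude * covered) / (noise_sigma * math.sqrt(cycle_seconds))

matched = accumulated_snr(content_seconds=2, cycle_seconds=2)    # cycle fits the content
diluted = accumulated_snr(content_seconds=2, cycle_seconds=30)   # 28 s of unrelated sound
penalty = matched / diluted                                      # sqrt(30 / 2)
```

Under these assumptions the 30-second cycle lowers the signal-to-noise ratio of the two-second content by a factor of sqrt(15), roughly 3.9, compared with a cycle matched to the content length, because the 28 extra seconds add noise but no message component.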
The above-mentioned prior art, in which the accumulation cycle of the detected values varies, is intended to intensify weakly embedded watermark signals by accumulating them until they reach an intensity sufficient for detecting the watermark information. It does not consider setting an appropriate accumulation cycle for detecting the watermark information separately from short audio contents and from degraded audio contents.
If audio contents are a stereo-recorded audio composition or the like, the same watermark signals are embedded in the audio data in the respective channels, as mentioned above. When the digital watermark is detected, the watermark signals are detected from the audio data in the respective channels and synthesized, and then the message is restored.
However, when such audio contents are used as BGM for a narration, the narration sound superimposed on the audio contents is in many cases a nearly monaural signal, so the correlation between the narration audio data in the respective channels is high. Hence, when the detected values from the respective channels are synthesized to intensify the highly correlated message component, the narration component is intensified as well. Accordingly, it is difficult to distinguish between the message component and the noise component (narration sound) in the detected values, which makes the message difficult to restore.
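Why a nearly monaural narration cancels the benefit of synthesizing the channels can be seen from a short calculation. The sketch assumes two channels carrying the same watermark signal and an interfering sound of equal power in both channels with inter-channel correlation coefficient `rho`; the function `synthesis_gain` and this model are illustrative assumptions.

```python
def synthesis_gain(rho):
    """Power-SNR gain from summing two channels that carry the same watermark,
    when the interfering sound has equal power in both channels and
    inter-channel correlation coefficient rho (assumed model).

    Watermark power grows 4x (amplitudes add); interference power grows
    2 * (1 + rho) times, so the gain is 4 / (2 * (1 + rho)) = 2 / (1 + rho)."""
    if not -1.0 < rho <= 1.0:
        raise ValueError("rho must lie in (-1, 1]")
    return 2.0 / (1.0 + rho)

independent = synthesis_gain(0.0)   # uncorrelated interference: 2.0
monaural = synthesis_gain(1.0)      # identical narration in both channels: 1.0
```

With uncorrelated interference the synthesis doubles the signal-to-noise ratio, but as the interference approaches a monaural signal (`rho` near 1) the gain falls to 1, meaning the narration is intensified exactly as much as the message.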
In order to detect the watermark signals in such a case, the threshold value (degree of correlation) used for identifying the components of the watermark signals among the detected values must be set high. However, when this threshold value is set high, a much higher correlation between the watermark signals in the respective channels is required in order to detect the digital watermark, and robustness against degradation of the digital watermark is reduced.
The above-mentioned prior art, which creates watermark signals by use of different keys in response to the features of the contents and the passage of time and embeds them in the signals to be processed, does embed different watermark signals according to those features and that passage of time. It does not, however, consider the degradation that occurs when a nearly monaural sound, such as a narration, is superimposed on audio contents having a plurality of channels. Hence, when the digital watermarks are embedded in the audio data in the respective channels, the same watermark signal using one key is likewise embedded in each channel. Accordingly, the above-described problem cannot be solved.