1. Field of the Invention
The present invention relates to embedding payload data in a carrier signal and extracting the payload data from the carrier signal, wherein the carrier signal may be an audio signal, a video signal or a multimedia signal including audio and/or video information.
2. Description of Related Art
There are various applications and/or various approaches in the art for embedding additional information in digital signals. Such concepts are known in the art under the keyword watermarking.
From WO 97/33391, a coding method for inserting an inaudible data signal into an audio signal is known. Here, the audio signal into which the inaudible data signal is to be inserted is converted to the frequency domain by means of a Fourier transform or a modified discrete cosine transform in order to determine the masking threshold of the audio signal by means of a psychoacoustic model. The data signal to be inserted into the audio signal is multiplied by a pseudo noise signal to create a frequency-spread data signal. The frequency-spread data signal is then weighted with the psychoacoustic masking threshold such that the energy of the frequency-spread data signal is always below the masking threshold. Finally, the weighted data signal is superimposed on the audio signal, whereby an audio signal is produced in which the data signal is inaudibly inserted. On the one hand, the data signal may be used to determine the range of a transmitter. On the other hand, the data signal may be used for labeling audio signals so that possible pirate copies can easily be identified, because each sound carrier, for example in the form of a compact disc, is provided with an individual label at the factory. Another described application of the data signal is the remote control of audio devices, analogous to the “VPS” method in television.
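The spreading, weighting and superposition steps described above may be illustrated by the following minimal sketch. It is not taken from WO 97/33391: the function names, the seed-based pseudo noise generator and the factor 0.5 are assumptions made for illustration, the masking threshold is simply passed in as a list instead of being derived from a psychoacoustic model, and the values are treated as a plain sequence rather than as spectral coefficients.

```python
import random

def pn_sequence(length, seed):
    """Pseudo-noise sequence of +1/-1 chips, reproducible from the seed."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]

def embed_spread(host, bits, threshold, chips_per_bit, seed):
    """Spread each bit over chips_per_bit values and add it below the threshold."""
    pn = pn_sequence(len(bits) * chips_per_bit, seed)
    marked = list(host)
    for i, chip in enumerate(pn):
        symbol = 1 if bits[i // chips_per_bit] else -1
        # Assumed weighting: chip magnitude is half the local threshold value.
        marked[i] += 0.5 * threshold[i] * symbol * chip
    return marked

def extract_spread(marked, n_bits, chips_per_bit, seed):
    """Despread by correlating each bit interval with the same PN sequence."""
    pn = pn_sequence(n_bits * chips_per_bit, seed)
    bits = []
    for b in range(n_bits):
        lo = b * chips_per_bit
        corr = sum(marked[lo + k] * pn[lo + k] for k in range(chips_per_bit))
        bits.append(1 if corr > 0 else 0)
    return bits
```

In such a sketch the host signal itself acts as interference in the correlator, so the number of chips per bit would have to be chosen large enough for reliable extraction, which in turn lowers the data rate.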
EP 1149480 B1 discloses a method and a device for inserting information into an audio signal and methods and devices for determining information inserted in an audio signal. Here, the information is first processed such that the information to be inserted into the audio signal is distributed over at least two information channels. A first information channel contains copy information typically represented by a relatively small amount of data and serving to prevent illegal copying. Further information for the identification of the audio signal is inserted into a second information channel. The two channels are decodable independently of each other. A different spreading sequence is associated with each of these channels so that each channel is decodable separately from the others.
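The independent decodability of two information channels having different spreading sequences may be illustrated by the following minimal sketch. It is not the method of EP 1149480 B1: the seeds standing in for the two spreading sequences, the function names and the common gain are hypothetical, and no psychoacoustic weighting is applied.

```python
import random

def pn_sequence(length, seed):
    """Pseudo-noise sequence of +1/-1 chips, reproducible from the seed."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]

def embed_channels(host, channels, chips_per_bit, gain):
    """channels maps a PN seed to that channel's bit list; all are superimposed."""
    marked = list(host)
    for seed, bits in channels.items():
        pn = pn_sequence(len(bits) * chips_per_bit, seed)
        for i, chip in enumerate(pn):
            symbol = 1 if bits[i // chips_per_bit] else -1
            marked[i] += gain * symbol * chip
    return marked

def decode_channel(marked, seed, n_bits, chips_per_bit):
    """Recover one channel using only that channel's own spreading sequence."""
    pn = pn_sequence(n_bits * chips_per_bit, seed)
    out = []
    for b in range(n_bits):
        lo = b * chips_per_bit
        corr = sum(marked[lo + k] * pn[lo + k] for k in range(chips_per_bit))
        out.append(1 if corr > 0 else 0)
    return out
```

A decoder that knows only the first seed can thus read the short copy information without any knowledge of the second channel, and vice versa.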
The main characteristics of such watermarking systems are the influence on audio quality, the robustness of the watermark against illegal interference, and the watermark data rate. These three objectives oppose each other: a high level of robustness, for example, implies a loss of data rate or a loss of audio quality, and a high data rate causes either the robustness or the audio quality of the signal in which the information has been inserted to suffer.
In the specialist publication “New High Data Rate Audio Watermarking based on SCS (Scalar Costa Scheme)”, S. Siebenhaar, et al., AES Convention Paper 5645, Oct. 5 to 8, 2002, Los Angeles, Calif., USA, an audio watermarking method is described in which the audio signal is first segmented, then windowed and then transformed to the frequency domain. SCS watermark embedding is then performed, after which the result is transformed back to the time domain and windowed and, taking block overlap into account where necessary, the audio signal enriched with a watermark is finally obtained. The SCS algorithm consists of performing a dither quantization of the spectral value levels.
The SCS algorithm is further adapted such that properties of human hearing are taken into account to achieve a psychoacoustic weighting of the SCS algorithm.
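The dither quantization underlying the SCS algorithm may be illustrated by the following minimal sketch of a binary scalar Costa scheme. It is an illustration under assumptions and not the implementation of the cited paper: the function names, the fixed step size and the per-value key in the range 0 to 1 are hypothetical, and the psychoacoustic weighting is omitted.

```python
def quantize(value, step):
    """Uniform scalar quantizer with the given step size."""
    return step * round(value / step)

def scs_embed(values, bits, step, alpha, key):
    """Move each value a fraction alpha towards a bit-dependent, dithered lattice."""
    marked = []
    for x, d, k in zip(values, bits, key):
        offset = step * (d / 2 + k)      # lattice shift for bit d and key k
        error = quantize(x - offset, step) - (x - offset)
        marked.append(x + alpha * error)
    return marked

def scs_extract(received, step, key):
    """Decide each bit from the distance to the key-dithered lattice for bit 0."""
    bits = []
    for r, k in zip(received, key):
        y = quantize(r - k * step, step) - (r - k * step)
        bits.append(0 if abs(y) < step / 4 else 1)
    return bits
```

In this sketch each value is changed by at most alpha times half the step size, and noise-free extraction is correct as long as alpha exceeds 0.5; a smaller step size lowers the distortion but also the robustness.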
The specialist publication “A New Surround-Stereo-Coding Technique”, W. Ten Kate, L. Van De Kerkhof and F. Zijderveld, Journal Audio Engineering Society, Vol. 40, No. 5, May 1992, pages 376 to 382, also discloses adding inaudible information to audio signals. More specifically, an audio signal is filtered by means of a filter bank and then down-sampled. The samples in each subband are grouped into consecutive time windows. Then, the power spectrum of each block is calculated and used to calculate the masking threshold. The psychoacoustic masking threshold determines the maximum allowable power of a signal to be added. This value is determined subband-wise. Subsequently, the data to be inserted are weighted using this calculated masking threshold and added to the individual subbands, whereupon the subbands are upsampled and passed through a downstream filter bank to finally obtain the audio signal including the embedded information.
The specialist publication “A High Rate Buried Data Channel for Audio CD”, M. Gerzon, AES Preprint 3551, 94th AES Convention, Mar. 16 to 19, 1993, Berlin, discloses a technique for embedding a channel with a high data rate of up to 360 kBit per second or more into an audio CD without significantly affecting CD quality. The new data channel may be used to accommodate high quality data-reduced related audio channels or even to accommodate data-compressed video or computer data while, at the same time, maintaining compatibility with existing audio CD players. More specifically, a number (up to 4 per channel) of the least significant bits of the audio words are replaced by other data. Furthermore, psychoacoustic noise shaping techniques associated with a noise-shaped subtractive dither are used to reduce the audibility of the resulting added noise to a subjectively perceptible level equal to the noise level of a conventional CD. More specifically, a pseudo random coding/decoding process is used which operates only on the LSB data stream of the audio samples, without additional synchronization signals, to randomize the added LSB data carrying the inserted information. Because it is based on a pseudo random sequence, this randomization may be reversed in the extractor using the same pseudo random sequence.
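The replacement of least significant bits and the reversible pseudo random randomization may be illustrated by the following minimal sketch. It covers only these two steps and omits the noise-shaped subtractive dither of the cited publication; the function names and the seed-based pseudo random generator are assumptions made for illustration.

```python
import random

def keystream(n_samples, n_lsb, seed):
    """Pseudo-random n_lsb-bit words, reproducible from the seed."""
    rng = random.Random(seed)
    return [rng.getrandbits(n_lsb) for _ in range(n_samples)]

def embed_lsb(samples, payload, n_lsb, seed):
    """Replace the n_lsb lowest bits of each sample by randomized payload bits."""
    mask = (1 << n_lsb) - 1
    ks = keystream(len(samples), n_lsb, seed)
    # XOR with the keystream randomizes the payload before it is written.
    return [(s & ~mask) | ((p ^ k) & mask)
            for s, p, k in zip(samples, payload, ks)]

def extract_lsb(samples, n_lsb, seed):
    """Read the lowest bits and undo the randomization with the same keystream."""
    mask = (1 << n_lsb) - 1
    ks = keystream(len(samples), n_lsb, seed)
    return [(s & mask) ^ k for s, k in zip(samples, ks)]
```

For orientation, replacing 4 bits per sample in two channels at the CD sampling rate of 44.1 kHz corresponds to 4 x 2 x 44,100 = 352,800 bits per second, which is of the order of the data rate mentioned above.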
The specialist publication “Lossless Data Hiding Based on Integer Wavelet Transform”, G. Xuan et al., IEEE Workshop, December 2002, St. Thomas, Virgin Islands, pp. 1-4, discloses a data embedding algorithm which allows a high data rate, is based on an integer wavelet transform and is capable of recovering the original image from the image with the embedded data. The marking is further performed such that no visible interferences occur due to the inserted data. For this, the original image is subjected to an integer wavelet transform after preprocessing to obtain wavelet coefficients. The integer wavelet transform has been included in JPEG 2000 and is based on the application of lifting schemes. The bits in a bit level of the wavelet coefficients are compressed so that space is cleared into which data can be written. For this, a compressed wavelet coefficient representation is generated from the original wavelet coefficients, the compressed representation requiring fewer bits than the original wavelet coefficient representation, wherein the difference between the number of bits of the original representation and the number of bits of the compressed representation is used to insert the data to be hidden. Then an inverse integer wavelet transform is performed to finally obtain the marked image. In particular, arithmetic coding is employed for the compression in the selected bit levels to losslessly compress binary zeros and ones.
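The principle of compressing a bit level of integer wavelet coefficients and writing data into the space thus cleared may be illustrated by the following minimal sketch. It is not the algorithm of the cited publication: a one-level integer Haar transform on a one-dimensional list stands in for the image transform, zlib stands in for the arithmetic coder, the 4-byte length header and all function names are hypothetical, and the preprocessing against overflow of the value range is omitted.

```python
import zlib

def forward(x):
    """One level of the integer Haar (S) transform via lifting; len(x) even."""
    d = [x[i] - x[i + 1] for i in range(0, len(x), 2)]
    s = [x[i + 1] + (dk >> 1) for i, dk in zip(range(0, len(x), 2), d)]
    return s, d

def inverse(s, d):
    """Exact integer inverse of forward()."""
    x = []
    for sk, dk in zip(s, d):
        b = sk - (dk >> 1)
        x += [dk + b, b]
    return x

def get_plane(coeffs, p):
    return [(c >> p) & 1 for c in coeffs]

def set_plane(coeffs, p, bits):
    return [(c & ~(1 << p)) | (b << p) for c, b in zip(coeffs, bits)]

def to_bits(data):
    return [(byte >> k) & 1 for byte in data for k in range(8)]

def to_bytes(bits):
    return bytes(sum(b << k for k, b in enumerate(bits[i:i + 8]))
                 for i in range(0, len(bits), 8))

def embed(x, payload, p):
    """Store compressed bit level p plus the payload in bit level p itself."""
    s, d = forward(x)
    packed = zlib.compress(to_bytes(get_plane(d, p)))
    stream = (len(packed).to_bytes(2, "big") + len(payload).to_bytes(2, "big")
              + packed + payload)
    bits = to_bits(stream)
    if len(bits) > len(d):
        raise ValueError("bit level not compressible enough for this payload")
    bits += [0] * (len(d) - len(bits))   # pad the unused part of the bit level
    return inverse(s, set_plane(d, p, bits))

def extract(marked, p):
    """Return the restored original signal and the hidden payload."""
    s, d = forward(marked)
    data = to_bytes(get_plane(d, p))
    clen = int.from_bytes(data[0:2], "big")
    plen = int.from_bytes(data[2:4], "big")
    original = to_bits(zlib.decompress(data[4:4 + clen]))[:len(d)]
    payload = data[4 + clen:4 + clen + plen]
    return inverse(s, set_plane(d, p, original)), payload
```

Because both the transform and the compression are lossless, the extractor in this sketch recovers the hidden data and the unmarked signal exactly; the capacity depends on how well the selected bit level can be compressed.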
As has been mentioned, methods for embedding data in audio signals have to strike a compromise between inaudibility, robustness and high data rate. Where a high data rate matters more than robustness, for example where the signal is transmitted only over a wire or the piece is passed on on a sound carrier, i.e. where no free-space transmission takes place, a compromise may be made with respect to robustness in favor of the high data rate. The same applies to applications in which the embedded information is not intended to protect against illegal copying or to trace illegal distribution, but rather to provide the consumer of the audio signal with further information and/or data as an additional service.
Furthermore, there is a need for a concept which is simple in its implementation and also modest with respect to the computing time requirements at least on the decoder side. In particular, the decoder will often be in the hands of the customers and will typically not provide particularly high computing and storage resources due to the fact that it will have to compete with respect to the price on the market.