Digital watermarking consists of embedding hidden data (known as a watermark) in a digital object such as an audio, video, image or text file. This technique allows supplementary content-related information to be transmitted in a manner that is imperceptible to the user of the digital object, and can be applied to a wide variety of applications, such as broadcast monitoring, owner identification, proof of ownership, transaction tracking, content authentication (with or without tampering localization), copy control, device control and legacy enhancement.
In order to implement a digital watermarking method, both an embedding system and an extraction system are required. The embedding system is implemented at the transmitting end, and uses the digital content and the watermark as inputs in order to generate the watermarked content, that is, a modified digital file with the watermark embedded in it. The extraction system is implemented at the receiving end, and is responsible for receiving the watermarked content and extracting the embedded watermark. A common watermark key may be used by both ends in order to protect the watermark. Additionally, encryption and encryption keys can be used to increase the security of the embedded watermark.
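As an illustration of how a common watermark key can tie the embedding system to the extraction system, the following minimal sketch uses the key to seed a pseudo-random generator that selects which samples carry the watermark bits. Everything in it is an assumption made for illustration: the function names, the use of integer samples, and the least-significant-bit substitution are stand-ins for a real embedding method, and Python's `random` module is not cryptographically secure.

```python
import random


def keyed_positions(key, n_samples, n_bits):
    """Derive the embedding positions from the shared key (hypothetical helper)."""
    rng = random.Random(key)  # same key -> same positions at both ends
    return rng.sample(range(n_samples), n_bits)


def embed_bits(samples, bits, key):
    """Embedding system: write each watermark bit into the LSB of a keyed sample."""
    marked = list(samples)
    positions = keyed_positions(key, len(marked), len(bits))
    for pos, bit in zip(positions, bits):
        marked[pos] = (marked[pos] & ~1) | bit
    return marked


def extract_bits(samples, n_bits, key):
    """Extraction system: read the LSBs back from the same keyed positions."""
    positions = keyed_positions(key, len(samples), n_bits)
    return [samples[pos] & 1 for pos in positions]


if __name__ == "__main__":
    content = list(range(100, 164))   # toy integer "digital content"
    watermark = [1, 0, 1, 1]
    marked = embed_bits(content, watermark, key=1234)
    print(extract_bits(marked, len(watermark), key=1234))
```

An extractor that does not hold the key does not know which samples to read, which is the sense in which the key protects the watermark in this sketch.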
In the particular case of audio watermarking, the watermark data is embedded in the audio content of an audio or video digital file, using either the time domain or the frequency domain for data embedding. In frequency domain audio watermarking, an original audio signal undergoes a frequency transform such as a Discrete Fourier Transform (DFT), Modified Discrete Cosine Transform (MDCT) or Wavelet Transform (WT). The bits of the watermark are embedded by replacing a plurality of the resulting transform coefficients with modified coefficients which encode said bits. One of the alternatives for frequency domain audio watermarking is to encode the watermark in the coefficients of a Fast Fourier Transform (FFT), as shown in “High capacity FFT-based audio watermarking” (M. Fallahpour and D. Megias, Eds. B. de Decker et al., Communications and Multimedia Security, Lecture Notes in Computer Science Volume 7025, pages 235-237, 2011). This approach takes advantage of the translation-invariant property of FFT coefficients to resist small distortions in the time domain. It therefore provides a high degree of robustness against common signal processing operations such as noise addition, filtering and compression, while also enabling a high capacity without great perceptual distortion.
However, these techniques are aimed at all-digital systems in which the watermarked audio is digitally transmitted to the receiving end through a communication network without large distortions. The watermark therefore cannot be transmitted to a device which is located near a source playing the watermarked audio content but which does not have access to the watermarked digital audio file itself. In this scenario, the spectrum of the watermarked audio may be distorted and shifted, hindering the decoding of the embedded data.
Furthermore, as the receiving end is not notified of the start of a particular file within a continuous audio transmission, a conventional watermark extraction system is not capable of determining when a watermark is being transmitted.
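The frequency domain embedding outlined above can be sketched in a few lines. The following example is only an illustration under stated assumptions and is not the scheme of the cited FFT-based publication: it transforms one frame with a plain DFT, replaces selected coefficients with modified coefficients whose quantized magnitude has even or odd parity depending on the bit (a simple quantization-based rule chosen here for brevity), and transforms back. The names `STEP` and `FIRST_BIN` and their values are hypothetical.

```python
import cmath
import math

STEP = 0.5      # assumed quantization step for coefficient magnitudes
FIRST_BIN = 4   # assumed index of the first coefficient carrying a bit


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(coeffs):
    n = len(coeffs)
    return [(sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def embed(frame, bits):
    """Replace selected DFT coefficients with modified ones encoding the bits."""
    coeffs = dft(frame)
    n = len(coeffs)
    assert FIRST_BIN + len(bits) < n // 2, "frame too short for this payload"
    for i, bit in enumerate(bits):
        k = FIRST_BIN + i
        phase = cmath.phase(coeffs[k])
        q = round(abs(coeffs[k]) / STEP)
        if q % 2 != bit:          # force the parity of the quantized magnitude
            q += 1
        coeffs[k] = cmath.rect(q * STEP, phase)
        coeffs[n - k] = coeffs[k].conjugate()   # keep the time signal real
    return idft(coeffs)


def extract(frame, n_bits):
    """Read each bit back as the parity of the quantized coefficient magnitude."""
    coeffs = dft(frame)
    return [round(abs(coeffs[FIRST_BIN + i]) / STEP) % 2 for i in range(n_bits)]


if __name__ == "__main__":
    frame = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
    bits = [1, 0, 1, 1, 0, 0, 1, 0]
    marked = embed(frame, bits)
    shifted = marked[3:] + marked[:3]   # circular shift of the frame
    print(extract(marked, len(bits)), extract(shifted, len(bits)))
```

Because a circular shift of the frame changes only the phase of each DFT coefficient and not its magnitude, the bits in this sketch survive the shift, which illustrates the translation-invariance argument. The sketch does not model the spectral distortion and shifting introduced when the audio is played back and recaptured.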
The aforementioned limitations are also present, for example, in the following systems known in the state of the art. US 2012/300971 A1 discloses a system in which the watermark is segmented and embedded into multiple channels of audio and video. WO 2013/0179666 A1 provides an approach which minimizes distortion to the listener by embedding data only in particular sections of the audio signal. US 2004/0257977 A1 also aims to minimize distortion to the listener by embedding watermark data at selected positions of an audio signal. In the proximity of the selected positions, data embedding is performed by multiplying the Discrete Fourier Transform coefficients of the audio signal by values encoding the watermark data. EP 2562749 discloses a system which sorts the audio file into blocks or sections according to whether they are suitable for watermarking. Nevertheless, all these watermark extraction systems operate directly on the digital audio signal after it has been transmitted through a digital communication network without major distortions, and hence cannot be applied to a scenario in which the watermarked audio is transmitted through sound waves.
All approaches known in the state of the art therefore fail to provide a robust and efficient audio watermarking solution for environments in which the audio signal is transmitted by means of sound waves through a medium subject to interference or signal degradation. Their embedding and extraction techniques are also not adapted to lightweight devices with limited processing capabilities. There is hence a need for a method and apparatus capable of embedding watermark data into an audio signal and extracting it therefrom, where the extraction is performed after the audio signal has been transmitted through the air as sound waves and captured by a user device, with the resulting signal degradation.