Video watermarking has mainly been used for content protection, where just a few bits are enough for copyright protection. Nevertheless, methods that embed more information open the door to the use of watermarking in emerging rich media applications such as indexing, subtitles, hypervideo and interactive video, as disclosed by Dumitru et al. at the EUROCON 2007 conference: “Gaussian Hypothesis for Video Watermarking Attacks: Drawbacks and Limitations”.
Nowadays, the protection of videos is usually achieved by hiding a single watermark in a full video, as in the method disclosed in the PhD thesis of L. Coria-Mendoza, “Low-Complexity methods for Image and Video Watermarking”. Watermarking methods for media protection embed little information in order to ensure a greater probability of detection. Increasing the embedding capacity involves the use of more complex watermarking methods, mainly to ensure that the watermark is extracted correctly and in the right order. This implies introducing a temporal synchronization phase, i.e., the process of identifying the correspondence between the temporal coordinates of the watermarked signal and those of the watermark. Furthermore, it is important to mention that the video will be compressed, which requires the watermarking methods to be robust to video compression attacks.
As is known, the video compression process removes temporal redundancy by using a motion compensation unit and a rate control unit. These two internal tasks are the main cause of temporal desynchronization in compressed videos. Temporal synchronization is crucial for successfully detecting watermarks. If the detector cannot be synchronized with its input video, an embedded watermark cannot be detected even though it is present in the video. Video compression methods attack the watermark hidden in each frame and desynchronize the detector in time, due to the mechanism used to reduce the redundancy in the video. Thus, if two or more frames hide a single watermark (regardless of the reasons), then a temporal synchronization process is necessary to resist the video compression attack.
In order to achieve temporal synchronization and resistance to video compression, some watermarking methods, such as those described by Zhang et al. at the ISDA 2007 conference: “A Video Watermarking Method Resistant To Synchronization Attacks Based On Statistics And Shot Segmentation”, and Kezheng et al. at the ISKE 2008 conference: “Video Watermarking Temporal Synchronization On Motion Vector”, hide information by slightly changing the magnitudes of the motion vectors of an entire scene. To do this, the motion vectors are classified into different groups, and a bit of information is then embedded in each group. In this way, a bit is hidden within a group of vectors with similar characteristics. Because these methods use an internal task of MPEG compression (the generation of motion vectors), as disclosed by the Moving Pictures Expert Group in ISO/IEC 14496-2, ISO/IEC 14496-10 and ISO 13818, they provide robustness to this attack. However, this could be seen as a disadvantage, because if the video is re-encoded, the watermark can be lost.
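By way of illustration only, the idea of hiding one bit in a group of motion vectors by slightly changing their magnitudes can be sketched as follows. The sketch is not the method of Zhang et al. or Kezheng et al.; the parity rule on the quantized mean magnitude, the step size `STEP` and the function names are assumptions introduced here, and a real codec would additionally have to round the vectors to its motion vector precision.

```python
import math

STEP = 0.5  # hypothetical quantization step for the mean magnitude


def mean_magnitude(vectors):
    """Average Euclidean length of a group of (dx, dy) motion vectors."""
    return sum(math.hypot(dx, dy) for dx, dy in vectors) / len(vectors)


def embed_bit(vectors, bit, step=STEP):
    """Scale the group so the parity of its quantized mean magnitude equals bit."""
    m = mean_magnitude(vectors)
    if m == 0:
        raise ValueError("group has no motion; nothing to modulate")
    k = math.floor(m / step)
    if k % 2 != bit:
        k += 1  # move to the neighbouring bin, which has the required parity
    target = (k + 0.5) * step  # bin centre leaves a margin of step / 2
    scale = target / m
    return [(dx * scale, dy * scale) for dx, dy in vectors]


def extract_bit(vectors, step=STEP):
    """Read the bit back as the parity of the quantized mean magnitude."""
    return math.floor(mean_magnitude(vectors) / step) % 2
```

Placing the mean at the centre of a bin is a design choice made here so that small perturbations of the vectors do not flip the extracted bit.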
C. Chen et al., at the 2008 Congress on Image and Signal Processing, in the document “A Compressed Video Watermarking Method With Temporal Synchronization”, provide another method, which hides information using another internal process of compression, namely the DCT (Discrete Cosine Transform) coefficients of MPEG-4 compression. Spread spectrum is used to hide the same watermark in a subset of DCT blocks per video scene. This method therefore also depends on an internal task performed by the encoder.
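The spread spectrum principle mentioned above can be sketched, under simplifying assumptions, as the addition of a key-dependent pseudo-random ±1 sequence to a set of coefficients and its detection by correlation. This is a generic illustration and not the method of Chen et al.; the function names, the `strength` parameter and the use of a flat list of coefficients in place of selected DCT blocks are assumptions.

```python
import random


def pn_sequence(key, n):
    """Key-dependent pseudo-random sequence of +1 / -1 chips."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(n)]


def embed(coeffs, bit, key, strength):
    """Add the chip sequence (sign set by the bit) to the host coefficients."""
    sign = 1 if bit else -1
    pn = pn_sequence(key, len(coeffs))
    return [c + strength * sign * p for c, p in zip(coeffs, pn)]


def detect(coeffs, key):
    """Correlate with the same chip sequence; the sign of the result is the bit."""
    pn = pn_sequence(key, len(coeffs))
    corr = sum(c * p for c, p in zip(coeffs, pn)) / len(coeffs)
    return 1 if corr > 0 else 0
```

Detection is reliable only when the number of coefficients is large enough for the host contribution to the correlation to be small compared with `strength`.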
The method described by Lin et al. in the documents “Temporal synchronization in video watermarking”, published in the Proceedings of the SPIE International Conference on Security and Watermarking of Multimedia Contents IV, and “Temporal synchronization in video watermarking”, published in IEEE Transactions on Signal Processing, and by Delp et al. in the document “Optical and Digital Techniques for Information Security” of 2005, hides information in the raw video, which makes it independent of the encoder. However, the method was not resistant to compression attacks.
The above-mentioned method performs temporal synchronization by manipulating the embedding keys used for watermarking each frame. In this way, the watermark that is embedded in each frame carries temporal information. The key is changed every β frames using the information of the current frame and the previous key, so that a new, time-dependent key is generated from the current frame and the previous key.
In U.S. Pat. No. 7,567,721 B2, a method for embedding digital watermarks in compressed video includes perceptually adapting a digital watermark in predicted and non-predicted data based on block activity derived from the compressed video stream, embedding in predicted objects in a video stream having separately compressed video objects, and bit rate control of the watermarked video.
One of the improvements made to Lin's original method is attributed to a paper published by Delp and Lin in 2004, “Temporal Synchronization In Video Watermarking”, the disclosure of which was incorporated in U.S. Pat. No. 7,886,151, which describes a protocol for temporal synchronization of media signals with temporal components to be used in digital watermarking and other applications. This synchronization protocol achieves initial synchronization by finding an initial synchronization key through analysis of a temporal media signal stream. It then uses features of the stream and a queue of one or more keys from previous frames to derive subsequent keys to maintain synchronization. If synchronization is lost due to channel errors or attacks, for example, the protocol uses the initial synchronization key to re-establish synchronization. In digital watermarking applications, the synchronization protocol is agnostic to the watermark embedding and reading functions.
U.S. Pat. No. 7,840,005 B2 relates to a synchronization paradigm and applies this paradigm to different forms of synchronization, including both temporal and spatial synchronization of digital watermarks. For spatial synchronization, the paradigm is applied in a spatial coordinate system of a media signal, such as the two-dimensional spatial coordinate system of a digital still image or frame of video. The paradigm is applied to perform spatial synchronization of digital watermarks.
For performing the synchronization, the method of U.S. Pat. No. 7,840,005 B2 comprises: detecting peaks due to redundancy of features in a host media signal; wherein the redundancy is controlled via a state machine that repeats a portion of a watermark structure and varies another portion of the watermark structure over a coordinate system of the host media signal; analyzing the peaks to derive estimates of geometric or temporal distortion of the host media signal; computing a histogram of the estimated geometric or temporal distortion parameters; and from the histogram, computing a geometric or temporal distortion of the host media signal.
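As a rough illustration of the last two steps, a single distortion value can be obtained from many noisy estimates by building a histogram and taking the most populated bin. The cited patent does not specify this exact rule; the choice of the histogram mode, the `bin_width` parameter and the function name are assumptions.

```python
from collections import Counter


def dominant_estimate(estimates, bin_width):
    """Histogram the estimates and return the mean of the fullest bin."""
    bins = Counter(round(e / bin_width) for e in estimates)
    best_bin, _ = bins.most_common(1)[0]
    members = [e for e in estimates if round(e / bin_width) == best_bin]
    return sum(members) / len(members)
```

Taking the fullest bin rather than the overall average keeps isolated outlier estimates from biasing the result.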
As can be seen from the above, prior methods describe synchronizing a digital watermark by using feature extraction and a key generator. The temporal synchronization embeds within each frame a watermark that carries temporal information. The key is changed every β frames, which is called the local repeat. Keys are based on previous keys and information from the current frame. A finite state machine is used to calculate the keys. It is initiated by a global master key and can be reset every α frames (global repeat rate).
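The key schedule summarized above, with a local repeat β and a global repeat α, can be sketched as follows. The source states only that each key depends on the previous key and on information from the current frame; the SHA-256 update, the representation of frame features as bytes and the function names are assumptions made for this illustration.

```python
import hashlib


def next_key(prev_key: bytes, frame_feature: bytes) -> bytes:
    """Hypothetical state update: hash the previous key with the frame feature."""
    return hashlib.sha256(prev_key + frame_feature).digest()


def key_schedule(master_key, frame_features, alpha, beta):
    """Return the embedding key used for each frame.

    alpha: global repeat, the key is reset to master_key every alpha frames.
    beta:  local repeat, the key is kept for beta frames before it changes.
    """
    keys = []
    key = master_key
    for i, feature in enumerate(frame_features):
        pos = i % alpha              # position inside the current global period
        if pos == 0:
            key = master_key         # reset of the state machine
        elif pos % beta == 0:
            key = next_key(key, feature)
        keys.append(key)
    return keys
```

With `alpha=6` and `beta=2`, for example, frames 0 and 1 share the master key, frames 2 and 3 share a derived key, frames 4 and 5 share the next one, and frame 6 returns to the master key.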
Since the method of the present invention is directed to video coded according to MPEG standards, it is convenient to define the types of frames they include. MPEG standards include three kinds of frames: a) intra picture frames (I-frames); b) forward-predicted frames (P-frames); and c) bidirectional-predicted frames (B-frames). A video stream or recording will always start with an I-frame and will typically contain regular I-frames throughout the stream. These regular I-frames are crucial for the random access of recorded MPEG-4 files, such as with rewind and seek operations during playback. As mentioned by Smart et al. in IC-COD-REP012 of 2008, “Understanding Mpeg-4 Video”, the main disadvantage of I-frames is that they tend to compress much less than P-frames or B-frames.
I-frames are coded without reference to other frames; they are coded like a still image. A P-frame applies motion prediction by referencing an I-frame or P-frame in front of it, and each motion vector points to a block in the referenced frame. A B-frame applies motion prediction by referencing a frame in front of it and/or a frame behind it. Each of the two referenced frames may be an I-frame or a P-frame.
Each frame is divided into macro blocks (MBs). A macro block in a video stream is represented as a 16×16 sample area. Each MB contains six 8×8 blocks, four for luminance and two for chrominance. A block of an I-frame simply contains its own luminance or chrominance values. A block of a P-frame or B-frame contains the difference between its own values and those of the referenced block. This process is called motion compensation. The coding process of each block includes DCT (Discrete Cosine Transform), quantization, run length encoding and entropy coding, in that order. The resulting video stream comprises entropy codes, motion vectors and control information about the structure of the video and the characteristics of the coding.
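For illustration, the first three stages of the block coding chain (DCT, quantization and run length encoding) can be sketched for a single 8×8 block as follows. This is a simplified sketch and not a conforming MPEG encoder: the single uniform quantization step, the `"EOB"` end-of-block marker and the function names are assumptions, the coefficients are read in zigzag order before run length encoding, and entropy coding is omitted.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct, unoptimized)."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def quantize(coeffs, step):
    """Uniform quantization; real encoders use per-coefficient matrices."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


def zigzag(block):
    """Read the block along anti-diagonals, low frequencies first."""
    order = sorted(
        ((r, col) for r in range(N) for col in range(N)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )
    return [block[r][col] for r, col in order]


def run_length(seq):
    """Encode as (number of preceding zeros, nonzero value) pairs plus a marker."""
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the end-of-block marker
    return pairs
```

A flat block shows why the chain compresses well: all of its energy falls into the single DC coefficient, so the run length output contains one pair followed by the end-of-block marker.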
The present invention provides a more robust video watermarking method, resistant to various types of attacks. The method is robust to MPEG-2, MPEG-4 Part 2 and MPEG-4 Part 10 compression attacks. Moreover, it provides improved resistance against the more common temporal synchronization attacks.