This section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present invention that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
In three-dimensional (3D) stereo video, both a right view and a left view are displayed so that the user perceives a 3D effect. The left and right views are either captured with a stereo camera or synthesized from a reference view. In multi-view television, several views of the same scene, captured with different cameras, are transmitted to a user. The user is free to display any of the transmitted views of the scene, or even to synthesize an intermediate view from the transmitted views, which corresponds to the viewpoint of a virtual camera.
As for standard video, copyright protection remains a concern in 3D or multi-view video. Among the many alternative copyright management systems, watermarking techniques embed imperceptible information in images. This information is used in forensics to identify the source of an illegal copy. However, watermark embedding and watermark detection are more complicated in 3D video than in mono-view video. Indeed, stereo watermarking essentially raises two technical challenges: firstly, the ability of the detector to detect the embedded watermark within new types of pirate samples (e.g. single view, combined view, synthetic view), and secondly, the imperceptibility of the embedded watermark with respect to depth perception.
Today, there are mainly two categories of stereo video watermarking systems adapted to 3D stereo or multi-view content.
A first category relates to depth-invariant embedding domains. When the two views of a stereo video have been rectified (in the case where the cameras are not parallel), pixels only shift along horizontal lines between the two views. To be oblivious to such displacements, one strategy consists in defining a domain invariant to horizontal shifts (for instance, the average pixel values along rows) and embedding the watermark in this domain. Such a method is disclosed in European patent application EP 2 426 636 A1, filed on Aug. 31, 2011 by the same applicant. The representation of the views in this invariant domain is rather stable, and well-established watermarking know-how can be readily reused. That being said, the inverse mapping from the invariant domain back to a view is known to introduce possibly annoying artifacts and viewing discomfort (persistent patterns, headaches, etc.).
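By way of a non-limiting illustration, the following Python sketch shows why the average pixel value along rows is unaffected by horizontal shifts. It assumes a grayscale view stored as a list of rows and models the displacement as one circular shift per row, which is a simplification: in real stereo pairs the shift varies per pixel, and pixels enter and leave the frame, so the invariance is only approximate. The function names are hypothetical and are not taken from the cited application.

```python
def row_means(image):
    """Return the mean pixel value of each row (the shift-invariant feature)."""
    return [sum(row) / len(row) for row in image]


def shift_rows(image, shifts):
    """Circularly shift each row to the right by its own horizontal offset."""
    shifted = []
    for row, s in zip(image, shifts):
        k = s % len(row)
        shifted.append(row[-k:] + row[:-k] if k else list(row))
    return shifted


# Toy 3x4 "left view"; the "right view" is the same content shifted per row.
left_view = [
    [10, 20, 30, 40],
    [5, 5, 50, 60],
    [0, 80, 0, 80],
]
right_view = shift_rows(left_view, [1, 2, 3])

print(row_means(left_view))
print(row_means(right_view))
```

Both calls print the same row averages even though the pixel layouts of the two toy views differ, which is the property a depth-invariant embedding domain relies on.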
A second category relates to disparity-coherent watermarking. It consists in exporting a reference watermark to the left and right views, based on their associated disparity information. It is somewhat equivalent to simulating the filming of a watermarked 3D scene. This strategy visually yields a rather natural effect: the watermark noise texture lies on the surface of the objects in the scene. On the other hand, the watermark detection techniques proposed so far are non-blind: the detector requires side information (the intrinsic/extrinsic parameters of the original cameras and of the synthetic camera) to retrieve the watermark. Some of these parameters may be estimated in practice, but detection performance is then heavily tied to the quality of the estimation.
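As a rough illustration of exporting a reference watermark according to disparity, the sketch below forward-warps a watermark pattern along rows. It is a minimal sketch under stated assumptions, not the method of any cited work: disparities are integers, a sample at column x moves to column x - d (the sign convention is an assumption), positions that receive no sample are left at 0, and occlusions are not resolved by depth ordering. The function name warp_watermark is hypothetical.

```python
def warp_watermark(reference, disparity):
    """Forward-warp a reference watermark along rows using per-pixel disparity.

    Assumptions of this sketch: integer disparities, target column = x - d,
    holes stay at 0, and no depth ordering is applied when samples collide.
    """
    height, width = len(reference), len(reference[0])
    warped = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            target = x - disparity[y][x]
            if 0 <= target < width:  # samples pushed outside the frame are dropped
                warped[y][target] = reference[y][x]
    return warped


# Toy reference watermark and disparity map (far row: d = 0, near row: d = 1).
reference = [
    [1, -1, 1, -1],
    [-1, 1, -1, 1],
]
disparity = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
other_view_watermark = warp_watermark(reference, disparity)
print(other_view_watermark)
```

The first row is unchanged because its disparity is zero, while the second row is displaced by one column and ends with an unfilled position, so the watermark follows the scene geometry in the same way the pixels do.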
For instance, in “Watermarking of free-view video” (IEEE Transactions on Image Processing, volume 19, pages 1785-1797, July 2010), A. Koz, C. Cigla and A. Alatan disclose a method for embedding a watermark into multiple views by exploiting the spatial masking properties of the human visual system. They also disclose a method for detecting the watermark by exploiting the position and the rotation of a virtual camera. However, this detection method requires at least one of the original views and the camera parameters, which are not always available. When the camera parameters are unknown, they propose using the original views, along with the corresponding depth-map information, to estimate the camera position and orientation of the synthesized view. The method comprises a step of transforming the original video according to the estimated parameters, and a step of subtracting it from the synthesized view. Correlating the resulting signal with the watermark signal provides better detection performance. However, estimating the camera parameters requires heavy processing. Such watermark detection is not blind, and is complex and time-consuming. Besides, the detection performance is sensitive to the camera parameter estimation.
As another example, in “Watermarking for depth-image-based rendering” (IEEE International Conference on Image Processing, pages 4217-4220, November 2009), E. Halici and A. Alatan also disclose a method for embedding a watermark into multiple views by watermarking a reference view with a reference watermark, and embedding a projection of this reference watermark, computed according to depth data, into the other views. The method is somewhat equivalent to watermarking the 3D scene that is shot. They also disclose a method for detecting the watermark by estimating the projection matrix between the reference view and the tested view. Since it requires the reference view for watermark detection, the detector is non-blind. Once the projection matrix is estimated, the projection of the reference watermark pattern is computed. Then, if the correlation between the tested view and the projected watermark pattern is high enough, the tested view is considered to have been generated from watermarked views. However, estimating the projection matrix is error-prone and time-consuming.
A large portion of watermarking systems relies on correlation-based detection. Essentially, the detector computes a correlation score between the content (e.g. a view) and a reference watermark signal. If the content contains the watermark and is aligned with it, the detection score is high and the watermark is detected.
In a state-of-the-art watermark detection method applied to video, image, or audio (as described for instance in “Secure spread spectrum watermarking for multimedia”, IEEE Transactions on Image Processing, vol. 6, no. 12, December 1997, by I. Cox, J. Kilian, F. Thomson Leighton and T. Shamoon), the watermark is detected in a content by computing the correlation between the reference watermark signal and the content. Then, the absolute value of the correlation is compared to a threshold to decide whether or not the content is watermarked with said watermark signal. If the absolute value exceeds the threshold, the sign of the correlation indicates whether bit ‘0’ or bit ‘1’ has been embedded into the content. Such a method fails to recover the watermark when the content has undergone geometric distortions, for instance if the content has been cropped or its pixels have been shifted. If the method is directly applied to stereo image watermarking, view synthesis will cause some pixels to shift, and the amount of shift will depend on the depth of the corresponding objects in the picture. Thus, computing the correlation between the reference watermark signal and a synthetic view will yield a very low correlation value: only the pixels which have not been shifted by the view synthesis process will contribute positively to the correlation. This state-of-the-art method hence gives very poor detection results for 3D stereo content.
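The thresholded correlation test described above, and its sensitivity to pixel shifts, can be sketched as follows. This is a one-dimensional toy example under several assumptions that are not in the cited work: the reference watermark is a length-13 Barker sequence, the host is a flat signal, the mean of the content is removed before correlating, a positive score is mapped to bit 1, and the shift is uniform and circular rather than depth-dependent. All function names, the embedding strength and the threshold are hypothetical.

```python
# Length-13 Barker sequence used as a toy reference watermark (assumption).
WATERMARK = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def embed(host, watermark, bit, strength=2):
    """Add the watermark for bit 1, subtract it for bit 0 (assumed convention)."""
    sign = 1 if bit == 1 else -1
    return [h + sign * strength * w for h, w in zip(host, watermark)]


def correlation(content, watermark):
    """Correlation score between the mean-removed content and the watermark."""
    mean = sum(content) / len(content)
    return sum((c - mean) * w for c, w in zip(content, watermark)) / len(content)


def detect(content, watermark, threshold=1.0):
    """Return the decoded bit, or None if |score| does not exceed the threshold."""
    score = correlation(content, watermark)
    if abs(score) <= threshold:
        return None
    return 1 if score > 0 else 0


def shift(content, k):
    """Circularly shift the samples to the right by k positions."""
    k %= len(content)
    return content[-k:] + content[:-k] if k else list(content)


host = [100] * len(WATERMARK)           # flat host signal (assumption)
marked = embed(host, WATERMARK, bit=1)

print(detect(marked, WATERMARK))            # aligned content
print(detect(shift(marked, 3), WATERMARK))  # same content after a 3-sample shift
```

With aligned content the score is well above the threshold and the bit is recovered, whereas after the shift the score collapses and the detector reports no watermark. In a synthesized view the shift varies with depth, so only the unshifted regions would still contribute to the score.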
Since it is usually more convenient to work with blind watermark detectors, there is a need for a detection algorithm that can be used with any legacy watermark embedder, in particular with a disparity-coherent watermark embedder, and that does not require any information related to the original video.