1. Field of the Invention
The present invention relates to a method of encoding a digital video sequence, said digital video sequence comprising some sets of images including disparity maps, a disparity map being used to reconstruct one image of a set of images from a reference image of said set of images. The invention also relates to an apparatus for encoding, said apparatus implementing said method.
Such a method may be used in, for example, a video communication system for 3D video applications within MPEG standards.
2. Description of the Related Art
A video communication system typically comprises a transmitter with an encoder and a receiver with a decoder. Such a system receives an input digital video sequence, encodes said sequence via the encoder, transmits the encoded sequence to the receiver, then decodes the transmitted sequence via the decoder, resulting in an output digital video sequence, which is the reconstructed sequence of the input digital video sequence. The receiver then displays said output digital video sequence. A 3D digital video sequence comprises some sets of images with objects, usually one first set of texture images along with another set of images called disparity images or disparity maps. An image comprises some pixels.
Each image of the digital video signal is encoded according to one of several general coding schemes that have already been proposed within the scope of MPEG. For example, the MPEG2 standard referenced “Draft amendment No 3 to 13818-2 Multi-view profile—JTC1/SC29/WG11N1088”, edited by ISO/IEC in November 1995 during the MPEG Meeting of Dallas (Tex.), has set the basis for the encoding of different views of the same video sequence. The main principle is not only, as in most traditional video coding schemes, to use temporal and spatial redundancies within one video sequence, but also to use redundancies between the different points of view within a video sequence, wherein each point of view is an image, for example a left image and a right image captured by a left camera and a right camera, respectively. As objects of a video sequence seen from two slightly different points of view do not differ very much, it is possible to predict a large part of the points of view from reference points of view by virtue of prediction vectors, also called disparity vectors.
Since it is always possible to have disparity vectors that are all along the same direction, it is often supposed that there are only horizontal disparity vectors. In this case, a disparity vector is defined by a single value, called disparity value. The disparity map is an image in which a disparity value is assigned to every pixel. These disparity values are encoded by the encoder and transmitted to the decoder. A reference image is also sent to the decoder, for example the left one. Said decoder will use, among other parameters, the disparity values to reconstruct the right image from the reference image.
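By way of illustration only, the reconstruction of one image from a reference image and a purely horizontal disparity map may be sketched as follows. The sketch is not taken from any standard or from the prior art schemes cited; the function name `reconstruct_from_reference`, the sign convention (a target pixel at column x is copied from column x + d of the reference image) and the `hole` marker for pixels that have no source in the reference image are assumptions made for this example.

```python
def reconstruct_from_reference(reference, disparity, hole=None):
    """Predict a target image from a reference image and a disparity map.

    Assumptions for this sketch: images are lists of rows, the disparity
    map holds one integer horizontal disparity value per target pixel, and
    a target pixel at column x is copied from column x + d of the reference.
    """
    height = len(reference)
    width = len(reference[0])
    target = []
    for y in range(height):
        row = []
        for x in range(width):
            src_x = x + disparity[y][x]
            if 0 <= src_x < width:
                row.append(reference[y][src_x])
            else:
                # No source pixel exists in the reference image.
                row.append(hole)
        target.append(row)
    return target


# Tiny 2x4 example: "left" is the reference image, "right" the prediction.
left = [[10, 20, 30, 40],
        [50, 60, 70, 80]]
disp = [[1, 1, 1, 1],
        [0, 0, 2, 2]]
right = reconstruct_from_reference(left, disp)
```

In this example, pixels whose source column falls outside the reference image are left as holes; a practical decoder would fill such pixels using the other parameters it receives.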
There are various encoding schemes well known to the person skilled in the art, such as DCT-based, lossless run-length coding or mesh-based schemes, which can be used to encode an image. In all these encoding schemes, the disparity values are usually encoded on n-integer values, often on 8-bit data representing 256 gray levels.
One inconvenience of these encoding schemes is that, at the receiver side, one does not know exactly how to translate the disparity map of a texture image solely from these gray-level data.
Indeed, depending on the content of a video sequence, the disparity map of a texture image can change dramatically, and hence so can its translation.
If the video sequence contains only objects filmed at a very close distance, disparity may need to be quite accurate, with sub-pixel accuracy. Conversely, if the camera focuses on relatively distant objects, sub-pixel accuracy might be of no interest, whereas there might be some very large values of disparity. Finally, there might be a mixed situation, with different regions of interest within the scene and a need for a non-linearly varying set of disparity values.
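The ambiguity can be illustrated with a short sketch. It assumes, purely for the sake of the example, a linear mapping from 8-bit gray levels to disparity values; the function name `gray_to_disparity` and the two disparity ranges are hypothetical and are not specified by any of the encoding schemes mentioned above.

```python
def gray_to_disparity(gray, d_min, d_max, levels=256):
    """Map a gray level to a disparity in pixels, assuming a linear scale.

    The linear scale and the range [d_min, d_max] are assumptions of this
    sketch; the gray-level data alone does not convey them.
    """
    step = (d_max - d_min) / (levels - 1)
    return d_min + gray * step


gray = 128

# Hypothetical close-up content: small range, quarter-pixel accuracy.
close_up = gray_to_disparity(gray, 0.0, 63.75)

# Hypothetical distant content: large range, two-pixel steps.
distant = gray_to_disparity(gray, 0.0, 510.0)

print(close_up, distant)
```

The same gray level thus yields 32 pixels under the first assumed range and 256 pixels under the second, which is why a receiver cannot translate the disparity map from the gray-level data alone.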
Therefore, because of this problem of translating the disparity map in the prior art, the 3D display often has to be tuned manually at the receiver side in order to:
    view the reconstructed video sequence correctly in 3D, so that a reconstructed image is equal to, or has few distortions compared with, the original one, and/or
    view a second 3D video sequence correctly in 3D after a previous 3D video sequence, the two sequences being sent by two different broadcasters, for example, if these two video sequences have totally different disparity values assigned to them.
If the manual tuning has to be done very often, it will cause discomfort for a viewer of a 3D video sequence.