The present invention is concerned with hybrid video coding supporting intermediate view synthesis.
3D video applications such as stereo and multi-view displays, free viewpoint video applications, etc. currently represent booming markets. For stereo and multi-view video content, the MVC Standard has been specified. Reference is made to ISO/IEC JTC1/SC29/WG11, “Text of ISO/IEC 14496-10:2008/FDAM 1 Multiview Video Coding”, Doc. N9978, Hannover, Germany, July 2008, and to ITU-T and ISO/IEC JTC1, “Advanced video coding for generic audiovisual services,” ITU-T Recommendation H.264 and ISO/IEC 14496-10 (MPEG-4 AVC), Version 1: May 2003, Version 2: May 2004, Version 3: Mar. 2005 (including FRExt extension), Version 4: Sep. 2005, Version 5 and Version 6: June 2006, Version 7: Apr. 2007, Version 8: July 2007 (including SVC extension), Version 9: July 2009 (including MVC extension).
This standard compresses video sequences from a number of adjacent cameras. The MVC decoding process only reproduces these camera views at their original camera positions. Different multi-view displays, however, need different numbers of views at different spatial positions, so that additional views, e.g. between the original camera positions, are needed. Thus, in order to be suitable for all different multi-view displays, multi-view video content according to the MVC Standard would have to convey a large number of camera views. This would necessarily lower the compression ratio relative to the compression achievable for multi-view displays that exploit merely a proper subset of the camera views conveyed.
Other techniques for conveying multi-view data provide each sample of the frames of the camera views not only with the corresponding color value, but also with a corresponding depth or disparity value. Based on these values, an intermediate view synthesizer at the decoding stage may render intermediate views by projecting and merging neighboring camera views into the intermediate view in question. The ability to synthesize intermediate views at the decoding stage reduces the number of camera views to be conveyed via the multi-view data. Disadvantageously, however, providing each sample with an associated depth or disparity value increases the amount of data to be conveyed per camera view.
Further, the depth or disparity data added to the color data either has to be treated like a fourth color component, so that an appropriate video codec can be used for compressing the data, or an appropriate compression technique has to be used in order to compress the color plus depth/disparity data. The first alternative does not achieve the maximum compression rate possible since the differing statistics of the color and depth values are not considered correctly, and the latter alternative is cumbersome since a proprietary solution has to be designed, and the computational load at the synthesizing side is relatively high.
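For illustration only, the projecting and merging of neighboring camera views by means of per-sample disparity values may be sketched as follows for a single scanline. This is a minimal sketch under stated assumptions and not a method taken from the cited standards: the function names `warp_row` and `synthesize_row`, the parameter `alpha` (0 at the left camera position, 1 at the right camera position), the convention that a sample at position x in the left view appears at position x - d in the right view, and the simple merging rule are all hypothetical choices made for this example.

```python
def warp_row(colors, disparities, shift_scale):
    """Forward-warp one scanline (hypothetical helper).

    Each sample at x is moved to x + round(shift_scale * disparity).
    If several samples land on the same target position, the one with
    the larger disparity (assumed to be nearer to the camera) is kept.
    Positions that receive no sample remain None (holes).
    """
    width = len(colors)
    out_colors = [None] * width
    out_disps = [None] * width
    for x, (color, disp) in enumerate(zip(colors, disparities)):
        tx = x + round(shift_scale * disp)
        if 0 <= tx < width and (out_disps[tx] is None or disp > out_disps[tx]):
            out_colors[tx] = color
            out_disps[tx] = disp
    return out_colors, out_disps


def synthesize_row(left, left_disp, right, right_disp, alpha):
    """Render one scanline of an intermediate view (assumed convention:
    alpha = 0 is the left camera position, alpha = 1 the right one)."""
    # Project both neighboring views to the intermediate position.
    lc, ld = warp_row(left, left_disp, -alpha)
    rc, rd = warp_row(right, right_disp, 1.0 - alpha)

    merged = []
    for a, da, b, db in zip(lc, ld, rc, rd):
        if a is not None and b is not None:
            if da == db:
                # Same depth seen from both views: blend by distance.
                merged.append((1.0 - alpha) * a + alpha * b)
            else:
                # Different depths: keep the nearer sample.
                merged.append(a if da > db else b)
        elif a is not None:
            merged.append(a)   # hole in the right projection
        elif b is not None:
            merged.append(b)   # hole in the left projection
        else:
            merged.append(None)  # hole in both; left unfilled here
    return merged


# Toy scene: a flat surface with constant disparity 2.
left = [10, 20, 30, 40, 50, 60, 70, 80]
right = [30, 40, 50, 60, 70, 80, 90, 100]
disp = [2] * 8
middle = synthesize_row(left, disp, right, disp, 0.5)
```

In this toy scene, the sample at the right border of the intermediate view is visible only in the right camera view and is therefore taken from that view alone. A practical synthesizer would additionally have to handle sub-sample positions and the filling of holes visible in neither view, which this sketch leaves out.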
In general, it would be favorable if, on the one hand, the amount of multi-view data could be kept reasonably low while, on the other hand, the views available at the decoding side were sufficient in number and of reasonably high quality.