Multi-view video coding (“MVC”) is a key technology for interactive multimedia applications such as free-viewpoint video (“FVV”) or free-viewpoint television (“FTV”), 3D television (“3DTV”), immersive teleconferencing, surveillance, and so on. A multi-view video is typically captured by multiple cameras from different angles and locations at the same time. For example, a multi-view video of a baseball game may be generated by three cameras: one located behind home plate, one located near first base, and one located near third base. Because of the vast amounts of data needed to represent a multi-view video, it is important to have compression techniques that allow for efficient storage and transmission of a multi-view video.
Multi-view video coding techniques for generic multi-view videos typically use traditional block-based hybrid video coding. Several standards have been proposed that provide a framework for multi-view video coding (e.g., MPEG-2 multi-view profile (“MVP”) and MPEG-4 multiple auxiliary components (“MAC”)). Based on the framework provided by these standards, some standard-compatible MVC schemes have been proposed. (See, Puri, R. V. Kollarits and B. G. Haskell, “Basics of stereoscopic video, new compression results with MPEG-2 and a proposal for MPEG-4,” Signal Processing: Image Communication, vol. 10, pp. 201-234, 1997; J. Lim, K. Ngan, W. Yang, and K. Sohn, “Multiview sequence CODEC with view scalability,” Signal Processing: Image Communication, vol. 19, no. 3, pp. 239-256, March 2004; and W. Yang and K. Ngan, “MPEG-4 based stereoscopic video sequences encoder,” in Proc. ICASSP 2004, vol. 3, pp. 741-744, May 2004.) To further improve coding efficiency, other techniques have been proposed that extend the syntax or semantics of these standards. (See, Y. Choi, S. Cho, J. Lee, and C. Ahn, “Field-based stereoscopic video codec for multiple display methods,” in Proc. ICIP 2002, vol. 2, pp. 253-256, NY, USA, September 2002; X. Guo and Q. Huang, “Multiview video coding based on global motion model,” Lecture Notes in Computer Sciences, vol. 3333, pp. 665-672, December 2004; Li, Y. He, “A novel multiview video coding scheme based on H.264,” in Proc. ICICS-PCM 2003, pp. 493-497, Singapore, December 2003; and ISO/IEC JTC1/SC29/WG11 M11700, “Responses received to CfE on multi-view video coding,” Hong Kong, China, January 2005.) In particular, some MPEG-4 AVC/H.264-based MVC techniques have been proposed. (See, X. Guo and Q. Huang, “Multiview video coding based on global motion model,” Lecture Notes in Computer Sciences, vol. 3333, pp. 665-672, December 2004; and Li, Y. He, “A novel multiview video coding scheme based on H.264,” in Proc. ICICS-PCM 2003, pp. 493-497, Singapore, December 2003.)
Most multi-view video coding techniques are based on traditional hybrid video coding and employ multi-hypothesis prediction, in which either temporal correlation or inter-view correlation is exploited, depending on which yields the lower coding cost. The performance of a multi-view video coding technique is usually evaluated by comparing it with simulcast coding, in which each view is coded independently. The existing multi-view video coding techniques have not shown a significant or consistent improvement in coding efficiency over simulcast coding.
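The cost-based choice between temporal and inter-view prediction can be illustrated with a minimal sketch. It assumes that the sum of absolute differences (SAD) stands in for the coding cost and that one candidate block has already been found in each reference; the function names `sad` and `choose_reference` are hypothetical and are not taken from any of the cited schemes.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks.

    Assumption: SAD is used here as a simple stand-in for coding cost.
    """
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def choose_reference(current, temporal_ref, view_ref):
    """Pick the prediction source (temporal or inter-view) with the lower cost.

    current:      block being coded
    temporal_ref: candidate block from an earlier frame of the same view
    view_ref:     candidate block from the same time instant of another view
    Ties resolve to "temporal" because it is listed first.
    """
    costs = {
        "temporal": sad(current, temporal_ref),
        "view": sad(current, view_ref),
    }
    mode = min(costs, key=costs.get)
    return mode, costs[mode]


if __name__ == "__main__":
    current = [[10] * 4 for _ in range(4)]
    temporal_ref = [[12] * 4 for _ in range(4)]
    view_ref = [[11] * 4 for _ in range(4)]
    print(choose_reference(current, temporal_ref, view_ref))
```

In simulcast coding, by contrast, only the temporal candidate would be available for each block, since every view is coded without reference to the others.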
Other limitations of typical multi-view video coding techniques prevent them from achieving high coding efficiency in practical applications. For example, in conventional multiple-reference multi-view video coding schemes, the bit-rate saving comes from the auxiliary views, each of which is always predicted from the main view. As the number of views increases, the performance gain over simulcast coding increases only proportionally, because only the correlation between each auxiliary view and the main view is exploited.
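The limitation can be made concrete by counting view pairs. The following sketch assumes that view 0 is the main view and that every other view is an auxiliary view predicted only from it; the helper names are hypothetical. It compares the number of view pairs whose correlation is exploited under that structure with the total number of view pairs.

```python
def main_view_prediction_pairs(num_views):
    """View pairs exploited when every auxiliary view references only view 0."""
    return [(0, v) for v in range(1, num_views)]


def all_view_pairs(num_views):
    """Every unordered pair of distinct views."""
    return [(a, b) for a in range(num_views) for b in range(a + 1, num_views)]


if __name__ == "__main__":
    for n in (3, 5, 8):
        used = len(main_view_prediction_pairs(n))
        total = len(all_view_pairs(n))
        print(f"{n} views: {used} of {total} view pairs exploited")
```

The exploited count grows as N - 1 while the number of pairs grows as N(N - 1)/2, so the correlation among the auxiliary views themselves is left unused.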