Advances in camera, display, and networking technology have enabled a new set of applications for three-dimensional (3D) scene communications. Such applications include 3D TV/free viewpoint TV (FTV), tele-immersive environments, and immersive teleconferencing. These applications typically employ multiple video cameras to simultaneously acquire a visual scene from different viewpoints. The video from these cameras, called multi-view video, is then transmitted to a remote end for rendering, providing the user with an immersive experience.
Due to the high raw data rate of multi-view video, compressing it may help enable applications such as 3D communication. Researchers have extensively studied predictive coding for multi-view video compression, which takes advantage of redundancy across the videos from different viewpoints. Such compression is typically based on the multi-view video data and involves inter-frame analysis (comparing frames of different cameras) and temporal analysis. Constraints for compression, such as delay constraints and random accessibility, have also been devised.
Because 3D TV is one of the dominant driving forces behind multi-view video, most multi-view video compression schemes assume a two-stage process: an offline stage for compression and an online stage for streaming. The videos are first compressed with advanced predictive coding schemes and are then stored. When transmitting to the remote end, all streams are sent across the network for decoding and rendering. In the 3D TV scenario, the video data may be transmitted through multicast channels, which can be efficient if there are thousands of remote viewers.
With 3D TV and other applications, it may be desirable to generate images from arbitrary points of view for both viewpoint selection and parallax simulation. Viewpoint selection, sometimes called free-viewpoint video, involves allowing a user to select a desired point of view and then generating video from that viewpoint using video data from multiple cameras. Unless a broadcast medium is used, a server at the capturing/transmitting end may generate the desired view, encode it, and send the view to the user. However, to date, effective compression of multi-view video (where multiple camera feeds are transmitted) for parallax simulation and other viewpoint-dependent effects has not been accomplished.
Techniques related to compression of multi-view video are described below.