Multi-view video, or free-viewpoint video, refers to applications that let users watch a static or dynamic scene from different viewing perspectives. Generally, to provide a smooth multiple-perspective viewing experience, content producers capture a given scene at the desired quality from multiple camera positions arranged at different angles. For example, a convergent multi-view camera setup may have cameras positioned roughly equidistant from a point in a scene and aimed inward to capture the same scene from different angles. Such setups are widely used in movies, advertising, educational video, sports events, and general event broadcasting.
In addition to the general application, the simultaneous video streams output by multi-view cameras are themselves often referred to as multi-view video. A multi-view video sequence can naturally be regarded as a temporal sequence of special visual effect snapshots, each captured from different viewpoints, taken at successive times. Each such snapshot consists of the still images taken by multiple cameras at a single time instant, and is essentially a multi-view image sequence.
While multi-view image/video can provide an exciting viewing experience, it does so at the cost of large storage and transmission bandwidth requirements. As a result, a highly efficient compression scheme is needed. In many multi-view compression schemes, inter-viewpoint prediction is used to exploit the inter-viewpoint correlation (for example, predicting frame f_i(j) from frame f_{i+1}(j)). However, inter-viewpoint prediction also significantly increases computational cost. This is generally because inter-viewpoint redundancy has to be exploited by performing motion estimation across different views, and motion estimation is usually the most time-consuming component of a conventional video encoder, particularly when variable block-size motion estimation is performed.
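To make the cost of this step concrete, the following is a minimal sketch of exhaustive (full-search) block matching between a frame from the current view and a frame from a neighboring view. It is an illustration under stated assumptions, not the scheme of any particular encoder: the function names `sad` and `full_search`, the sum-of-absolute-differences cost, the fixed block size, and the tiny synthetic frames are all hypothetical choices made for this example.

```python
def sad(ref, cur, rx, ry, cx, cy, n):
    """Sum of absolute differences between the n x n block of `cur`
    at (cx, cy) and the n x n block of `ref` at (rx, ry)."""
    total = 0
    for dy in range(n):
        for dx in range(n):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def full_search(ref, cur, cx, cy, n, search_range):
    """Test every displacement within +/- search_range (hypothetical helper).

    Returns (best_dx, best_dy, best_cost, candidates_tested).
    """
    height, width = len(ref), len(ref[0])
    best_dx, best_dy, best_cost = 0, 0, float("inf")
    tested = 0
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates whose block would fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            tested += 1
            cost = sad(ref, cur, rx, ry, cx, cy, n)
            if cost < best_cost:
                best_dx, best_dy, best_cost = dx, dy, cost
    return best_dx, best_dy, best_cost, tested


# Synthetic 12 x 12 frames: one bright pixel that appears at (x=4, y=4) in the
# current view and at (x=7, y=5) in the neighboring (reference) view.
ref = [[0] * 12 for _ in range(12)]
cur = [[0] * 12 for _ in range(12)]
ref[5][7] = 100
cur[4][4] = 100

dx, dy, cost, tested = full_search(ref, cur, cx=4, cy=4, n=2, search_range=4)
print(dx, dy, cost, tested)  # 3 1 0 81
```

Even for a single 2 x 2 block, a search range of 4 requires 81 cost evaluations here. A real encoder repeats this for every block, and for every block size when variable block-size motion estimation is used.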
Although numerous fast motion estimation algorithms have been proposed to alleviate the heavy computational load of motion estimation while maintaining its prediction performance, these algorithms were essentially designed to accelerate temporal prediction, and applying them directly to inter-viewpoint prediction may therefore be inefficient. This is because the two application scenarios differ, and those differences dictate significantly different motion estimation design principles and associated prediction performance. In fact, to track the large and irregular (depth-dependent) motion typical of convergent multi-view camera setups, traditional full-search motion estimation and most fast motion estimation algorithms have to greatly enlarge the motion refinement grid to prevent the search from falling into a local minimum in the earlier search stages. Otherwise, the resulting coding efficiency drops significantly.
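As a rough illustration of why enlarging the search grid is expensive, the sketch below counts the candidates an exhaustive search must evaluate as the search range grows. The helper names, the search ranges, the 640 x 480 frame size, and the 16 x 16 block size are assumptions chosen for this example only, and the count ignores candidates clipped at frame boundaries.

```python
def full_search_candidates(search_range):
    """Number of displacements in a square window of +/- search_range pixels."""
    return (2 * search_range + 1) ** 2


def sad_operations_per_frame(width, height, block, search_range):
    """Approximate absolute-difference operations for one frame, assuming
    fixed-size blocks and no clipping at the frame boundary."""
    blocks = (width // block) * (height // block)
    return blocks * full_search_candidates(search_range) * block * block


for r in (8, 16, 32, 64):
    print(r, full_search_candidates(r))
# 8 289
# 16 1089
# 32 4225
# 64 16641
```

Because the candidate count grows quadratically with the search range, doubling the range to follow large inter-view displacements roughly quadruples the matching work per block.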