The present invention relates to disparity map generation and its use in, for example, virtual view rendering.
Disparity estimation (i.e. finding correspondences in image pairs showing the same content from different perspectives) has been a very active research topic in computer vision for years. With the advent of 3D cinema, 3D television and auto-stereoscopic 3D displays, the importance of robust and reliable disparity estimation has increased significantly. In particular, for depth-based 3D post-production and for auto-stereoscopic multi-view 3D displays it is extremely important to generate reliable disparity maps that allow for robust rendering of virtual views from stereoscopic or even multi-view content. However, the accuracy of disparity maps still depends on the suitability of the content, since, for example, disparities in un-textured areas, repetitive patterns or occluded image parts usually cannot be estimated reliably.
When using disparity maps to render new virtual views, it is not necessary to provide a completely accurate map, but rather one that is smooth and consistent with the image content. The widely used and intensively researched bilateral, cross-bilateral and cross-trilateral filters often produce exactly such smooth maps, which allow for visually pleasant rendering of virtual views by Depth Image Based Rendering (DIBR). On the other hand, to apply such a filter successfully, the disparity map should either contain only reliable results or should provide a reliability measure along with each estimate.
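For illustration, a cross-bilateral filter smooths the disparity map while taking its range (edge-stopping) weights from the accompanying color image rather than from the disparity values themselves, so that disparity edges stay aligned with image edges. The following is a minimal sketch, not the claimed method; it assumes NumPy, a single-channel (grayscale) guide image, and a simple brute-force window loop:

```python
import numpy as np

def cross_bilateral_filter(disparity, guide, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Sketch of cross-bilateral filtering: each disparity value is replaced by
    a weighted average over a window, where the weights combine a spatial
    Gaussian with a range Gaussian computed on the guide image."""
    h, w = disparity.shape
    out = np.zeros((h, w), dtype=np.float64)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            yy, xx = np.mgrid[y0:y1, x0:x1]
            # spatial weight: falls off with distance from the center pixel
            w_s = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2.0 * sigma_s ** 2))
            # range weight: falls off with color difference in the guide image,
            # so smoothing stops at image edges ("cross" / joint filtering)
            diff = guide[y0:y1, x0:x1] - guide[y, x]
            w_r = np.exp(-(diff ** 2) / (2.0 * sigma_r ** 2))
            weights = w_s * w_r
            out[y, x] = np.sum(weights * disparity[y0:y1, x0:x1]) / np.sum(weights)
    return out
```

A cross-trilateral filter extends this idea by multiplying in a third weight, e.g. a per-pixel reliability measure of the disparity estimate.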
Several techniques for acquiring disparity maps usable for rendering of virtual views by exploiting stereo matching algorithms are known.
For example, the use of more than two cameras to improve the results of disparity estimation is a common approach, which can be found in the literature as well as in many patents. However, many approaches use the additional information in an implicit manner, meaning that the disparity estimation itself is extended to directly take the additional information into account and use it to optimize the estimation results. In contrast, using the information in an explicit manner, i.e. keeping the individual stereo disparity estimations independent and combining the different results in a dedicated post-processing step, is not very common.
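A simple instance of such explicit post-processing is the left-right consistency check: two disparity maps are estimated independently (left-to-right and right-to-left), and an estimate is accepted only where the two agree. The sketch below is illustrative only; it assumes NumPy, rectified views, and the convention that a left-view disparity d maps column x to column x - d in the right view:

```python
import numpy as np

def cross_check(disp_lr, disp_rl, threshold=1.0):
    """Mark a left-view disparity estimate as reliable only if the
    independently estimated right-view map yields (roughly) the same
    disparity at the corresponding position."""
    h, w = disp_lr.shape
    valid = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            d = disp_lr[y, x]
            xr = int(round(x - d))  # corresponding column in the right view
            if 0 <= xr < w:
                # reliable if the round trip returns close to the start
                valid[y, x] = abs(disp_rl[y, xr] - d) <= threshold
    return valid
```

The two estimations remain completely independent; only the comparison happens in the post-processing step, which is what distinguishes the explicit from the implicit approach.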
Already in 1996, Kanade et al. [1] described a system with a reference camera and several inspection cameras. Point correspondences between the reference camera and an inspection camera are searched along corresponding epipolar lines. Finally, the best match is found by optimizing over all camera pairs, which therefore constitutes an implicit method. Very similar ideas can be found in a US patent application by G. Q. Chen [2]. In “Trinocular Stereo: A Real-Time Algorithm and its Evaluation” [3], Mulligan et al. also describe the use of the trifocal constraint in an implicit manner.
In their patent “System for combining multiple disparity maps” [4], Jones and Hansard also use an explicit method to compute consistency and reliability from multiple independent disparity estimations. However, their technique is restricted to setups in which all cameras lie on a common baseline. Furthermore, the described optimization procedure focuses on finding a parameterization that transforms the disparity maps such that a single representation can be obtained, e.g. by averaging the different estimates.
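The general idea of reducing several co-registered disparity maps to a single representation can be sketched as follows. This is not the cited patent's procedure (which involves a dedicated parameterization step) but a simplified illustration, assuming NumPy and maps that have already been transformed into a common reference frame:

```python
import numpy as np

def combine_disparity_maps(maps, max_spread=1.0):
    """Combine several independently estimated, co-registered disparity maps:
    where the estimates agree (small spread) take their average, otherwise
    mark the pixel as unreliable (NaN)."""
    stack = np.stack(maps, axis=0)          # shape: (n_maps, h, w)
    spread = stack.max(axis=0) - stack.min(axis=0)
    combined = stack.mean(axis=0)
    combined[spread > max_spread] = np.nan  # inconsistent estimates
    return combined
```

Such a combined map, together with the implied per-pixel reliability, is exactly the kind of input that the filtering techniques discussed above benefit from.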