In 3D-TV, 3D-video and 3D-cinema, the insertion of graphics elements needs to follow certain rules concerning their depth positioning in order to avoid visual discomfort. Most importantly, a superimposed element should not be stereoscopically positioned behind an object in the video, as this would violate real-world physical constraints. However, graphics elements should also not keep too large a safety margin in front of the closest object, as too strong a “pop-out effect” may lead to visual fatigue caused by the accommodation-vergence conflict. Subtitles in particular should be placed just in front of the closest object, as reading them entails frequent refocusing between the video and the text. A human observer needs significantly more time to switch his or her focus of attention if the associated jump in depth is larger.
As a consequence, the stereoscopic positioning of text and graphics for 3D menus or 3D subtitles requires few, but highly reliable and accurate depth estimates to prevent these elements from being placed too far in front of the screen or, even worse, behind a video object. To compute depth information from a set of two (or more) images, stereo matching is applied to find point correspondences between the input images. The displacement between two corresponding points is referred to as disparity. The 3D structure of a scene can be reconstructed from these disparities through triangulation if the camera parameters are known.
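For the special case of two rectified, parallel pinhole cameras, the triangulation mentioned above reduces to a single relation between disparity and depth, Z = f · B / d. The following is a minimal sketch under that assumption only; the function name and the parameters `focal_length_px` and `baseline_m` are illustrative and are not taken from the description.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth Z = f * B / d for a rectified, parallel stereo pair.

    Assumptions (illustrative, not from the description):
    disparity and focal length are given in pixels, the camera
    baseline in metres, so the returned depth is in metres.
    """
    if disparity_px < 0:
        raise ValueError("negative disparity is not valid for parallel cameras")
    if disparity_px == 0:
        # Zero disparity corresponds to a point at infinity in this setup.
        return float("inf")
    return focal_length_px * baseline_m / disparity_px
```

Because depth is inversely proportional to disparity in this model, a fixed disparity error causes a larger depth error for distant points than for near ones.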
Using calibration and rectification, the input images can be treated, to a reasonable approximation, as if they had been captured with perfectly aligned, ideal pinhole cameras free of lens distortions. Although this allows the correspondence search to be restricted to horizontal lines, stereo matching still remains an ill-posed estimation problem for several reasons, such as occlusions, perspective deformations, specular reflections, depth discontinuities, and missing or quasi-periodic texture.
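To illustrate what a correspondence search restricted to horizontal lines may look like, the sketch below uses simple block matching with a sum of absolute differences (SAD) cost. This is one common textbook technique chosen for illustration, not necessarily the matching method intended here. Images are assumed to be rectified grayscale images stored as lists of rows, and all names (`match_scanline`, `half_window`, `max_disparity`) are hypothetical.

```python
def match_scanline(left, right, y, x, half_window, max_disparity):
    """Find the disparity for pixel (x, y) of the left image.

    Candidates are searched only along the same row of the right
    image (assumes rectified input). Returns (disparity, cost) for
    the candidate with the lowest SAD cost. The caller must keep
    the window around (x, y) inside the left image.
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        if x - d - half_window < 0:
            break  # the window would leave the right image
        cost = 0
        for dy in range(-half_window, half_window + 1):
            for dx in range(-half_window, half_window + 1):
                cost += abs(left[y + dy][x + dx] - right[y + dy][x - d + dx])
        if cost < best_cost:  # strict '<' keeps the smallest d on ties
            best_d, best_cost = d, cost
    return best_d, best_cost


# Toy example: the right image is the left image shifted by 2 pixels.
row_left = [0, 0, 0, 0, 10, 50, 90, 20, 0, 0, 0, 0]
row_right = row_left[2:] + [0, 0]
left_image = [row_left] * 3
right_image = [row_right] * 3
```

In the textured part of the toy image the cost has a single clear minimum, whereas in the uniform part several candidates produce the same cost. This ambiguity is a small-scale instance of the missing-texture problem named above.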
For the above reasons, the performance of the stereo matching process inherently depends on the underlying image content. For some parts of an image it is more difficult than for others to determine accurate disparity values. This leads to varying levels of accuracy and reliability of the disparity estimates.
Consequently, in addition to the actual disparity value itself, the reliability of a disparity estimate represents valuable information. A confidence map reflecting the estimated reliability is preferably provided along with the disparity map, wherein a confidence value is determined for every disparity value.
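As a hedged illustration of how a per-pixel confidence map might be used together with a disparity map for positioning a subtitle, the sketch below keeps only disparities whose confidence reaches a threshold and places the subtitle slightly in front of the closest reliable one. The sign convention (larger disparity means closer to the viewer), the threshold `min_confidence`, the offset `margin`, and the function name are all assumptions made for this example and are not specified in the description.

```python
def subtitle_disparity(disparity_map, confidence_map,
                       min_confidence=0.8, margin=1.0):
    """Pick a disparity for a subtitle from reliable estimates only.

    Assumptions (illustrative): both maps are lists of rows of equal
    shape, confidence values lie in [0, 1], and a larger disparity
    means the point is closer to the viewer.
    Returns None if no estimate is reliable enough.
    """
    reliable = [
        d
        for d_row, c_row in zip(disparity_map, confidence_map)
        for d, c in zip(d_row, c_row)
        if c >= min_confidence
    ]
    if not reliable:
        return None
    # Just in front of the closest reliably estimated object.
    return max(reliable) + margin


disparities = [[2.0, 3.0], [9.0, 4.0]]
confidences = [[0.9, 0.95], [0.1, 0.85]]
```

In this toy input the value 9.0 has low confidence and is ignored, so a single unreliable estimate does not push the subtitle unnecessarily far in front of the screen.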