In 3D-TV, 3D-video and 3D-cinema, information from two or more images is combined to produce a spatial reproduction of the image content. Typically, two stereoscopic images are used to compute depth information: a matching process is applied to find point correspondences in the two input images, also called basic images. The displacement between two corresponding points in the basic images, which results from the different camera positions when the real-world scene is captured, is commonly referred to as disparity. If the camera parameters are known, the 3D structure, i.e. the depth information of the captured scene, can be reconstructed from these disparities by triangulation. The depth information for the pixels in the basic images is usually collected in a disparity map containing the results of the respective matching calculations.
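As an illustration of the triangulation step, the following minimal Python sketch assumes the simplest geometry: a rectified, parallel stereo pair with focal length f (in pixels) and baseline B (in metres), for which triangulation reduces to Z = f·B/d. This geometry, the function names and the parameter names are assumptions for illustration only; general camera configurations require the full projection matrices instead.

```python
from typing import List, Optional


def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Triangulate depth Z = f * B / d for a rectified, parallel stereo pair (assumed geometry)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_map(disparity_map: List[List[float]], focal_px: float,
              baseline_m: float) -> List[List[Optional[float]]]:
    """Convert a disparity map into a depth map; non-positive disparities are marked invalid (None)."""
    return [
        [focal_px * baseline_m / d if d > 0 else None for d in row]
        for row in disparity_map
    ]


# f = 1000 px, B = 0.1 m and d = 20 px give a depth of about 5 m.
print(depth_from_disparity(20.0, 1000.0, 0.1))
```

Because depth is inversely proportional to disparity, a fixed disparity error translates into a depth error that grows with the square of the depth, so accurate disparity estimates matter most for distant points.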
The performance of the stereo matching process inherently depends on the underlying image content. Even under ideal conditions, several problems remain that make matching a challenging task, e.g. areas occluded in one of the input pictures, perspective deformations due to lens distortions, specular reflections, or missing texture on some objects. For some parts of an image it is inherently more difficult than for others to determine accurate disparity values, also referred to as disparity estimates. As a result, the accuracy and reliability of the disparity estimates vary across the image.
Window-based similarity measures such as the sum of absolute differences (SAD), the sum of squared differences (SSD) or normalized cross-correlation (NCC) are widely used in the support aggregation step of disparity estimators.
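The three measures can be written down directly. The sketch below takes windows as nested lists of grey values; it is an illustration under assumptions, not the code of any particular estimator, and it uses the zero-mean variant of NCC, whereas some formulations omit the mean subtraction. SAD and SSD are costs (lower is better), while NCC is a similarity in [-1, 1] (higher is better). The helper names are hypothetical.

```python
import math
from typing import List, Sequence

Window = Sequence[Sequence[float]]


def _flatten(w: Window) -> List[float]:
    """Turn a 2-D window into a flat list of pixel values."""
    return [v for row in w for v in row]


def sad(a: Window, b: Window) -> float:
    """Sum of absolute differences: lower means more similar."""
    return sum(abs(x - y) for x, y in zip(_flatten(a), _flatten(b)))


def ssd(a: Window, b: Window) -> float:
    """Sum of squared differences: lower means more similar, penalizes large errors more."""
    return sum((x - y) ** 2 for x, y in zip(_flatten(a), _flatten(b)))


def ncc(a: Window, b: Window) -> float:
    """Zero-mean normalized cross-correlation in [-1, 1]: higher means more similar."""
    fa, fb = _flatten(a), _flatten(b)
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    da = [x - ma for x in fa]
    db = [y - mb for y in fb]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / den if den > 0 else 0.0


a = [[10, 20], [30, 40]]
b = [[12, 18], [33, 40]]
print(sad(a, b), ssd(a, b), ncc(a, b))
```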
In the article T. Kanade et al.: “A stereo matching algorithm with an adaptive window: Theory and experiment”, IEEE Trans. Pattern Anal. Mach. Intell. Vol. 16 (1994), pp. 920-932, the advantages and disadvantages of using aggregation over support windows are summarized as follows: “A central problem in (local) stereo matching lies in selecting an appropriate window size. The window size must be large enough to include enough intensity variation for reliable matching, but small enough to avoid the effects of projective distortion. If the window is too small and does not cover enough intensity variation, it gives a poor disparity estimate, because the signal (intensity variation) to noise ratio is low. If the window is too large and covers a region in which the depth of scene points (i.e. disparity) varies, then the position of maximum similarity may not represent correct matching due to different projective distortion (sic) in the left and right images. The fattening effect occurs when the selected window contains pixels at different depth.”
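A minimal sketch of fixed-window matching makes the window-size parameter of the quoted trade-off explicit. It rests on assumptions not stated in the text: grey-value images as nested lists, a square (2r+1)×(2r+1) window, SAD as the cost, the shift convention Y2(x + d) used in the tSAD formula of this section, and a winner-takes-all choice of the disparity. The function names and the synthetic test pair are hypothetical.

```python
from typing import List

Image = List[List[float]]


def window_sad(img1: Image, img2: Image, x: int, y: int, d: int, r: int) -> float:
    """SAD between the (2r+1)x(2r+1) window around (x, y) in img1 and the
    window around (x + d, y) in img2 (same shift convention as Y2(x_i + d))."""
    total = 0.0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            total += abs(img1[y + dy][x + dx] - img2[y + dy][x + dx + d])
    return total


def wta_disparity(img1: Image, img2: Image, x: int, y: int, d_max: int, r: int) -> int:
    """Winner-takes-all: the disparity in [0, d_max] with minimum SAD.
    The caller must keep all windows inside both images."""
    costs = [window_sad(img1, img2, x, y, d, r) for d in range(d_max + 1)]
    return min(range(d_max + 1), key=costs.__getitem__)


# Synthetic textured pair: img2 is img1 shifted right by 2 pixels.
img1 = [[float((x * 7 + y * 3) % 11) for x in range(12)] for y in range(5)]
img2 = [[img1[y][x - 2] if x >= 2 else 0.0 for x in range(12)] for y in range(5)]
print(wta_disparity(img1, img2, 5, 2, d_max=3, r=1))
```

Increasing r adds intensity variation to each window, but it also raises the chance that the window straddles a depth discontinuity, which is what produces the fattening effect described in the quote.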
A number of methods have been proposed that rely on the implicit assumption that pixels with similar colors belong to the same object and therefore also have similar disparities, i.e. are fronto-parallel. These assumptions do not always hold, but they are often reasonable as long as the support window does not become too large and the scene consists of relatively large, smooth objects. Among the proposed methods are adaptive window methods, in which the shape of the support window is adapted to the object borders depicted in the images or video sequences. This clearly requires additional processing steps to determine or select an appropriate support window size and shape.
Multiple window methods have also been proposed. Here a set of window sizes and shapes is provided, and the algorithm selects one of them depending on a quality metric of the result. This also causes additional processing, since costs have to be aggregated over multiple windows.
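One simple way to realize a multiple-window scheme, chosen here only for illustration, uses the lowest aggregated cost as the quality metric: several windows that all contain the pixel are evaluated, and the best one is kept for each candidate disparity. The window set (a centred window plus four corner-shifted ones), the SAD cost, the shift convention Y2(x + d) and the function names are assumptions, not details from the text.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]

# Hypothetical window set: offsets of the window centre relative to the pixel.
# (0, 0) is the centred window; the others are shifted towards the four corners.
OFFSETS: Sequence[Tuple[int, int]] = ((0, 0), (-1, -1), (1, -1), (-1, 1), (1, 1))


def window_sad(img1: Image, img2: Image, cx: int, cy: int, d: int, r: int) -> float:
    """SAD over the (2r+1)x(2r+1) window centred at (cx, cy), image 2 shifted by d."""
    return sum(abs(img1[cy + dy][cx + dx] - img2[cy + dy][cx + dx + d])
               for dy in range(-r, r + 1) for dx in range(-r, r + 1))


def multi_window_cost(img1: Image, img2: Image, x: int, y: int, d: int, r: int,
                      offsets: Sequence[Tuple[int, int]] = OFFSETS) -> float:
    """Evaluate every candidate window and keep the lowest cost.
    The pixel (x, y) stays inside each window as long as |offset| <= r."""
    return min(window_sad(img1, img2, x + ox, y + oy, d, r) for ox, oy in offsets)


def multi_window_disparity(img1: Image, img2: Image, x: int, y: int,
                           d_max: int, r: int) -> int:
    """Winner-takes-all over d in [0, d_max] using the multiple-window cost."""
    return min(range(d_max + 1),
               key=lambda d: multi_window_cost(img1, img2, x, y, d, r))
```

With five windows, every candidate disparity needs five window aggregations instead of one, which is the additional processing mentioned above.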
In K.-J. Yoon et al.: “Adaptive Support-Weight Approach for Correspondence Search”, IEEE Trans. Pattern Anal. Mach. Intell. Vol. 28 (2006), pp. 650-656, adaptive support weights in combination with fixed support window shapes and sizes have been proposed. Two independent weights are combined: one factor describes the spatial proximity to the center pixel of the support window, the other describes the color similarity to the center pixel:
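$$w(p,q) = w_g(p,q) \cdot w_c(p,q) = \exp\left(-\left(\frac{\Delta g_{pq}}{\gamma_g} + \frac{\Delta c_{pq}}{\gamma_c}\right)\right).$$
Here p denotes the center pixel, q a pixel of the support window, Δg_pq the spatial distance between p and q, Δc_pq the color difference between them, and γ_g and γ_c are constants controlling the influence of the two factors.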
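A direct transcription of the weight formula, as a sketch: Δg_pq is taken as the Euclidean pixel distance and Δc_pq as the Euclidean distance between the color triples of p and q. Yoon et al. measure the color difference in CIELab; here the color space is left to the caller, which is an assumption, as are the function name and the nested-list image layout.

```python
import math
from typing import List, Sequence

Color = Sequence[float]  # any 3-channel colour (assumption); Yoon et al. use CIELab


def support_weights(img: List[List[Color]], px: int, py: int, r: int,
                    gamma_g: float, gamma_c: float) -> List[List[float]]:
    """Return w(p, q) = exp(-(dg/gamma_g + dc/gamma_c)) for every q in the
    (2r+1)x(2r+1) window centred at p = (px, py)."""
    cp = img[py][px]
    weights = []
    for dy in range(-r, r + 1):
        row = []
        for dx in range(-r, r + 1):
            cq = img[py + dy][px + dx]
            dg = math.hypot(dx, dy)  # spatial distance between p and q
            dc = math.dist(cp, cq)   # Euclidean colour difference between p and q
            row.append(math.exp(-(dg / gamma_g + dc / gamma_c)))
        weights.append(row)
    return weights
```

The center pixel always receives weight 1, and pixels that are far away or differently colored are down-weighted smoothly. Because the weights depend on the center pixel's color, they have to be recomputed for every center pixel, which accounts for a significant part of the processing cost of this approach.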
A truncated SAD is used as the matching quality measure:
$$\mathrm{tSAD}[x; d] = \frac{\sum_{x_i} w_{g12}(x_i, d) \cdot w_{c12}(x_i, d) \cdot \min\left(\left|Y_1(x_i) - Y_2(x_i + d)\right|;\, T\right)}{\sum_{x_i} w_{g12}(x_i, d) \cdot w_{c12}(x_i, d)}.$$
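The tSAD formula can be sketched as follows, assuming grey-value images Y1 and Y2 as nested lists and a square support window of radius r around x. How the combined weights w_g12 · w_c12 are formed from the two views is not specified here, so the caller passes them in, already evaluated for the candidate disparity d; that layout, the function name and the parameter names are assumptions.

```python
from typing import List

Image = List[List[float]]


def tsad(Y1: Image, Y2: Image, x: int, y: int, d: int, r: int,
         weights: List[List[float]], T: float) -> float:
    """Weighted, truncated SAD at pixel (x, y) for candidate disparity d:
    sum(w * min(|Y1(x_i) - Y2(x_i + d)|, T)) / sum(w) over the support window.
    weights[j][i] is the combined weight w_g12 * w_c12 for window row j, column i,
    already evaluated for this d (hypothetical layout supplied by the caller)."""
    num = 0.0
    den = 0.0
    for j, dy in enumerate(range(-r, r + 1)):
        for i, dx in enumerate(range(-r, r + 1)):
            w = weights[j][i]
            diff = abs(Y1[y + dy][x + dx] - Y2[y + dy][x + dx + d])
            num += w * min(diff, T)  # truncation caps the influence of outliers
            den += w
    return num / den if den > 0 else 0.0
```

Capping each absolute difference at T keeps a few outlier pixels, e.g. from occlusions or specular highlights, from dominating the cost, and the normalization by the weight sum makes costs comparable across windows with different weight distributions.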
Adaptive support weights yield good disparity estimates. However, computing the spatial proximity and color similarity weights requires considerable processing. In addition, the use of SAD is problematic for real-world footage, since SAD is not very robust against luminance and color differences between the camera views.
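The robustness issue can be made concrete with a small numeric sketch using illustrative values, not data from the text: two 1-D patches that differ only by a global gain and offset, as could arise from differently adjusted cameras, have a large SAD although they depict the same structure, whereas a zero-mean NCC stays at its maximum of 1. The helper names are hypothetical.

```python
import math
from typing import Sequence


def sad(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two equally sized patches."""
    return sum(abs(x - y) for x, y in zip(a, b))


def zncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-mean normalized cross-correlation; invariant to a positive gain and an offset."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / den if den > 0 else 0.0


patch = [10.0, 40.0, 25.0, 60.0, 35.0]
# Same scene structure seen by a second camera with a different gain and offset.
other_view = [1.2 * v + 15.0 for v in patch]
print(sad(patch, other_view), zncc(patch, other_view))
```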