One method of performing depth estimation with a stereo pair of images is to find correspondences between them by comparing small image patches from one image to patches from the other image. In order to measure how well a pixel p in one image matches a pixel q in the other image, a patch centered on p is compared to a patch centered on q, using a matching score such as normalized cross-correlation (NCC) or the sum of squared differences (SSD).
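The NCC score mentioned above can be sketched as follows. This is a minimal illustration, assuming single-channel images stored as lists of rows (indexed `img[y][x]`) and patches that lie fully inside both images. The helper names `patch` and `ncc` are hypothetical.

```python
import math

def patch(img, cx, cy, r):
    """Return the (2r+1)x(2r+1) patch centred on (cx, cy) as a flat list.

    img is indexed img[y][x]; the caller must keep the patch inside the image.
    """
    return [img[cy + j][cx + i] for j in range(-r, r + 1) for i in range(-r, r + 1)]

def ncc(img_l, img_r, p, q, r):
    """Normalized cross-correlation between the patch at p in img_l and the patch at q in img_r.

    Returns a value in [-1, 1], or 0.0 if either patch has zero variance.
    """
    a = patch(img_l, p[0], p[1], r)
    b = patch(img_r, q[0], q[1], r)
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Subtract the patch means so the score ignores brightness offsets.
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    num = sum(x * y for x, y in zip(da, db))
    # Dividing by both standard deviations makes the score ignore contrast (gain) changes.
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0
```

Because of the mean subtraction and normalization, a patch compared with a brightened and contrast-stretched copy of itself still scores 1, which is one reason NCC is often preferred over SSD when the two cameras differ in exposure.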
For example, the (negative) SSD between a patch at pixel p in the left image IL and a patch at pixel q in the right image IR is computed as
$$ -\sum_{i=-r}^{r}\sum_{j=-r}^{r}\left(I_L(p_x+i,\,p_y+j)-I_R(q_x+i,\,q_y+j)\right)^2, $$
where r is the radius of the patch. The negation is used so that similar patches receive a high score and dissimilar patches receive a low score.
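The negative SSD above, together with a search over candidate matches q, can be sketched as follows. This is a minimal illustration, not a definitive implementation. It assumes rectified images (so matches lie on the same row), a candidate q = (p_x − d, p_y) for each disparity d, and a simple winner-take-all choice of the best score. The function names `neg_ssd` and `best_disparity` and the parameter `max_disp` are hypothetical.

```python
def neg_ssd(img_l, img_r, p, q, r):
    """Negative SSD between the (2r+1)x(2r+1) patches centred on p (left) and q (right).

    Images are lists of rows indexed img[y][x]. Scores closer to 0 mean more similar patches.
    """
    px, py = p
    qx, qy = q
    total = 0
    for j in range(-r, r + 1):
        for i in range(-r, r + 1):
            diff = img_l[py + j][px + i] - img_r[qy + j][qx + i]
            total += diff * diff
    return -total

def best_disparity(img_l, img_r, p, r, max_disp):
    """Return the disparity d in 0..max_disp whose candidate q = (p_x - d, p_y) scores highest.

    Assumes rectified images and that every candidate patch fits inside the right image.
    """
    px, py = p
    scores = {d: neg_ssd(img_l, img_r, p, (px - d, py), r) for d in range(max_disp + 1)}
    return max(scores, key=scores.get)
```

For example, if the right image is the left image shifted two pixels to the left, the correct candidate has an SSD of exactly 0 and `best_disparity` returns 2 for any pixel whose candidate patches stay inside the image.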
To make this matching score high for the correct match and low for all other candidate matches, an active illumination pattern (e.g. a pattern of pseudorandom laser dots) may be applied to the scene, so that the patches contain distinctive texture. To keep the active illumination invisible to humans, the illuminator and stereo cameras may operate in the infrared (IR) region of the spectrum rather than the visible part.
One problem with patch-based stereo is that pixels near depth discontinuities (e.g. at object boundaries) may receive incorrect depth estimates because a patch may include pixels from two different depths; this effect is sometimes referred to as “stereo fattening”. For a pixel p whose true depth is z1 but which lies near an object at depth z2, the patch may include pixels from both z1 and z2. If the z2 pixels in the patch have stronger texture than the z1 pixels, the matching score may be higher for z2 than for z1, even though the true depth is z1, and pixel p then receives an incorrect depth estimate of z2.
In order to mitigate this effect in other patch matching scenarios, one popular method is to assign each pixel in the patch a weight, based on whether that pixel is believed to lie at the same depth as the pixel of interest p. Parts of the patch which have the same depth as p should receive a high weight, while parts which have different depths should receive a low weight. When computing the NCC or SSD, the contributions of the different parts of the patch are weighted. This method is generally referred to as “Adaptive Support Weights” (ASWs).
Because the depths of the pixels in the patch are unknown, the weights are computed by looking only at the input images. The assumption underlying ASW approaches is that, in images captured with IR or RGB (visible spectrum) cameras without active patterned illumination, pixels within a patch that have similar depths generally have similar colors. Thus, one simple way to compute the weight of each pixel in the patch is to compare its color to that of the central pixel p: pixels with a color similar to the central pixel receive high weights, and pixels with different colors receive low weights. With these weights, the SSD match score above becomes:
$$ -\sum_{i=-r}^{r}\sum_{j=-r}^{r} w_{ij}\left(I_L(p_x+i,\,p_y+j)-I_R(q_x+i,\,q_y+j)\right)^2. $$
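The weighted score above differs from the plain negative SSD only in the factor w_ij on each squared difference. A minimal sketch, assuming the same list-of-rows image layout as before and a weight grid supplied by the caller with w_ij stored at `w[j + r][i + r]` (the function name `weighted_neg_ssd` is hypothetical):

```python
def weighted_neg_ssd(img_l, img_r, p, q, r, w):
    """Weighted negative SSD between patches centred on p (left) and q (right).

    w is a (2r+1)x(2r+1) nested list; w[j + r][i + r] holds the weight w_ij for offset (i, j).
    """
    px, py = p
    qx, qy = q
    total = 0.0
    for j in range(-r, r + 1):
        for i in range(-r, r + 1):
            diff = img_l[py + j][px + i] - img_r[qy + j][qx + i]
            # Pixels believed to lie at a different depth than p get a small weight,
            # so their mismatch barely affects the score.
            total += w[j + r][i + r] * diff * diff
    return -total
```

With all weights equal to 1 this reduces to the unweighted negative SSD; setting a weight to 0 removes that pixel's contribution entirely, which is the behavior ASW relies on near depth discontinuities.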
The weights wij can be computed from the left image by comparing the patch pixels to the central pixel:
$$ w_{ij} = \exp\left(-\frac{\lVert I_L(p_x+i,\,p_y+j)-I_L(p_x,\,p_y)\rVert}{\lambda}\right), $$
where λ is a scalar parameter and ‖·‖ is the magnitude of the color difference.
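The weight formula above can be sketched as follows, assuming the color difference magnitude is the absolute difference for single-channel intensities and the Euclidean norm for color tuples (an assumed choice of norm). The names `color_distance` and `support_weights` and the parameter `lam` (for λ) are hypothetical.

```python
import math

def color_distance(a, b):
    """||a - b||: absolute difference for intensities, Euclidean norm for color tuples."""
    if isinstance(a, (int, float)):
        return abs(a - b)
    return math.dist(a, b)

def support_weights(img_l, p, r, lam):
    """Adaptive support weights w_ij = exp(-||I_L(p + (i, j)) - I_L(p)|| / lam).

    Uses only the left image. Returns a (2r+1)x(2r+1) nested list indexed w[j + r][i + r].
    """
    px, py = p
    center = img_l[py][px]
    return [[math.exp(-color_distance(img_l[py + j][px + i], center) / lam)
             for i in range(-r, r + 1)]
            for j in range(-r, r + 1)]
```

The central pixel always gets weight 1, and the weight decays exponentially with color difference; a larger λ makes the weights more tolerant of color variation. The returned grid has the layout expected by a weighted SSD that stores w_ij at `w[j + r][i + r]`.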
The problem with computing adaptive support weights on IR images with active illumination is that the patterned illumination breaks the assumption that the color of a surface is approximately constant. The illumination pattern causes large intensity/color changes everywhere, not only at object boundaries, so pixels at the same depth as p can receive low weights; the weights then follow the illumination pattern rather than the depth discontinuities, and stereo matching degrades.
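The failure described above can be illustrated on a toy example: a single flat surface at one depth, first uniformly lit and then covered by a dot pattern. This is a hypothetical sketch. The intensities, the λ value, and the fixed modular rule standing in for pseudorandom laser dots are all assumptions chosen for illustration, and `support_weights` and `mean_weight` are hypothetical helpers.

```python
import math

def support_weights(img, p, r, lam):
    """w_ij = exp(-|I(p + (i, j)) - I(p)| / lam) for a single-channel image img[y][x]."""
    px, py = p
    c = img[py][px]
    return [[math.exp(-abs(img[py + j][px + i] - c) / lam)
             for i in range(-r, r + 1)]
            for j in range(-r, r + 1)]

def mean_weight(w):
    """Average weight over the patch (1.0 means every pixel is fully trusted)."""
    flat = [v for row in w for v in row]
    return sum(flat) / len(flat)

SIZE, BASE, DOT = 7, 20, 200  # hypothetical image size and intensities

# One flat surface at a single depth, uniformly lit: every pixel has the same intensity.
plain = [[BASE] * SIZE for _ in range(SIZE)]

# The same surface under a dot pattern; a fixed modular rule stands in for pseudorandom dots.
dotted = [[DOT if (3 * x + 5 * y) % 4 == 0 else BASE for x in range(SIZE)]
          for y in range(SIZE)]

p, r, lam = (3, 3), 1, 10.0
plain_mean = mean_weight(support_weights(plain, p, r, lam))    # 1.0
dotted_mean = mean_weight(support_weights(dotted, p, r, lam))  # about 0.33
```

Every pixel in the patch lies at the same depth, yet under the pattern only the three pixels that share the central pixel's dot status keep a high weight; the other six are effectively ignored even though they belong to the same surface.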