Stereo processing is one of the most intensively researched areas in computer vision. Over the last three decades, a large number of different approaches have been developed. Current state-of-the-art approaches are based on belief propagation [1, 2], dynamic programming [3, 4] or graph cuts [5].
However, traditional correlation-based stereo processing is still a common tool, especially in real-time systems [6-8]. A major drawback of the traditional approach is that depth discontinuities are blurred more strongly than with state-of-the-art approaches. The degree of blurring depends on the matching cost used for correlating image patches. The most common matching costs for traditional stereo processing are the sum of absolute differences (SAD) and the sum of squared differences (SSD). These measures assume constant intensity for corresponding pixels, i.e., the color or gray values of corresponding pixels are the same in the left and right image. For this reason they often fail in real-world applications, where lighting differs between the two camera views. A common way to reduce this effect is to apply a Laplacian of Gaussian filter or to subtract the mean intensity in each image prior to the actual stereo computation. Furthermore, SAD and SSD can produce poor matching values for corresponding pixels because of image sampling effects. To compensate for this, Birchfield and Tomasi have proposed a sampling-insensitive calculation [9]. However, comparisons [10] have shown that, despite these countermeasures, SAD and SSD are inferior to matching costs that directly account for changes in intensity.
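To make the constant-intensity assumption concrete, the following Python sketch compares SAD and SSD on a synthetic patch pair in which the right patch is the left one brightened by a constant offset. The mean is removed per patch here, which for a pure offset has the same effect as removing a per-image mean; as the last line shows, this does not compensate a gain change. The patch values and function names are illustrative assumptions.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized patches."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ssd(a, b):
    """Sum of squared differences between two equally sized patches."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def subtract_mean(patch):
    """Remove the mean intensity, which cancels a constant brightness offset."""
    m = sum(patch) / len(patch)
    return [x - m for x in patch]


left = [10, 20, 30, 20, 10]
right = [v + 50 for v in left]  # identical structure, right camera 50 units brighter

print(sad(left, right))  # 250
print(ssd(left, right))  # 12500
print(sad(subtract_mean(left), subtract_mean(right)))  # 0.0

# A gain change (right camera twice as sensitive) is not removed by mean subtraction.
print(sad(subtract_mean(left), subtract_mean([2 * v for v in left])))  # 32.0
```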
One of the standard matching costs that accounts for changes in intensity is the normalized cross-correlation (NCC). It compensates for a bias (offset) and a linear gain of pixel intensities. Furthermore, NCC is optimal for compensating Gaussian noise, and the correlation values are constrained to the interval [−1, 1], which eases the selection of a threshold for rejecting bad matches. The main disadvantage of NCC is that it blurs depth discontinuities more strongly than other matching costs.
Two other important cost functions are the rank transform and the census transform [11]. The main idea of the rank transform is to replace each pixel intensity with its rank within a local neighborhood, i.e., the number of neighborhood pixels whose intensity is smaller than that of the center pixel. This removes most of the lighting changes that can occur between the images and reduces the blurring of depth edges compared to the other cost functions. The rank transform itself is only a preprocessing step applied to the stereo images, and it is usually followed by a stereo computation with SAD or SSD. In a comparison of six cost functions in [10], the rank transform was shown to be the best cost function for correlation-based stereo with respect to several radiometric changes.
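A minimal Python sketch of the rank transform on a small gray-value image (a list of rows) follows. It assumes a square neighborhood of radius 1 that is clipped at the image border; the function name, test image and border handling are illustrative choices. Because only intensity order matters, any strictly increasing brightness change (here a gain of 2 and an offset of 30) leaves the rank image unchanged.

```python
def rank_transform(img, radius=1):
    """Replace each pixel by the number of neighbors with a smaller intensity.

    The neighborhood is the (2*radius+1) x (2*radius+1) square centered on the
    pixel, clipped at the image border. img is a list of rows.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            center = img[y][x]
            count = 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    if img[yy][xx] < center:
                        count += 1
            out[y][x] = count
    return out


img = [[1, 5, 3], [7, 4, 2], [6, 8, 9]]
brighter = [[2 * v + 30 for v in row] for row in img]  # gain and offset change
print(rank_transform(img)[1][1])  # 3
print(rank_transform(img) == rank_transform(brighter))  # True
```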
The census transform is an extension of the rank transform: instead of replacing each pixel with its rank, it replaces it with a binary fingerprint that encodes which pixels of the neighborhood are smaller than the anchor (center) pixel. The matching cost is then the Hamming distance between two such fingerprints.
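The census transform and its Hamming-distance matching cost can be sketched as follows. This is an illustrative Python version, assuming a 3x3 neighborhood visited row by row and border pixels replicated outside the image; neither choice is prescribed by the text. For an interior pixel, the number of set bits equals its rank, which reflects that the census transform extends the rank transform.

```python
def census_transform(img, radius=1):
    """Encode each pixel as a bit string: one bit per neighbor, set if neighbor < center.

    Neighbors are visited row by row; pixels outside the image are replaced by the
    nearest border pixel, so every fingerprint has (2*radius+1)**2 - 1 bits.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            center = img[y][x]
            bits = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue  # the anchor pixel itself is not encoded
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    bits = (bits << 1) | (1 if img[yy][xx] < center else 0)
            out[y][x] = bits
    return out


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


img = [[1, 5, 3], [7, 4, 2], [6, 8, 9]]
brighter = [[2 * v + 30 for v in row] for row in img]
c1, c2 = census_transform(img), census_transform(brighter)
print(bin(c1[1][1]))  # 0b10101000
print(max(hamming(a, b) for ra, rb in zip(c1, c2) for a, b in zip(ra, rb)))  # 0
```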
The following discusses why the normalized cross-correlation tends to blur depth discontinuities more strongly than other matching costs, and why the summed normalized cross-correlation (SNCC) mitigates this problem. Furthermore, it is shown that NCC and SNCC can be implemented efficiently using box filters.
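As a general illustration of why box filters make NCC cheap (not the specific implementation described later), the Python sketch below computes windowed sums with a prefix-sum table, so each window costs O(1) regardless of its size. It works on a 1D signal for brevity; a 2D box filter is separable and can be applied along rows and then columns. Obtaining the local standard deviation from the identity Var = E[I²] − E[I]² is an assumption of this sketch, and it can lose precision for large intensities.

```python
def box_sum_1d(values, radius):
    """Sum over the window [i - radius, i + radius] for every i, clipped at the borders.

    A prefix-sum (running-sum) table makes each window sum cost O(1),
    independent of the window size.
    """
    prefix = [0]
    for v in values:
        prefix.append(prefix[-1] + v)
    n = len(values)
    return [prefix[min(n, i + radius + 1)] - prefix[max(0, i - radius)] for i in range(n)]


def box_mean_1d(values, radius):
    """Windowed mean: box sum divided by the number of samples in the window."""
    n = len(values)
    sums = box_sum_1d(values, radius)
    return [s / (min(n, i + radius + 1) - max(0, i - radius)) for i, s in enumerate(sums)]


def local_mean_std(values, radius):
    """Local mean and standard deviation from two box filters: E[I] and E[I^2]."""
    mean = box_mean_1d(values, radius)
    mean_sq = box_mean_1d([v * v for v in values], radius)
    # Var = E[I^2] - E[I]^2; clamp tiny negative round-off to zero.
    std = [max(m2 - m * m, 0.0) ** 0.5 for m, m2 in zip(mean, mean_sq)]
    return mean, std


print(box_sum_1d([1, 2, 3, 4, 5], 1))  # [3, 6, 9, 12, 9]
print(local_mean_std([0, 2], 1))       # ([1.0, 1.0], [1.0, 1.0])
```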
Problem of Normalized Cross-Correlation (NCC)
For two patches from the two camera images I^L (left) and I^R (right), the normalized cross-correlation (NCC) is defined as
$$\rho_x = \frac{\frac{1}{|p(x)|}\sum_{x' \in p(x)} \left(I^L_{x'} - \mu^L_x\right)\left(I^R_{x'+d} - \mu^R_{x+d}\right)}{\sigma^L_x \, \sigma^R_{x+d}}, \qquad (1)$$

where

$$\mu_x = \frac{1}{|p(x)|}\sum_{x' \in p(x)} I_{x'}, \qquad \sigma_x = \sqrt{\frac{1}{|p(x)|}\sum_{x' \in p(x)} \left(I_{x'} - \mu_x\right)^2}. \qquad (2)$$
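In the above equations, x is the pixel position of the anchor point of the left patch, p(x) is the set of pixel coordinates of the left image patch, and p(x+d) is the set of pixel coordinates of the right image patch, i.e., d denotes the disparity between the left and the right image patch. Furthermore, |p(x)| is the number of elements of p(x). The means μ^L_x, μ^R_{x+d} and standard deviations σ^L_x, σ^R_{x+d} follow from equation (2), evaluated on the left image over p(x) and on the right image over p(x+d), respectively.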
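Equations (1) and (2) translate directly into code. The following Python sketch computes the correlation value for two patches that have already been extracted as flat lists (the left patch over p(x), the right patch over p(x+d)) and checks the invariance to bias and gain mentioned earlier. Returning 0.0 for a patch with zero variance is an assumption of this sketch, since the equations leave that case undefined.

```python
import math


def mean_std(patch):
    """Mean and standard deviation of a patch, equation (2)."""
    n = len(patch)
    mu = sum(patch) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in patch) / n)
    return mu, sigma


def ncc(left_patch, right_patch):
    """Equation (1): left_patch holds I^L over p(x), right_patch I^R over p(x + d)."""
    mu_l, sigma_l = mean_std(left_patch)
    mu_r, sigma_r = mean_std(right_patch)
    if sigma_l == 0 or sigma_r == 0:
        return 0.0  # textureless patch: NCC undefined, 0.0 chosen here by assumption
    n = len(left_patch)
    cov = sum((a - mu_l) * (b - mu_r) for a, b in zip(left_patch, right_patch)) / n
    return cov / (sigma_l * sigma_r)


left = [10, 20, 30, 20, 10]
print(round(ncc(left, [3 * v + 7 for v in left]), 6))  # 1.0  (gain 3, bias 7)
print(round(ncc(left, [-v for v in left]), 6))         # -1.0 (inverted contrast)
```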
As stated above, NCC tends to blur depth discontinuities. The reason is that depth discontinuities often exhibit strong contrast, and the correlation value is influenced most by the strongest contrast within the patch. This effect is caused by the normalization: in each patch p(x) the values are normalized by
$$I^{\mathrm{norm}}_{x'} = \frac{I_{x'} - \mu_x}{\sigma_x}, \quad \text{where } x' \in p(x). \qquad (3)$$
Because of this normalization, low-contrast structure in the vicinity of a high-contrast edge is suppressed. To visualize this, a rectangle with a very high intensity value (10000) is added to the left Venus image of the Middlebury stereo benchmark [12]. The normalization of equation (3) is then applied to this image using different patch (filter) sizes. The resulting images are shown in FIG. 1. They demonstrate that the high-contrast rectangle suppresses the structure in a surrounding region whose size is determined by the patch size.
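The suppression effect can be reproduced on a one-dimensional toy signal. The Python sketch below is a simplification of the FIG. 1 experiment, not its original setup: it applies equation (3) as a sliding filter, normalizing each sample with the mean and standard deviation of a window centered on it. The alternating 0/1 texture, the spike value and the window radius are assumptions chosen for illustration.

```python
import math


def local_normalize(signal, radius):
    """Equation (3) applied as a sliding filter on a 1D signal.

    Each sample is normalized with the mean and standard deviation of the
    window [i - radius, i + radius] (clipped at the borders).
    """
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - radius): i + radius + 1]
        mu = sum(window) / len(window)
        sigma = math.sqrt(sum((v - mu) ** 2 for v in window) / len(window))
        out.append((signal[i] - mu) / sigma if sigma > 0 else 0.0)
    return out


signal = [i % 2 for i in range(40)]  # weak alternating 0/1 texture
signal[20] = 10000                   # one very bright sample
norm = local_normalize(signal, radius=3)

print(round(norm[5] - norm[4], 3))   # 2.309: texture far from the spike
print(abs(norm[19] - norm[18]))      # texture next to the spike, nearly zero
```

Far from the spike, neighboring texture samples differ by about 2.3 after normalization; within the window reach of the spike they differ by less than 0.001, so the weak texture has effectively vanished.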
Due to this suppression effect, all patches in the vicinity of a high-contrast edge favor the disparity of this edge because it is the dominant structure: not matching this structure would result in a large error, i.e., a small correlation value. FIG. 2a shows a cutout of the left image of the Venus scene. The patch marked by the white rectangle is correlated with the right image, shown in FIG. 2b, for several disparities (shifts). FIG. 2c shows the correlation values for these disparities. The plot shows that the best match lies at a disparity of roughly 13 pixels, while the ground-truth disparity is roughly 8 pixels (depicted by the horizontal line). The patch that corresponds to the peak is depicted as the solid rectangle in FIG. 2b. The reason for this wrong match is the high-contrast edge between the bright newspaper and the dark background. As the newspaper has a disparity of roughly 13 pixels, all patches that encompass the border of the newspaper will have their best correlation at a disparity of 13 pixels.
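The disparity search described here evaluates the correlation for a range of shifts and picks the maximum (winner-take-all). The following Python sketch does this for a single image row, taking the right patch at x + d as in equation (1). The row data, patch radius and disparity range are illustrative assumptions, and the toy example only checks that a pure shift is recovered; it does not reproduce the FIG. 2 scene.

```python
import math


def ncc(a, b):
    """NCC of two equally sized patches as in equation (1); 0.0 for flat patches."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    cov = sum((u - mu_a) * (v - mu_b) for u, v in zip(a, b)) / n
    var_a = sum((u - mu_a) * (u - mu_a) for u in a) / n
    var_b = sum((v - mu_b) * (v - mu_b) for v in b) / n
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def correlation_curve(left_row, right_row, x, radius, max_disp):
    """NCC between the left patch at x and the right patch at x + d, for d = 0..max_disp."""
    if x - radius < 0 or x + max_disp + radius >= len(right_row):
        raise ValueError("patch or search range leaves the row")
    patch_l = left_row[x - radius: x + radius + 1]
    return [ncc(patch_l, right_row[x + d - radius: x + d + radius + 1])
            for d in range(max_disp + 1)]


def best_disparity(curve):
    """Winner-take-all: the disparity with the highest correlation."""
    return max(range(len(curve)), key=lambda d: curve[d])


left_row = [(7 * i) % 11 for i in range(40)]  # synthetic textured row
right_row = [0, 0, 0] + left_row[:-3]          # right[k + 3] == left[k], true disparity 3
curve = correlation_curve(left_row, right_row, x=15, radius=3, max_disp=6)
print(best_disparity(curve))  # 3
```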
In summary, the above observations demonstrate that the normalized cross-correlation is biased by strong contrasts. This suggests that NCC should be used with small patch sizes for stereo processing. However, decreasing the patch size leads to noisy depth images.
The blurring effect of NCC hence arises from its sensitivity to high contrasts. The invention therefore presents a new two-stage correlation that reduces this sensitivity. In the first stage, a normalized cross-correlation is computed using a small patch size; in the second stage, the resulting correlation coefficients are summed over a larger region. It is shown that this summed normalized cross-correlation (SNCC) dramatically improves the results of traditional stereo algorithms compared to plain NCC and also compared to the powerful rank transform.
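The two-stage idea can be sketched as follows, again for a single image row. This Python sketch assumes that the second-stage summation is an unweighted mean (a box filter) of the first-stage NCC values over a window centered on x; the parameter names small_radius and sum_radius are hypothetical, and the NCC values are computed naively rather than with box filters.

```python
import math


def ncc(a, b):
    """Plain NCC of two equally sized patches (equation (1)); 0.0 for flat patches."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    cov = sum((u - mu_a) * (v - mu_b) for u, v in zip(a, b)) / n
    var_a = sum((u - mu_a) * (u - mu_a) for u in a) / n
    var_b = sum((v - mu_b) * (v - mu_b) for v in b) / n
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def sncc(left_row, right_row, x, d, small_radius, sum_radius):
    """Two-stage SNCC sketch for one row.

    Stage 1: NCC of small patches (radius small_radius) at every x' near x.
    Stage 2: unweighted mean of these NCC values over [x - sum_radius, x + sum_radius].
    """
    values = []
    for xp in range(x - sum_radius, x + sum_radius + 1):
        patch_l = left_row[xp - small_radius: xp + small_radius + 1]
        patch_r = right_row[xp + d - small_radius: xp + d + small_radius + 1]
        values.append(ncc(patch_l, patch_r))
    return sum(values) / len(values)


left_row = [(7 * i) % 11 for i in range(40)]
right_row = [0, 0, 0] + left_row[:-3]  # true disparity 3
scores = [sncc(left_row, right_row, 15, d, small_radius=1, sum_radius=3) for d in range(7)]
print(max(range(7), key=lambda d: scores[d]))  # 3
```

The intent of this design is that each small-patch NCC value is bounded by [−1, 1] and enters the mean with equal weight, so a single high-contrast edge can dominate only the few small patches that contain it rather than the whole summation window.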