In driver assistance systems with a stereo camera, the stereo images can be used to calculate a depth image. A depth image is very helpful for many driver assistance functions, including collision avoidance and following other vehicles.
To determine a depth map from stereo images, different correlation methods can be used to correlate image points or image objects in the left image and the right image of a stereo image pair. These correlation methods or algorithms differ in the quality and density of the calculated depth map, and also in the computing power and amount of memory they require. Broadly, the following classes of correlation methods exist:
1. Local correlation methods
2. Global or semi-global methods with pre-determined disparities/labels (discrete optimization methods)
3. Global methods with continuous disparities (continuous optimization methods, e.g. convex optimization).
The advantages and disadvantages of the various groups and methods are not further discussed here.
In stereo image processing, the “disparity” refers to the distance or shift, i.e. the difference in image point location, of an object as it appears in the left image and in the right image of a stereo image pair.
In an adequately calibrated stereo camera (which is assumed in the following), only the horizontal distances, i.e. the distances along a respective horizontal image row in the camera, need to be considered for determining the disparities.
For practical applications, the algorithms of the second group above have proved to be particularly suitable. In particular, SGM (Semi-Global Matching) is regarded as the most practical algorithm for use in real-time systems.
It provides a high-quality depth map and, compared with most other algorithms, requires little computing power and memory. On the FPGA (Field Programmable Gate Array) of the latest available driver assistance camera, it runs in real time at approximately 16 FPS (frames per second, i.e. image pairs per second). Real-time computation on a signal processor is not feasible in the foreseeable future.
In fact, there is currently no alternative to SGM that does not involve significant disadvantages. SGM is state of the art and widely used.
In the algorithms of the second group, and in particular in SGM, the disparities are determined as integer shifts of the pixels in the image. For this purpose, a comparison operator is first applied per pixel and disparity. In practice and according to the state of the art, the census operator has proved to be a particularly robust comparison operator.
In the following example, the right image is used as the reference image, and (x,y) denotes a pixel coordinate in the image. The census operator result, or signature, is determined for each pixel P_r(x,y) in the right image. In the left image, the census signature is determined for the pixel P_l(x+d,y) with d = 0, ..., d_max and compared with the census signature from the right image. This results in a cost measure C(x,y,d) per pixel and disparity. For the entire image, this yields a three-dimensional space called the cost volume. Based on this cost volume, SGM performs an optimization that determines a disparity for each pixel. In addition to the integer disparity values, SGM determines a sub-pixel precise disparity by interpolating the internal costs, which are available for integer, uniformly spaced disparities.
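To make the census comparison and the cost volume concrete, the following Python sketch computes census signatures and the cost C(x,y,d) as the Hamming distance between the right signature at (x,y) and the left signature at (x+d,y). It is a minimal illustration under stated assumptions: a 3×3 census window, clamped image borders, the maximum cost for matches that fall outside the left image, and hypothetical function names (`census_transform`, `cost_volume`). The SGM optimization on top of the cost volume is not shown.

```python
import random

def census_transform(img, radius=1):
    """Census signature per pixel: one bit per neighbour in a (2r+1)x(2r+1)
    window, set when the neighbour is darker than the centre pixel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            center = img[y][x]
            sig = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dx == 0 and dy == 0:
                        continue  # the centre is not compared with itself
                    # Assumption: clamp coordinates at the image borders
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    sig = (sig << 1) | (1 if img[yy][xx] < center else 0)
            out[y][x] = sig
    return out

def cost_volume(census_left, census_right, d_max, n_bits):
    """C(x, y, d) = Hamming distance between the right signature at (x, y)
    and the left signature at (x + d, y), for d = 0..d_max."""
    h, w = len(census_right), len(census_right[0])
    # Assumption: disparities pointing outside the left image get the maximum cost
    volume = [[[n_bits] * (d_max + 1) for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for d in range(d_max + 1):
                if x + d < w:
                    diff = census_right[y][x] ^ census_left[y][x + d]
                    volume[y][x][d] = bin(diff).count("1")
    return volume

# Synthetic test pair: the left image is the right image shifted SHIFT pixels to the right
rng = random.Random(0)
H, W, SHIFT = 6, 12, 2
right = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
left = [[row[max(x - SHIFT, 0)] for x in range(W)] for row in right]
vol = cost_volume(census_transform(left), census_transform(right), d_max=4, n_bits=8)
```

Away from the image borders the cost at d = SHIFT is zero, so a simple per-pixel minimum search would already find the shift; SGM replaces that local decision with its path-wise optimization over the same cost volume. The pure-Python loops are chosen for readability, not speed.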
The disparity does not directly indicate the distance z of the object from the camera. The relationship is reciprocal; in the present example, the z-distance can be computed from the disparity d by

$$z = \frac{C_1}{d + C_2} \qquad (1)$$

where C1 and C2 are constants. In a calibrated stereo camera, C2 = 0 applies.
In a calibrated camera, C1 = f·b depends on the following parameters:
f: focal length in pixels
b: base width
The accuracy of a depth measurement therefore depends on the depth: since z is inversely proportional to d, a fixed disparity error Δd causes a depth error of approximately Δz ≈ (z²/C1)·Δd, which grows quadratically with distance. In the near or close range, a higher accuracy is thus achieved than in the far range. Given a maximum disparity d_max, the minimum determinable distance z_min = C1/d_max (for C2 = 0) also depends on C1.
The value z_min is predefined by the requirements of the camera system, i.e. a specified minimum distance must be measurable.
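The reciprocal relationship of equation (1) and its consequences can be checked numerically. The following Python sketch evaluates z = C1/(d + C2), the minimum distance for a given d_max, and the first-order depth error z²/C1·Δd. The camera values (f = 1200 px, b = 0.2 m, so C1 = 240 px·m) and the function names are hypothetical and chosen only for illustration.

```python
def depth_from_disparity(d, c1, c2=0.0):
    """Equation (1): z = C1 / (d + C2)."""
    return c1 / (d + c2)

def min_distance(d_max, c1, c2=0.0):
    """Smallest measurable distance, reached at the maximum disparity d_max."""
    return depth_from_disparity(d_max, c1, c2)

def depth_error(z, c1, delta_d):
    """First-order depth error for a disparity error delta_d (C2 = 0):
    |dz/dd| = C1 / d**2 = z**2 / C1."""
    return z * z / c1 * delta_d

# Hypothetical calibrated camera: f = 1200 px, b = 0.2 m
C1 = 240.0  # = f * b in pixel * metre
for z in (5.0, 20.0, 60.0, 120.0):
    err = depth_error(z, C1, 0.25)
    print(f"z = {z:6.1f} m: depth error for a 0.25 px disparity error = {err:.3f} m")
print("z_min for d_max = 64:", min_distance(64, C1), "m")
```

With these assumed values, a quarter-pixel disparity error corresponds to about 0.026 m at 5 m but to 15 m at 120 m, which is why the far range is the critical case.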
In practice, the accuracy in the far range has turned out to be particularly critical. The accuracy in the near or close range is more than sufficient for use in driver assistance systems.
According to the state of the art, there are several techniques for increasing the accuracy. They are described below together with their advantages and disadvantages:
1. Interpolation of the Costs
For each pixel, the disparity determined by SGM is selected. This disparity is refined by considering the costs of the adjacent disparities, e.g. by fitting a parabola through the costs of the three disparities (quadratic interpolation) and searching for its minimum. Other interpolation schemes, such as the equiangular fit, are also possible. Details are described in Heiko Hirschmüller, Accurate and Efficient Stereo Processing by Semi-Global Matching and Mutual Information, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 20-26 Jun. 2005, San Diego, Calif., United States, Volume 2, pp. 807-814, and improvements can be found e.g. in Stefan K. Gehrig, Uwe Franke, Improving Stereo Sub-Pixel Accuracy for Long-Range Stereo, ICCV 2007.
The advantage of this method is its simple and resource-efficient implementation. Its disadvantage, however, is that it often does not improve the results significantly. One of the main reasons is the "pixel-locking" effect, an artifact of the sub-pixel interpolation for objects that are represented by a relatively small number of pixels in the image. Due to pixel-locking, certain interpolated positions (such as pixel centers or pixel edges) are over-represented.
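The two interpolation schemes named above can be sketched in a few lines of Python. The parabola offset is the vertex of the quadratic through the costs at d-1, d and d+1; the equiangular fit assumes a symmetric V-shaped cost curve with equal slope magnitude on both sides. The function names are hypothetical, and the sketch assumes that the cost at the integer disparity d is the smallest of the three.

```python
def parabola_offset(c_m1, c_0, c_p1):
    """Sub-pixel offset of the minimum of the parabola through the costs
    at d-1, d, d+1 (c_0 assumed to be the smallest of the three)."""
    denom = c_m1 - 2.0 * c_0 + c_p1
    if denom <= 0:
        return 0.0  # flat or degenerate cost curve: keep the integer disparity
    return (c_m1 - c_p1) / (2.0 * denom)

def equiangular_offset(c_m1, c_0, c_p1):
    """Equiangular fit: a V shape whose two flanks have the same slope magnitude."""
    # The steeper flank lies on the side of the larger neighbouring cost
    slope = c_m1 - c_0 if c_p1 < c_m1 else c_p1 - c_0
    if slope <= 0:
        return 0.0
    return (c_m1 - c_p1) / (2.0 * slope)

def refine(d, costs, fit=parabola_offset):
    """Refine the integer disparity d using the costs of its two neighbours."""
    if d <= 0 or d >= len(costs) - 1:
        return float(d)  # a neighbour is missing: no interpolation possible
    return d + fit(costs[d - 1], costs[d], costs[d + 1])

costs = [9.0, 4.0, 1.0, 2.0, 7.0]              # costs for d = 0..4
print(refine(2, costs))                        # parabola fit: 2.25
print(refine(2, costs, equiangular_offset))    # equiangular fit, about 2.333
```

For the same three costs the two fits return different sub-pixel positions, which illustrates that the interpolated result depends on the assumed shape of the cost curve.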
2. Finer Sampling of the Disparities
In Stefan K. Gehrig, Uwe Franke, Improving Stereo Sub-Pixel Accuracy for Long-Range Stereo, ICCV 2007, it is also shown that a finer sampling of the disparities can significantly improve the accuracy of the depth map.
In that publication, the resolution of the cost volume is increased in the disparity dimension d by inserting intermediate steps at 0.5 or 0.25 pixel disparity. In the example, the costs of the intermediate steps are interpolated from the adjacent costs. As a result, the cost volume contains 2 or 4 times as many disparities.
The disadvantage of finer sampling is that the resource demand, i.e. computing power, amount of memory and memory bandwidth, increases linearly with the number of disparities.
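A minimal Python sketch of the finer sampling: intermediate disparities are inserted between neighbouring integer disparities and their costs are interpolated. Linear interpolation is an assumption made here for simplicity; the cited publication may use a different scheme. The helper `cost_volume_bytes` and the image size in the test are hypothetical and only show that memory grows linearly with the number of disparities.

```python
def upsample_costs(costs, factor):
    """Insert factor-1 intermediate disparities between neighbouring integer
    disparities; their costs are interpolated linearly (assumed scheme)."""
    out = []
    for i in range(len(costs) - 1):
        a, b = costs[i], costs[i + 1]
        for k in range(factor):
            out.append(a + (b - a) * (k / factor))
    out.append(costs[-1])  # keep the last integer disparity
    return out

def cost_volume_bytes(width, height, n_disp, bytes_per_cost):
    """Memory of a dense cost volume: linear in the number of disparities."""
    return width * height * n_disp * bytes_per_cost

half = upsample_costs([4.0, 1.0, 2.0], 2)     # disparities 0, 0.5, 1, 1.5, 2
quarter = upsample_costs([4.0, 1.0, 2.0], 4)  # disparity step 0.25
print(half)
print(quarter)
```

Quadrupling the sampling density quadruples the size of the cost volume, and every subsequent optimization pass over it has to process correspondingly more data.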
3. Sub-Pixel Refinement
Starting from an original disparity map the disparities can be refined locally. For this, local correlation methods are used on the two images.
However, these methods work only in image regions with high contrast, e.g. at edges. In practice, it is therefore unrealistic to refine a disparity map densely with such methods.
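Why local refinement fails in low-contrast regions can be illustrated with one gradient-based (Lucas-Kanade style) step along a single image row. This is a swapped-in example of a local method, not a technique named in the text: it minimizes the sum of (L(i+d+δ) - R(i))² over a window and solves for δ with a first-order Taylor expansion. The function name, window radius and texture threshold are hypothetical, and the caller is assumed to keep all indices inside the rows.

```python
def refine_locally(left_row, right_row, x, d, radius=3, min_texture=1.0):
    """One Gauss-Newton step: delta = sum(g * (R - L)) / sum(g * g), where g is
    the horizontal gradient of the left row. Returns None when the window has
    too little texture (sum(g * g) below min_texture)."""
    num = 0.0
    den = 0.0
    for i in range(x - radius, x + radius + 1):
        j = i + d
        grad = (left_row[j + 1] - left_row[j - 1]) / 2.0  # central difference
        num += grad * (right_row[i] - left_row[j])
        den += grad * grad
    if den < min_texture:
        return None  # low contrast: the refinement is ill-conditioned
    return d + num / den

# Ramp with a true sub-pixel disparity of 2.3: L(i + 2.3) == R(i)
right = [3.0 * i for i in range(20)]
left = [3.0 * i - 6.9 for i in range(20)]
print(refine_locally(left, right, x=8, d=2))     # close to 2.3
flat = [5.0] * 20
print(refine_locally(flat, flat, x=8, d=2))      # None: no texture
```

The denominator is the accumulated squared image gradient, so in homogeneous regions it vanishes and the update is undefined; this is the reason such methods only work reliably at edges.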
4. Hierarchical Refinement Scheme
In Stefan K. Gehrig, Clemens Rabe, Real-time Semi-Global Matching on the CPU, CVPR 2010, a method is described in which the disparities in the near or close range are determined with a lower resolution than in the far range. However, this applies not only to the disparities but also to the xy-resolution of the pixels. As a result, smaller objects in the near or close range may not be recognized.
In DE 103 10 849 A1, a method for photogrammetric distance and/or position determination is shown that implements a hierarchical measurement range adjustment. Here, p new image pairs with successively reduced resolution are generated from an original reference and search gray-scale image pair.
In all resolution steps, similarity measures are determined between reference image blocks and equally sized search image blocks, the search image blocks being shifted in the row direction in the search image of the respective pair with a step size of one pixel. The disparity for a reference block is determined by searching the sequence of similarity measures for this reference block for extreme values; in all resolution steps except the original one, an area at the beginning of the sequence of similarity measures that was already detected in the preceding resolution step is excluded from the search. The position of the corresponding object point is then determined in a conventional manner from the location of the identified extreme value.
The disadvantage of this local method is the high effort required to generate the p reduced-resolution image pairs and the high number of iterations required for the disparity determination.
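The effect of a hierarchical measurement range can be sketched by listing which full-resolution disparities each pyramid level covers. The following Python sketch reflects one reading of the scheme described above, which is an assumption: level k works on images downscaled by 2^k with the same number of disparity steps, and each level k ≥ 1 skips the first half of its range because the preceding, finer level already covered it. The function names and the value C1 = 240 are hypothetical.

```python
def disparity_bands(n_disp, levels):
    """For each pyramid level k (image downscaled by 2**k) return
    (k, first, last, step): the full-resolution disparities it searches.
    Levels k >= 1 skip the part already covered by the finer level."""
    bands = []
    for k in range(levels):
        scale = 2 ** k
        first = 0 if k == 0 else n_disp // 2  # in level pixels
        bands.append((k, first * scale, (n_disp - 1) * scale, scale))
    return bands

def band_distances(band, c1):
    """(z_near, z_far) covered by a band, using z = C1 / d with C2 = 0."""
    _, d_first, d_last, _ = band
    z_far = float("inf") if d_first == 0 else c1 / d_first
    return c1 / d_last, z_far

C1 = 240.0  # hypothetical f * b
for band in disparity_bands(32, 3):
    z_near, z_far = band_distances(band, C1)
    print(f"level {band[0]}: disparities {band[1]}..{band[2]} (step {band[3]}), "
          f"z from {z_near:.2f} m to {z_far} m")
```

Under these assumptions the bands join without gaps, and the coarse levels cover the near range with full-resolution disparity steps of 2 or 4 pixels, i.e. the near range is measured with lower resolution, in line with the trade-off described for both cited approaches.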
5. Calculation of Overview and Magnifier Map
In DE 10 2008 015 535 A1, it is described that an overview map and a magnifier map can be calculated separately. The overview map works at half resolution over the entire image, while the magnifier map works at full resolution, but only in a variable section of the image.
The disadvantages of this known method are that the magnifier map is not available for the entire image, and that either the amount of required resources is doubled by calculating the magnifier map, or the computation takes two separate steps: first the entire image is calculated with a reduced resolution, and then the magnifier map is calculated with an increased resolution.
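The relationship between the two maps can be sketched in Python. A half-resolution overview map has to be upsampled by a factor of 2 and its disparity values doubled, because a shift of one half-resolution pixel corresponds to two full-resolution pixels; the magnifier map then replaces the values inside its window. The fusion step, nearest-neighbour upsampling and the function names are assumptions for illustration and are not taken from the cited document.

```python
def upsample_overview(overview):
    """Half-resolution disparity map -> full resolution: nearest-neighbour
    upsampling by 2 in x and y, with disparity values doubled."""
    full = []
    for row in overview:
        wide = [2.0 * d for d in row for _ in range(2)]
        full.append(wide)
        full.append(list(wide))  # duplicate the row (copy, not alias)
    return full

def fuse(overview, magnifier, x0, y0):
    """Overwrite the upsampled overview map with the full-resolution
    magnifier map inside the window whose top-left corner is (x0, y0)."""
    fused = upsample_overview(overview)
    for dy, row in enumerate(magnifier):
        for dx, d in enumerate(row):
            fused[y0 + dy][x0 + dx] = d
    return fused

overview = [[1.0, 2.0], [3.0, 4.0]]      # half resolution
magnifier = [[9.5, 9.25], [9.0, 9.0]]    # full resolution, 2x2 window
for row in fuse(overview, magnifier, x0=1, y0=1):
    print(row)
```

Outside the window, the fused map only has even disparity values with a step of 2 full-resolution pixels, which shows the lower depth accuracy of the regions not covered by the magnifier map.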