Field
The invention relates to an image processing method for processing preferably stereoscopic images and to an optical (visual) sensor system, especially a camera system, using this method. Furthermore, the invention relates to a vehicle, especially a ground, air or sea vehicle, or a robotic device, comprising the sensor system. The sensor system is adapted to determine/calculate the distances from the sensor system to a physical object, and/or may also be used in determining/calculating optical flow from images/an image stream provided by the optical sensor system.
Description of the Related Art
The invention especially relates to the field of stereoscopic vision, which is used in many autonomous or semi-autonomous systems including Advanced Driver Assistance Systems (ADAS), such as in-vehicle navigation systems, adaptive cruise control (ACC), lane departure warning systems, lane change assistance, collision avoidance systems (or pre-crash systems), intelligent speed adaptation or intelligent speed advice (ISA), night vision, adaptive light control, pedestrian protection systems, automatic parking, traffic sign recognition, blind spot detection, driver drowsiness detection, vehicular communication systems, and/or hill descent control, etc.
Stereoscopic vision allows for the estimation of distances by using two or more sensors and images derived therefrom. Image parts or patches of one camera are correlated with image parts or patches of images of one or more other cameras. The difference in position of the physical object in the correlating image parts (the disparity) directly relates to the distance of the object from the camera. Generally, close objects have a large difference in position in the compared image parts, while far away objects have a small difference in position. An advantage over other distance measurement means is that energy efficient sensors such as cameras can be used. Stereoscopic vision sensor systems also scale well, since stereo cameras can be adapted to any distance range by altering the baseline (i.e. the distance between the cameras).
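By way of illustration only, the relation between the difference in position and the distance can be sketched with the standard pinhole relation for a rectified stereo pair, Z = f · B / d. This relation is general stereo geometry and is not taken from the present text; the function name and the numeric values (focal length, baseline, disparities) are assumptions chosen for the example.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Distance Z = f * B / d for a rectified stereo pair (pinhole model).

    disparity_px    : difference in position between the two images, in pixels
    focal_length_px : focal length expressed in pixels (assumed value)
    baseline_m      : distance between the two cameras, in metres (assumed value)
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite distance")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: 800 px focal length, 0.5 m baseline.
near = depth_from_disparity(40.0, 800.0, 0.5)   # large disparity, close object
far = depth_from_disparity(4.0, 800.0, 0.5)     # small disparity, distant object

# Doubling the baseline doubles the distance that maps to the same disparity.
wide = depth_from_disparity(40.0, 800.0, 1.0)
```

The last line illustrates why altering the baseline rescales the usable distance range: the same disparity then corresponds to an object twice as far away.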
The sensor system according to the invention hence comprises at least two optical sensors, such as cameras (CCD, CMOS, ...), laser scanners, infrared sensors, etc. Each optical sensor produces images and sends these images to a processing unit, e.g. as a stream of images.
The processing unit processes the images and derives image information from the images provided by the two sensors. The processing unit may be part of the sensor system, but may also be separate from the sensor system. For example, an image stream can be supplied from a camera-based stream recording system to the processing unit for processing.
Known image-part or patch-matching stereo methods suffer from bad correlations when the fronto-parallel assumption is violated or when the texture information is low. Two-frame stereoscopic correspondence methods usually work with a rectified image pair, and typically exploit the fronto-parallel assumption (also called the frontal parallel plane assumption) either explicitly or implicitly.
In particular, this assumption states that position disparity (or depth) is constant (with respect to the rectified stereo image pair or image part/patch pair) over a region under consideration. However, physical objects may possess surfaces rich in shape, which generically violates the frontal parallel plane assumption. This is explained with reference to FIG. 1: For a regular surface S ⊂ ℝ³, the tangent plane Tp(S) (in solid lines) at a point p ∈ S is well defined. Traditional stereoscopic correspondence methods use the frontal parallel plane (in dotted lines) to represent the (local) surface geometry at p, which, however, is incorrect. In FIG. 1, the sensors Cl and Cr are shown, which refer to a left (l) and right (r) camera.
This invention improves block-matching stereo matching by combining the matching values of differently shaped and sized matching filters in a multiplicative manner. A block-matching method is a way of locating matching blocks in a sequence of digital video image frames, e.g. for the purposes of motion estimation. The purpose of a block-matching method is to find a matching block from a frame i in some other frame j, which may appear before or after i. Block-matching methods make use of an evaluation metric to determine whether a given block in frame j matches the search block in frame i. In the following, the term frame is used synonymously with image patch, part, (sub-)window, or portion, and a block is also referred to as a filter of essentially rectangular shape.
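By way of illustration only, a generic block-matching search can be sketched as follows. The sketch uses the summed absolute difference (SAD) as the evaluation metric and searches along the same image rows, as is usual for a rectified stereo pair. It does not show the multiplicative combination of the invention; all function names, the search direction and the toy image data are assumptions made for the example.

```python
def sad(block_a, block_b):
    """Summed absolute difference between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def extract_block(image, top, left, height, width):
    """Cut a rectangular block out of an image given as a list of rows."""
    return [row[left:left + width] for row in image[top:top + height]]


def best_match(frame_i, frame_j, top, left, height, width, max_shift):
    """Find the horizontal shift in frame_j that best matches a block of frame_i.

    Returns (shift, cost), where a lower SAD cost means a better match.
    """
    reference = extract_block(frame_i, top, left, height, width)
    best_shift, best_cost = None, None
    for shift in range(max_shift + 1):
        col = left - shift          # assumed search direction: towards the left
        if col < 0:
            break                   # candidate block would leave the image
        cost = sad(reference, extract_block(frame_j, top, col, height, width))
        if best_cost is None or cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift, best_cost


# Toy frames: the pattern 9, 5, 7 sits two pixels further left in frame j.
frame_i = [[0, 0, 0, 0, 9, 5, 7, 0] for _ in range(3)]
frame_j = [[0, 0, 9, 5, 7, 0, 0, 0] for _ in range(3)]
result = best_match(frame_i, frame_j, top=0, left=4, height=3, width=3, max_shift=4)
```

In this toy example the search block is recovered at a shift of two pixels with a cost of zero; with real images the minimum is rarely zero and may be ambiguous in regions of low texture.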
Known approaches are described e.g. in EP 2 386 998 A1, which describes a robust matching measure, the summed normalized cross-correlation (SNCC), which can be used for patch-matching correlation searches. One application of this measure is, for example, stereoscopic depth computation from stereo images.
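By way of illustration only, plain normalized cross-correlation (NCC) between two patches can be sketched as follows. This is the textbook NCC and is offered merely as background to the term; the summation scheme that distinguishes SNCC is defined in the cited document and is not reproduced here. The function name and the handling of textureless patches (returning 0.0) are assumptions made for the example.

```python
import math


def ncc(patch_a, patch_b):
    """Normalized cross-correlation of two equally sized patches, in [-1, 1]."""
    a = [v for row in patch_a for v in row]
    b = [v for row in patch_b for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]    # remove the mean (brightness offset)
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0                  # assumed convention for textureless patches
    return sum(x * y for x, y in zip(da, db)) / denom


patch = [[1, 2], [3, 4]]
brighter = [[2 * v + 10 for v in row] for row in patch]   # gain and offset change
inverted = [[-v for v in row] for row in patch]

same_structure = ncc(patch, brighter)    # close to +1
opposite = ncc(patch, inverted)          # close to -1
```

Because the mean is subtracted and the result is divided by the patch energies, the measure is insensitive to a change of gain and offset between the two images, which is one reason normalized measures are favoured for correlation searches.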
The paper “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms” by Scharstein and Szeliski (2002, International Journal of Computer Vision, 47(1-3):7-42) gives an overview of the most common stereo computation methods used in the art.
In “Non-parametric Local Transforms for Computing Visual Correspondence” (1994, Proceedings of the Third European Conference on Computer Vision, Vol. II) Zabih and Woodfill introduce the rank and census transforms for images in order to improve patch correlation. It is proposed to match rank transformed images with the summed absolute or squared difference and census transformed images with the Hamming distance.
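By way of illustration only, the two transforms and the Hamming distance can be sketched for a single pixel as follows. The function names, the 3x3 window, the bit ordering and the toy patches are assumptions made for the example; no border handling is included, so the window must lie fully inside the image.

```python
def rank_transform(image, row, col, radius=1):
    """Number of pixels in the window that are darker than the centre pixel."""
    centre = image[row][col]
    return sum(
        1
        for r in range(row - radius, row + radius + 1)
        for c in range(col - radius, col + radius + 1)
        if (r, c) != (row, col) and image[r][c] < centre
    )


def census_transform(image, row, col, radius=1):
    """Bit string (as an int) marking which neighbours are darker than the centre."""
    centre = image[row][col]
    bits = 0
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            if (r, c) == (row, col):
                continue            # the centre pixel itself is not encoded
            bits = (bits << 1) | (1 if image[r][c] < centre else 0)
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two census codes differ."""
    return bin(a ^ b).count("1")


patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
brighter = [[v + 100 for v in row] for row in patch]    # same structure, offset
changed = [[9, 2, 3], [4, 5, 6], [7, 8, 1]]             # two corners swapped

code = census_transform(patch, 1, 1)
same = hamming_distance(code, census_transform(brighter, 1, 1))
different = hamming_distance(code, census_transform(changed, 1, 1))
```

Both transforms depend only on the ordering of intensities relative to the centre pixel, so a uniform brightness offset leaves the codes unchanged, whereas a structural change flips individual bits.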
Finally, in “Real-Time Correlation-Based Stereo Vision with Reduced Border Errors” (2002, International Journal of Computer Vision) Hirschmüller, Innocent and Garibaldi describe a multi-window block-matching stereo approach where a larger correlation window is partitioned into equal-shaped sub-windows. For each disparity, the correlation values of the sub-windows are sorted and only the n best sub-windows are used for calculating the overall window correlation value in order to reduce border effects. The correlation is computed by the summed absolute difference within each sub-window, and the overall window cost is computed by summing up the correlation values of the n best sub-windows.
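By way of illustration only, the selection of the n best sub-windows as summarized above can be sketched for one disparity as follows. The sketch follows the summary given in this text rather than the cited paper in detail; the function names, the non-overlapping 2x2 partition and the toy windows are assumptions made for the example. Since the summed absolute difference is a cost, the "best" sub-windows are taken to be those with the lowest values.

```python
def sad(block_a, block_b):
    """Summed absolute difference between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def split_into_subwindows(window, sub_h, sub_w):
    """Partition a window (list of rows) into equal, non-overlapping sub-windows."""
    subs = []
    for top in range(0, len(window), sub_h):
        for left in range(0, len(window[0]), sub_w):
            subs.append([row[left:left + sub_w] for row in window[top:top + sub_h]])
    return subs


def multi_window_cost(window_a, window_b, sub_h, sub_w, n_best):
    """Sum the SAD costs of the n_best (lowest-cost) sub-window pairs."""
    costs = sorted(
        sad(a, b)
        for a, b in zip(
            split_into_subwindows(window_a, sub_h, sub_w),
            split_into_subwindows(window_b, sub_h, sub_w),
        )
    )
    return sum(costs[:n_best])


# Toy 4x4 windows: the right half of window_b belongs to a different surface.
window_a = [[10, 10, 10, 10] for _ in range(4)]
window_b = [[10, 10, 50, 50] for _ in range(4)]

all_four = multi_window_cost(window_a, window_b, 2, 2, n_best=4)
best_two = multi_window_cost(window_a, window_b, 2, 2, n_best=2)
```

In the toy example, the two sub-windows that straddle the mismatching half dominate the cost when all four are summed, while keeping only the two best sub-windows discards them, which is the border-effect reduction the approach aims at.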