Driver assistance systems are already a very valuable support for the driver and will be even more so in the coming years. Driver assistance systems operate with and in the vicinity of human beings, which leads to high safety requirements whenever a driver assistance system is able to make decisions and autonomously generate behavior (e.g., autonomous braking after the detection of an obstacle in the lane). The vehicle domain can be subdivided into dynamic objects (e.g., cars, bicycles) and static objects or static scene elements (e.g., parked cars, road, buildings).
For all static scene elements, the system has to cope with the inaccuracy of measurements (i.e., the sensor variances). A number of efficient, well-known approaches exist for compensating this inaccuracy (e.g., the Kalman filter [1], which makes approaches that rely on noisy input data more robust, such as model-based lane marking detection systems [2]).
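How a Kalman filter suppresses sensor variance for a static quantity can be illustrated with a minimal scalar sketch in Python. It is not taken from [1] or [2]; the function name, the constant-state model, the noise values and the example quantity (a lateral lane marking offset) are assumptions chosen for illustration only.

```python
def kalman_1d(measurements, meas_var, process_var=1e-4, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a (nearly) constant quantity.

    x is the state estimate, p its variance. All parameters are
    illustrative assumptions, not values from the cited documents.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is modeled as constant, uncertainty grows slightly.
        p = p + process_var
        # Update: weight prediction and measurement by their variances.
        k = p / (p + meas_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


# Hypothetical lateral offset of a lane marking (1.8 m), measured with an
# alternating error of +/- 0.3 m as a deterministic stand-in for sensor noise.
noisy = [1.8 + 0.3, 1.8 - 0.3] * 100
filtered = kalman_1d(noisy, meas_var=0.3 ** 2)
print("last raw measurement:", noisy[-1])
print("filtered estimate:   ", round(filtered[-1], 3))
```

The filtered estimate stays much closer to the underlying value than any single raw measurement, which is the robustness gain referred to above.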
For dynamic scene elements, the object-induced motion must be taken into account in addition to the handling of sensor variances. In the following, the motion of such dynamic objects will be called “object motion”, as opposed to the “vehicle ego motion” of the car that carries the ADAS (Advanced Driver Assistance System, see document [12] summarized above for comprehensive background information on this term and an exemplary ADAS for assisting the driver) and the sensory devices. Such dynamic objects are highly relevant for a driver assistance system, since unexpected motion of dynamic objects can result in dangerous situations that might injure humans. Hence, approaches that robustly gather information about dynamic scene elements are highly relevant for driver assistance systems.
Once the scene is subdivided into static and dynamic scene elements, the object motion can be modeled for all dynamic objects in order to incorporate it into the behavior generation and planning of the driver assistance system (e.g., by using dedicated motion models for estimating the trajectories of dynamic objects and including them in the collision mitigation module).
Vision-based approaches in the surveillance domain use differential images for detecting dynamic objects. Here, the image at time t−1 is subtracted from the image at time t. However, due to the ego motion of the camera, differential images cannot detect dynamic objects reliably, as shown in FIG. 4. The vehicle ego motion causes a change in nearly all image pixel positions, making a reliable separation between vehicle ego motion and object motion impossible.
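The effect can be reproduced with a small synthetic example. The Python sketch below is illustrative only: the 8×8 checkerboard scene, the bright 2×2 object and the cyclic one-pixel shift standing in for camera ego motion are all assumptions, and the function names are hypothetical.

```python
def make_frame(object_col, size=8):
    """Textured static background with a bright 2x2 object at the given column."""
    frame = [[((x + y) % 2) * 200 for x in range(size)] for y in range(size)]
    for y in (2, 3):
        for x in (object_col, object_col + 1):
            frame[y][x] = 255
    return frame


def shift_right(image, pixels):
    """Crude stand-in for camera ego motion: shift all columns cyclically."""
    return [row[-pixels:] + row[:-pixels] for row in image]


def frame_difference(prev, curr, threshold=10):
    """Binary mask that is 1 where the absolute intensity change exceeds the threshold."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def changed_fraction(mask):
    """Fraction of pixels flagged as changed."""
    return sum(map(sum, mask)) / sum(len(row) for row in mask)


previous = make_frame(object_col=1)   # image at time t-1
current = make_frame(object_col=4)    # image at time t, object has moved

static_camera = frame_difference(previous, current)
moving_camera = frame_difference(previous, shift_right(current, 1))

print("static camera:", changed_fraction(static_camera))
print("moving camera:", changed_fraction(moving_camera))
```

With a static camera only the old and new object positions are flagged. Once the whole image shifts, every pixel of this textured scene is flagged and the object can no longer be separated from the background.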
Other approaches [6] use Kalman filters to combine the optical flow with the disparity map of a stereo camera system, which provides the 3D position and 3D velocity of single points in the image. These single points are used to compute the ego motion of the camera vehicle over multiple frames. However, the motion of other objects is computed from the optical flow between a predicted, 2D-warped pixel image and the current image.
In document [13] a system for the detection of moving humans in an indoor environment is described. The system is carried by a mobile robot that fulfils a surveillance task. The system is based on a camera setup of 36 stereo cameras that allow 360 degree surveillance.
Typical systems for the detection of dynamic objects compute the optical flow (a pixel-wise correlation of two consecutive images that yields the motion magnitude and direction on the image plane) between a predicted image (a warping of the previous image that counteracts the ego-motion of the robot) and the currently captured image. The optical flow will be different from zero for image regions containing dynamic (i.e., ego-propelled) objects.
In contrast, the system described in document [13] relies on stereo data for the computation of a depth map of the scene. Using the depth map of the previous frame and dead reckoning, the ego-motion is compensated, leading to a predicted depth map. Computing the difference between the predicted and the measured depth map results in a differential depth map (in image coordinates) that shows unexpected peaks at regions containing dynamic objects. However, the question of how the resulting depth map is post-processed remains unanswered, because each moving object causes two regions of changed depth (the new position and the old position). In a comparatively static indoor scene, simple heuristics might be applicable to solve the problem of finding the current object position. Still, this point remains open.
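The two regions of changed depth can be made concrete with a one-dimensional sketch in Python. It is not the implementation of [13]: the ego-motion compensation is assumed to have been applied already, the differential map is assumed to store absolute differences, and all names and depth values are hypothetical.

```python
def depth_row(object_cols, width=10, background=20.0, object_depth=8.0):
    """One row of a depth map in metres: flat background plus one nearer object."""
    return [object_depth if x in object_cols else background for x in range(width)]


def differential_depth(predicted, measured, threshold=1.0):
    """Absolute per-pixel depth difference, set to zero below a noise threshold."""
    return [abs(m - p) if abs(m - p) > threshold else 0.0
            for p, m in zip(predicted, measured)]


def changed_regions(diff):
    """Group consecutive non-zero entries into (first_col, last_col) ranges."""
    regions, start = [], None
    for i, value in enumerate(diff):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(diff) - 1))
    return regions


predicted = depth_row({2, 3})   # object where the previous frame saw it
measured = depth_row({6, 7})    # object at its current position
diff = differential_depth(predicted, measured)
print(changed_regions(diff))    # [(2, 3), (6, 7)]
```

A single moving object yields two peaks of equal magnitude, one at the vacated and one at the newly occupied position, so a further step is needed to decide which of the two is the current object position.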
The system of document [13] relates to the invention insofar as the important role of stereo information for the detection of dynamic objects is recognized. However, the approach works on the depth map and therefore in image coordinates, like typical optical-flow-based image-warping approaches. In contrast to classical approaches, a correspondence problem arises, since every moving object influences the differential depth map twofold (a peak at the old and at the new object position, with no information present in the differential depth map to derive which position is which). Furthermore, the domain of application is indoors on a mobile robot platform, with the surveillance of humans as the central application. With such a specific task and a rather structured environment, the detection task is eased considerably, allowing the detection system to be tuned to its environment (the search is restricted to objects at the height of humans, typical object-size-related constraints are exploited, and the camera system is designed to detect close objects only).
A somewhat related system for the detection of dynamic objects, mounted on a mobile robot, is presented in [14]. The presented approach is based on a dense optical flow field and a dense stereo disparity computed from the images of a pair of calibrated stereo cameras. Unlike the system laid out in [13], the system computes an expected disparity map (the raw data for computing depth information), taking into account the ego-motion of the vehicle, and compares it to the measured disparity map by computing a kind of “disparity flow”. Modulo noise, a region containing a residual disparity flow marks a dynamic object. In summary, the approach computes the so-called 3D ego-flow (as stated explicitly by the authors, this should not be confused with 3D coordinates in the X-Y-Z sense, see Section 2 of [14]), which is the 3D field of changes in the u- and v-image coordinates as well as the change in disparity.
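The comparison of an expected with a measured disparity can be illustrated with the standard stereo relation d = f·B/Z (disparity d, focal length f, baseline B, depth Z). The Python sketch below is a strongly simplified assumption and not the method of [14]: it considers only pure forward ego motion along the optical axis and only the disparity component, and the focal length, baseline and function names are hypothetical.

```python
FOCAL_PX = 800.0    # hypothetical focal length in pixels
BASELINE_M = 0.3    # hypothetical stereo baseline in metres


def expected_disparity(prev_disparity, ego_forward_m,
                       focal_px=FOCAL_PX, baseline_m=BASELINE_M):
    """Disparity a static point should show after the camera moved forward."""
    depth = focal_px * baseline_m / prev_disparity           # Z = f * B / d
    return focal_px * baseline_m / (depth - ego_forward_m)   # d' = f * B / (Z - dz)


def disparity_residual(measured, prev_disparity, ego_forward_m):
    """Measured minus expected disparity; non-zero values hint at object motion."""
    return measured - expected_disparity(prev_disparity, ego_forward_m)


# A point 20 m ahead has a disparity of 12 px with the values above.
prev = FOCAL_PX * BASELINE_M / 20.0
ego_step = 1.0  # camera moves 1 m forward between the two frames

static_point = FOCAL_PX * BASELINE_M / 19.0     # static: now 19 m away
leading_vehicle = FOCAL_PX * BASELINE_M / 20.0  # drives ahead at the same speed

print("static residual: ", disparity_residual(static_point, prev, ego_step))
print("dynamic residual:", disparity_residual(leading_vehicle, prev, ego_step))
```

The static point produces a residual of approximately zero, while the vehicle that keeps its distance produces a clearly non-zero residual, which corresponds to the residual disparity flow that marks a dynamic object.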
Another approach using the optical flow to detect dynamic objects is described in [7]. Here, the current image at time t is back-projected pixel by pixel to the image at time t−1, taking the known ego movement into account and assuming that the overall scene is static. Afterwards, the optical flow between the image at time t−1 and the image from time t back-projected to t−1 is used to detect dynamic objects. The resource demands of this method are even higher than those of the approach described before, because the transformation of each image pixel has to be done beforehand. Also, the optical-flow-based detection of dynamic objects is suited for lateral movement only, which leads to poor results when dynamic objects move longitudinally in the depth direction.
A method that uses only disparity information is described in [8]. The algorithm integrates previous disparity frames with the current one based on a pixel-wise Kalman filtering method. Additionally, the change of disparity (i.e., the position change in the depth direction) is included in the process model of the Kalman filter. However, lateral and vertical movements cannot be modeled. The approach is targeted at improving the depth information, while trying to solve the problem that previous approaches generate incorrect depth estimates for moving objects. In summary, the approach aims at obtaining a dense depth map with reduced errors by applying temporal integration. As a byproduct, dynamic objects can be detected, but only if no lateral object motion takes place on the image plane.
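A pixel-wise filter whose process model includes the change of disparity can be sketched as a two-state Kalman filter per pixel. The Python code below is an illustrative assumption and not the algorithm of [8]: the constant-rate model, the noise parameters, the initial covariance and the function name are hypothetical.

```python
def track_disparity(measurements, dt=1.0, q=0.01, r=1.0):
    """Kalman filter for one pixel with state (disparity d, disparity rate v).

    Only d is measured. p00, p01, p11 are the entries of the symmetric
    2x2 state covariance. All parameter values are illustrative.
    """
    d, v = measurements[0], 0.0
    p00, p01, p11 = r, 0.0, 100.0
    for z in measurements[1:]:
        # Predict with a constant-rate model: d <- d + dt * v, v unchanged.
        d = d + dt * v
        p00 = p00 + 2.0 * dt * p01 + dt * dt * p11 + q
        p01 = p01 + dt * p11
        p11 = p11 + q
        # Update with the measured disparity at this pixel.
        s = p00 + r
        k0, k1 = p00 / s, p01 / s
        y = z - d
        d, v = d + k0 * y, v + k1 * y
        p00, p01, p11 = (1.0 - k0) * p00, (1.0 - k0) * p01, p11 - k1 * p01
    return d, v


# Disparity at one pixel grows by 0.5 px per frame (object approaching in depth).
ramp = [10.0 + 0.5 * k for k in range(40)]
disparity, rate = track_disparity(ramp)
print("disparity:", round(disparity, 2), "rate:", round(rate, 2))
```

Because the state is tied to a fixed pixel, such a filter can follow a change in depth at that pixel but has no means of representing an object that moves to a neighboring pixel, which matches the limitation regarding lateral and vertical motion.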
Most of these methods rely on the optical flow for object motion detection and hence search for pixel-wise changes on the image plane. It is important to note that the optical flow is resource demanding as well as error prone, especially at the borders of the image. The central problem and flaw of the warping approach with optical flow, however, is that only object motion lateral to the movement of the ego camera vehicle can be detected (e.g., a bicycle crossing the road in front). Motion that is oriented longitudinally to the vehicle course cannot be detected, since there is no measurable lateral motion on the image plane and hence no optical flow present (e.g., a vehicle driving on the road in front brakes hard and gets nearer).
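The difference between lateral and longitudinal motion on the image plane follows from the pinhole projection u = f·X/Z. The following Python sketch is a simplified illustration under stated assumptions: a hypothetical focal length of 800 pixels, a single tracked point per object, and a stationary camera so that only object motion contributes.

```python
def project(x_m, z_m, focal_px=800.0):
    """Horizontal image coordinate (pixels) of a point at lateral offset x, depth z."""
    return focal_px * x_m / z_m


def image_shift(before, after, focal_px=800.0):
    """Horizontal pixel shift of a point moving from `before` to `after` (x, z in metres)."""
    return project(*after, focal_px) - project(*before, focal_px)


# Bicycle crossing 20 m ahead: moves 0.5 m sideways between two frames.
crossing = image_shift(before=(-1.0, 20.0), after=(-0.5, 20.0))

# Vehicle straight ahead braking hard: distance shrinks from 20 m to 18 m.
braking = image_shift(before=(0.0, 20.0), after=(0.0, 18.0))

print("crossing bicycle shift:", crossing, "px")
print("braking vehicle shift: ", braking, "px")
```

The crossing bicycle shifts by 20 pixels, while the point on the braking vehicle straight ahead does not shift at all, although the vehicle has come 2 m closer. Points away from the optical axis would still shift slightly as the image of the object grows, which this single-point sketch does not model.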