Driver assistance systems operate with and in the vicinity of human beings. This leads to high safety requirements whenever a driver assistance system is able to make decisions and autonomously generate behavior (e.g., autonomous braking after the detection of an obstacle on the lane). The vehicle domain can be subdivided into dynamic objects (e.g., cars, bicycles) and static objects or static scene elements (e.g., parked cars, road, buildings).
For all static scene elements, the system has to cope with the inaccuracy of measurements (i.e., the sensor variances). A number of efficient, well-known approaches exist to compensate for these (e.g., the Kalman filter [1], which makes approaches that rely on noisy input data more robust, such as model-based lane marking detection systems [2]). For dynamic scene elements, the object-induced motion must be taken into account in addition to the sensor variances. In the following, the motion of such dynamic objects will be called “object motion”, as opposed to the “vehicle ego-motion” of the car that carries the ADAS (see glossary) and the sensory devices. Dynamic objects are highly relevant for a driver assistance system, since their unexpected motion can result in dangerous situations that might injure humans. Hence, approaches that robustly gather information about dynamic scene elements are highly relevant for driver assistance systems.
Once the scene is subdivided into static and dynamic scene elements, the object motion of all dynamic objects can be modeled in order to incorporate it into the behavior generation and planning of the driver assistance system (e.g., by using dedicated motion models to estimate the trajectories of dynamic objects and feeding them into a collision mitigation module). In the following, the existing approaches for detecting object motion are grouped into three classes:
1. Simple basic approaches,
2. Approaches based on optical flow and
3. The 3D Warping approach.
Simple Basic Approaches
Vision-based approaches in the surveillance domain typically use differential images for detecting dynamic objects. Here, the image at time t−1 is subtracted from the image at time t. However, in case of strong ego-motion of the camera (as typically present in the car domain), differential images cannot reliably detect dynamic objects, as shown in FIG. 4. The vehicle ego-motion changes the position of nearly all image pixels, making a reliable separation between vehicle ego-motion and object motion impossible.
A method that uses disparity as its exclusive information source is described in [8]. The algorithm integrates two consecutive disparity frames based on a pixel-wise Kalman filtering method. Additionally, the change of disparity (i.e., the position change in depth direction) is included in the process model of the Kalman filter. However, no lateral or vertical movements can be modeled. The approach targets improving the depth information, addressing the problem that disparity-based approaches generate incorrect depth estimates for moving objects. In summary, the approach aims at a dense depth map with errors reduced by temporal integration. As a byproduct, dynamic objects can be detected, but only if no lateral object motion takes place on the image plane.
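The effect described above can be reproduced with a minimal sketch on synthetic data. It assumes a random textured background, a hypothetical threshold of 20 grey levels, and approximates ego-motion by a global 2-pixel image shift (np.roll); real ego-motion produces a depth-dependent displacement, so this is only an illustration of why differencing fails, not a model of FIG. 4.

```python
import numpy as np

def differential_image(prev, curr, threshold=20.0):
    """Boolean change mask |I(t) - I(t-1)| > threshold (classic surveillance approach)."""
    return np.abs(curr.astype(float) - prev.astype(float)) > threshold

rng = np.random.default_rng(0)
scene = rng.uniform(0.0, 255.0, size=(40, 60))  # synthetic textured background

# Case 1: static camera, one small bright object moves 3 pixels to the right.
prev_static = scene.copy()
curr_static = scene.copy()
prev_static[10:16, 10:16] = 255.0
curr_static[10:16, 13:19] = 255.0
static_fraction = differential_image(prev_static, curr_static).mean()

# Case 2: moving camera, the (static) scene shifts by 2 pixels.
# np.roll is a crude stand-in for ego-motion (assumption for illustration only).
curr_ego = np.roll(scene, shift=2, axis=1)
ego_fraction = differential_image(scene, curr_ego).mean()

print(f"changed pixels, static camera: {static_fraction:.3f}")
print(f"changed pixels, ego-motion:    {ego_fraction:.3f}")
```

With a static camera only the object's leading and trailing edges change; with ego-motion most pixels exceed the threshold, even though nothing in the scene moves.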
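A pixel-wise Kalman filter with the disparity change in its process model can be sketched as follows. This is not the method of [8]; it is a minimal constant-velocity filter per pixel, assuming the state (disparity d, change per frame d_dot), measurement of d only, and hypothetical noise parameters q_d, q_v and r. Because each pixel is filtered in place, lateral motion on the image plane is not modeled, matching the limitation noted above.

```python
import numpy as np


class PixelwiseDisparityKF:
    """One independent constant-velocity Kalman filter per pixel.

    State per pixel: disparity d [px] and its change per frame d_dot [px/frame].
    Only d is measured; lateral and vertical motion are not modelled.
    """

    def __init__(self, d0, dt=1.0, q_d=1e-3, q_v=1e-3, r=0.25):
        self.d = np.asarray(d0, dtype=float).copy()
        self.v = np.zeros_like(self.d)
        # Entries of the symmetric 2x2 covariance P, stored per pixel.
        self.p00 = np.full_like(self.d, r)
        self.p01 = np.zeros_like(self.d)
        self.p11 = np.ones_like(self.d)
        self.dt, self.q_d, self.q_v, self.r = dt, q_d, q_v, r

    def step(self, z):
        dt = self.dt
        # Predict: F = [[1, dt], [0, 1]], Q = diag(q_d, q_v).
        self.d = self.d + dt * self.v
        p00 = self.p00 + 2 * dt * self.p01 + dt * dt * self.p11 + self.q_d
        p01 = self.p01 + dt * self.p11
        p11 = self.p11 + self.q_v
        # Update with H = [1, 0] and measurement variance r.
        s = p00 + self.r
        k0, k1 = p00 / s, p01 / s
        innovation = z - self.d
        self.d = self.d + k0 * innovation
        self.v = self.v + k1 * innovation
        self.p00, self.p01, self.p11 = (1 - k0) * p00, (1 - k0) * p01, p11 - k1 * p01
        return self.d, self.v


# Synthetic example (hypothetical values): static background at disparity 10 px,
# one patch approaching, so its disparity grows by 0.5 px per frame.
rng = np.random.default_rng(1)
true_d = np.full((20, 20), 10.0)
obj = np.zeros(true_d.shape, dtype=bool)
obj[5:10, 5:10] = True
kf = PixelwiseDisparityKF(true_d + rng.normal(0.0, 0.5, true_d.shape))
for _ in range(40):
    true_d[obj] += 0.5
    kf.step(true_d + rng.normal(0.0, 0.5, true_d.shape))
print("mean d_dot on object:    ", kf.v[obj].mean())
print("mean d_dot on background:", kf.v[~obj].mean())
```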
Optical Flow (Proper Object Motion Detection)
Other approaches (see, e.g., [6]) combine the optical flow (a pixel-wise correlation of two consecutive images that yields the motion magnitude and direction on the image plane) with the disparity map of a stereo camera system, based on Kalman filters that provide the 3D position and 3D velocity of discrete points in the image. These discrete points are used to compute the ego-motion of the camera vehicle over multiple frames. The motion of other objects, however, is calculated from the optical flow between a predicted 2D warped pixel image and the current image.
In contribution [13] by Shimizu, a system for the detection of moving humans in an indoor environment is described. The system is carried by a mobile robot that fulfils a surveillance task and is based on a setup of 36 stereo cameras that allows 360-degree surveillance.
Typical systems for the detection of dynamic objects compute the optical flow between a predicted image (obtained by warping the previous image so as to counteract the ego-motion of the robot) and the currently captured image. The optical flow is different from zero for image regions containing dynamic objects.
In contrast, the system described in [13] relies on stereo data for computing a depth map of the scene (the depth map is organized in image coordinates, see the Z-map in the bottom right corner of FIG. 1 of [13]). Using the depth map of the previous frame and dead reckoning, the ego-motion is compensated, leading to a predicted depth map. The difference between the predicted and the measured depth map results in a differential depth map (in image coordinates) that shows unexpected peaks at regions containing dynamic objects. It remains unanswered how the resulting differential depth map is post-processed, because each moving object causes two regions of changed depth (the new position and the old position).
Document [13] relates to the invention insofar as it recognizes the role of stereo information for the detection of dynamic objects. However, the approach works on the depth map (see the Z-map in the bottom right corner of FIG. 1 of [13]) and therefore in image coordinates, as known from typical optical-flow-based image Warping approaches. A correspondence problem arises, since every moving object influences the differential depth map twice (a peak at the old and at the new object position, with no information in the differential depth map to tell which position is which). Furthermore, the domain of application is indoors on a mobile robot platform, with the surveillance of humans as the central application. With such a specific task and a rather structured environment, the detection task is eased considerably, allowing the detection system to be tuned to its environment (it searches for objects at the height of humans, exploits typical object-size-related constraints, and uses a camera system designed to detect close objects exclusively).
A related system for the detection of dynamic objects is presented in [14], which again describes a system mounted on a mobile robot. The approach is based on a dense optical flow field and dense stereo disparity computed from the images of a pair of calibrated stereo cameras. Different from the system described in [13], an expected disparity map (disparity being the raw data for computing depth information) is computed taking into account the ego-motion of the vehicle, and it is compared to the measured disparity map by computing a kind of “disparity flow”. Modulo noise, regions containing a residual disparity flow mark dynamic objects. In summary, the approach computes the so-called 3D egoflow (as the authors state explicitly, this should not be confused with 3D coordinates in the X-Y-Z sense, see Section 2 of [14]). More specifically, the 3D egoflow is the 3D field of changes in the u and v image coordinates as well as the change in disparity.
In some aspects similar to [13], [15] describes an optical-flow-based system for human gesture recognition that runs on a static platform in an indoor environment using a monocular camera (i.e., no stereo data can be gathered). The presented real-time system computes the optical flow using a correlation-based algorithm (in this respect similar to [7]). However, instead of operating on RGB color images, the system described in [15] computes the optical flow in a specific color space (the YUV color space), thereby aiming at a high degree of illumination robustness. The gathered optical flow is coarsely clustered using a kind of region growing on the optical flow field. The resulting flow clusters are compared to a small number of predefined flow models representing the detectable motion-related gestures. The recognition problem is simplified by the fact that the system platform is static and located in a rather well-structured indoor environment.
Another human gesture classifier is described in [16]. Although the system itself does not rely on optical flow as an information source, the contribution can still serve as a reference document. According to the authors, the raw (not post-processed) optical flow is unreliable and highly influenced by noise and is hence not applicable for the robust detection of moving humans in an indoor environment.
Another approach that uses the optical flow to estimate dynamic objects is the so-called Proper Object Motion (POM) described in [7] (see also WO patent 2009/024349A [12]). Here, the current image at time t is back-projected pixel-wise to the image at time t−1, taking the known ego-motion into account and assuming that the overall scene is static. Afterwards, the optical flow is computed between the image captured at t−1 and the back-projected image t−1. The optical flow marks the position of dynamic objects present in the scene. This and comparable methods rely on the optical flow for object motion detection and hence search for pixel-wise changes on the image plane. It is important to note that the optical flow is resource-demanding as well as error-prone, especially at the borders of the image. The central flaw of the Warping approach with optical flow, however, is that only object motion lateral to the movement of the ego camera vehicle can be detected (e.g., a bicycle crossing the road in front). Motion oriented longitudinally to the vehicle's course cannot be detected, since there is no measurable lateral motion on the image plane and hence no optical flow (e.g., a vehicle driving on the road in front brakes hard and gets nearer). As shown later, these drawbacks are resolved by the combination approach described here.
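The double-peak effect of a differential depth map can be illustrated on a single image row. This is a hypothetical 1D sketch, not the processing of [13]: it assumes 1 m of forward ego-motion obtained by dead reckoning, a crude prediction that subtracts that distance from every depth value (roughly valid only near the optical axis), and an object at 10 m in front of a background at 20 m that moves sideways by 6 columns.

```python
import numpy as np

def count_regions(mask):
    """Number of separate runs of True values in a 1D boolean mask."""
    padded = np.concatenate(([False], mask, [False]))
    return int(np.count_nonzero(~padded[:-1] & padded[1:]))

ego_forward = 1.0   # metres driven between frames (hypothetical dead-reckoning value)
cols = 60

prev = np.full(cols, 20.0)
prev[20:30] = 10.0                       # object occupies columns 20-29 at t-1

meas = np.full(cols, 20.0 - ego_forward)
meas[26:36] = 10.0 - ego_forward         # object moved to columns 26-35 at t

pred = prev - ego_forward                # crude ego-motion compensation (assumption)
changed = np.abs(meas - pred) > 0.5      # differential depth map, thresholded

print(np.flatnonzero(changed))
print("regions:", count_regions(changed))
```

One moving object yields two separate changed regions (columns 20-25, where the object was, and 30-35, where it now is), which is exactly the correspondence that a post-processing step would have to resolve.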
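Why longitudinal motion hardly shows up in the optical flow follows directly from the pinhole projection. The sketch below assumes hypothetical intrinsics (f = 800 px, principal point 320/240), a camera frame with X right, Y down, Z forward, and that ego-motion has already been compensated (as in POM's back-projection). It compares the image displacement of an object 20 m ahead that moves 1 m sideways with one that comes 1 m closer.

```python
import numpy as np

def project(point, f=800.0, cu=320.0, cv=240.0):
    """Pinhole projection of a camera-frame point (X right, Y down, Z forward) [m] to pixels."""
    x, y, z = point
    return np.array([f * x / z + cu, f * y / z + cv])

start = np.array([0.0, 0.5, 20.0])                     # object 20 m ahead, slightly below the camera
lateral_end = start + np.array([1.0, 0.0, 0.0])        # crosses 1 m to the side
longitudinal_end = start + np.array([0.0, 0.0, -1.0])  # comes 1 m closer (e.g., hard braking)

lateral_px = np.linalg.norm(project(lateral_end) - project(start))
longitudinal_px = np.linalg.norm(project(longitudinal_end) - project(start))
print(f"lateral:      {lateral_px:.2f} px")       # 40.00 px
print(f"longitudinal: {longitudinal_px:.2f} px")  # 1.05 px
```

The same 1 m of motion produces about 40 px of flow laterally but only about 1 px longitudinally, and a point exactly on the optical axis would not move at all.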
All previously mentioned optical-flow-based system approaches described in documents [7, 12-15] have in common that they are applied indoors. According to documents [7, 16], optical flow has a restricted quality that requires at least some form of post-processing ([7]) or is more or less inappropriate ([16]).
It is important to note that the quality of the optical flow field strongly depends on the characteristics of the application area. As opposed to typical outdoor traffic scenarios, indoor environments (especially the typical office environments such prototype systems are tested in) provide:
1. a highly stable illumination situation (artificial light, stable weather conditions),
2. a typically straightforward separation of foreground and background (monochrome walls, typical angles between walls, simple 3D relative orientation of surfaces at 90°),
3. low complexity (number and type of present objects is restricted),
4. typically, humans as the only dynamic objects present; additionally, their motion parameters (velocity, moved object parts, motion direction) are highly stable, and once a moving object is found it is inherently classified as human (no additional classification step is needed after the object detection),
5. restricted ego-motion of the camera-carrying robot (or even static systems).
All these issues ease the detection and classification tasks considerably, since they allow the application of simple heuristics and incorporation of environment-related pre-knowledge. Still, following document [16], the optical flow is hardly applicable for the recognition of motion gestures.
Additionally, the practical application of optical flow is troublesome:
According to document [15], strong environment-related assumptions were made when designing and testing the robot system (“background is relatively static”, “illumination changes slowly”, “only dominant motion” is of interest, the flow is computed “for pixels that satisfy a motion detection criterion” (differential images computed on a low-pass filtered image pair), a “computationally expensive algorithm” was realized, and the system is static, i.e., not ego-propelled). Still, “the magnitude [of the flow field] can vary considerably”, and only six detectable motion patterns are supported.
Following document [7], a sophisticated post-processing step c is needed to reduce the influence of outliers. The Mahalanobis norm weights the detected motion magnitude with a confidence measure coming from the correlation-based optical flow computation.
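A Mahalanobis weighting of this kind can be sketched as follows. The exact formulation in [7] is not given here, so this assumes that the confidence of each correlation-based flow vector is expressed as a 2x2 covariance matrix; the covariance values are hypothetical. The same flow magnitude then counts strongly when the match is confident and weakly when it is uncertain.

```python
import numpy as np

def mahalanobis_magnitude(flow, cov):
    """sqrt(v^T C^-1 v): flow magnitude weighted by its (assumed) 2x2 covariance C."""
    flow = np.asarray(flow, dtype=float)
    return float(np.sqrt(flow @ np.linalg.solve(cov, flow)))

flow = np.array([3.0, 0.0])          # 3 px of horizontal flow
confident = np.diag([0.25, 0.25])    # sharp correlation peak: small variance
uncertain = np.diag([9.0, 9.0])      # flat correlation surface: large variance

print(mahalanobis_magnitude(flow, confident))  # 6.0
print(mahalanobis_magnitude(flow, uncertain))  # 1.0
```

Thresholding this weighted magnitude instead of the raw flow length suppresses large but poorly supported vectors, which is the outlier-reduction purpose described above.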
Clearly, the five aforementioned conditions of an indoor application are not fulfilled in an outdoor environment, e.g., the (traffic) domain. As a result, the design of robust systems and algorithms is much more challenging. Since an indoor application of optical flow is already challenging, its direct application in outdoor traffic scenarios is not feasible. Consequently, no publications exist that describe resilient, purely optical-flow-based applications for the traffic domain. Following the general attitude in the ADAS community, optical flow as a stand-alone solution is too unreliable and noisy an information source and is therefore somewhat neglected in the traffic domain. In contrast, numerous highly robust approaches exist for the computation of 3D data that can also be used for the detection of dynamic objects.
3D Warping
Another, complementary method for detecting dynamic objects is the 3D Warping approach, which is described in detail in patent application [11]. To provide the technical background for the current invention, its main traits are summarized in the following (please refer to FIG. 3 for an overview of the 3D Warping system). In the following description, the algorithm computes the magnitude of object motion based on sensors delivering 3D world coordinates (e.g., disparity information coming from stereo cameras [3] (see FIG. 1 for the resulting 3D world maps), a Photonic Mixer Device [4], or a dense laser scanner (e.g., the high-definition Lidar sensor Velodyne [5])).
The detection of dynamic objects is based on the comparison of predicted (i.e., 3D warped) and measured 3D data of the scene. More specifically, in the 3D Warping procedure the 3D world coordinates of the scene (containing static and dynamic objects) at one time step are transformed in a way that includes the motion of the ego vehicle in 3D coordinates. The 3D motion of the ego vehicle can be deduced from the longitudinal velocity and yaw rate of the vehicle, both accessible on the Controller Area Network (CAN) bus, using a single track model (see FIG. 2). In the following, the procedures for the forward 3D Warping (and backward 3D Warping, set in brackets [ ]) (see also glossary) are described. Both approaches have different advantages and drawbacks.
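Deriving the planar ego-motion from the CAN values can be sketched with a kinematic single-track model. This is a minimal version under stated assumptions: constant speed and yaw rate over the interval dt, no slip, and a vehicle frame with x forward, y left, and counter-clockwise yaw positive. The single track model of FIG. 2 may include more detail; the function name is hypothetical.

```python
import math

def single_track_ego_motion(v, yaw_rate, dt, eps=1e-9):
    """Planar ego-motion over dt from speed v [m/s] and yaw rate [rad/s].

    Kinematic single-track model, constant v and yaw rate, slip neglected.
    Frame at time t: x forward, y left, yaw counter-clockwise positive.
    Returns (dx, dy, dpsi): the vehicle pose at t + dt expressed in the frame at t.
    """
    dpsi = yaw_rate * dt
    if abs(yaw_rate) < eps:          # straight driving: avoid dividing by ~0
        return v * dt, 0.0, dpsi
    radius = v / yaw_rate            # signed turning radius (positive = left turn)
    return radius * math.sin(dpsi), radius * (1.0 - math.cos(dpsi)), dpsi

print(single_track_ego_motion(10.0, 0.0, 0.1))   # straight: 1 m forward
print(single_track_ego_motion(10.0, 0.2, 0.1))   # gentle left curve
```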
To be more precise, the 3D world coordinates at one time step are predicted into the future [backwards in time] under the assumption that all objects in the scene are static. The prediction is based on the longitudinal and lateral motion and the yaw rate of the ego vehicle, as delivered by a single track model (refer to FIG. 2). The predicted a priori 3D world position is then compared to the measured a posteriori 3D world position in the next [previous] time step. The 3D residuum of the comparison between the 3D warped and the real 3D world positions marks all dynamic scene elements. The residuum is given in metric world coordinates (i.e., as a position change induced by 3D object motion). To obtain the pixel position (u, v) of a detected dynamic object for a given 3D world position (X, Y, Z), a pinhole camera model can be used.
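Forward 3D Warping with a point-wise residuum can be sketched as follows. Assumptions: a vehicle frame with x forward, y left, z up; a camera at the vehicle origin looking along x with hypothetical intrinsics; the ego-motion (dx, dy, dpsi) given; and a nearest-neighbour distance as the 3D metric (the text allows any distance metric). The scene is hypothetical: static ground points plus a small object ahead that approaches by 2 m, i.e., purely longitudinal object motion.

```python
import numpy as np

def warp_static_points(points, dx, dy, dpsi):
    """Predict where static points (N x 3, frame at t) appear in the frame at t+1."""
    c, s = np.cos(dpsi), np.sin(dpsi)
    shifted = points - np.array([dx, dy, 0.0])
    # Rotate by -dpsi about the vertical axis (the new frame is yawed by +dpsi).
    rot = np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])
    return shifted @ rot.T

def residuum(predicted, measured):
    """Nearest-neighbour 3D distance from each predicted point to the measured cloud."""
    d = np.linalg.norm(predicted[:, None, :] - measured[None, :, :], axis=2)
    return d.min(axis=1)

def to_pixels(points, f=800.0, cu=320.0, cv=240.0):
    """Pinhole projection; camera assumed at the vehicle origin, looking along x."""
    xc, yc, zc = -points[:, 1], -points[:, 2], points[:, 0]
    return np.stack([f * xc / zc + cu, f * yc / zc + cv], axis=1)

static_t = np.array([[x, y, 0.0] for x in (20.0, 25.0, 30.0) for y in (-5.0, 0.0, 5.0)])
object_t = np.array([[15.0, y, 1.0] for y in (-0.5, 0.0, 0.5)])
dx, dy, dpsi = 1.0, 0.0, 0.0   # ego vehicle drove 1 m straight ahead

# Measurement at t+1: static points seen from the new pose; the object also came 2 m closer.
measured_t1 = np.vstack([
    warp_static_points(static_t, dx, dy, dpsi),
    warp_static_points(object_t, dx, dy, dpsi) - np.array([2.0, 0.0, 0.0]),
])

# Prediction: warp everything as if the whole scene were static.
predicted_t1 = warp_static_points(np.vstack([static_t, object_t]), dx, dy, dpsi)
res = residuum(predicted_t1, measured_t1)
dynamic = res > 0.5            # hypothetical threshold in metres
print(dynamic)                 # only the three object points are flagged
print(to_pixels(predicted_t1[dynamic]))
```

The residuum of the object points is 2 m, i.e., the longitudinal object motion in metric units, which the 2D image-based comparison would barely register.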
In the following, the 3D Warping approach is described in detail, distinguishing 4 processing steps as visualized in FIG. 3.
a) Computing the Measured Cue
The approach described here uses dense 3D data as input. In this context, “dense” means that 3D data exist for the whole scene. To this end, any dense depth sensor can be used, for example a stereo camera system, a Photonic Mixer Device [4] or a dense laser scanner [5]. Based on these sensors, the X, Y, and Z-maps (the Z-map being the depth map) can be computed. In the following, the information of the X, Y, and Z-maps is transformed into a 3D scene representation (see glossary).
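For a stereo camera, the X, Y, and Z-maps follow from the disparity map by standard triangulation. A minimal sketch, assuming a rectified stereo pair with focal length f in pixels, baseline b in metres and principal point (cu, cv), with Z = f·b/d, X = (u − cu)·Z/f and Y = (v − cv)·Z/f; pixels without a valid disparity are set to NaN. The parameter values in the example are hypothetical.

```python
import numpy as np

def disparity_to_xyz(disparity, f, baseline, cu, cv):
    """Convert a dense disparity map [px] into X, Y, Z maps [m] (rectified stereo)."""
    h, w = disparity.shape
    v, u = np.mgrid[0:h, 0:w].astype(float)   # row (v) and column (u) coordinates
    with np.errstate(divide="ignore", invalid="ignore"):
        z = np.where(disparity > 0, f * baseline / disparity, np.nan)
    x = (u - cu) * z / f
    y = (v - cv) * z / f
    return x, y, z

disp = np.full((3, 4), 8.0)   # tiny synthetic disparity map
disp[0, 3] = 0.0              # no valid match at this pixel
x, y, z = disparity_to_xyz(disp, f=800.0, baseline=0.3, cu=2.0, cv=1.0)
print(z)                      # 30 m everywhere except the invalid pixel
```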
b) Computing the Predicted Cue
The computation can be done in different ways and combinations, depending on the amount of data to be processed. Three different computational methods were proposed in [11], namely iconic-based, voxel-based, and envelope-based computation.
The first computational method runs completely iconic (i.e., all 3D voxels are handled independently). More specifically, the known 3D positions of all points in the environment are adapted, taking the 3D vehicle ego-motion into account, and each point of the 3D representation (see glossary) is 3D warped independently to its adapted position. Then the predicted (3D warped) and the measured scene representations (see glossary) are compared to determine dynamic objects, which can be done with any distance metric in 3D space.
The second computational method builds up a 3D voxel cloud (i.e., a cloud of 3D segments) of the scene. Different from the first, iconic approach, a region-based post-processing and modeling within the voxel cloud takes place using scene models (see glossary). Thereby, information from neighboring voxels is propagated, and geometric 3D object models are introduced that correct outlying voxels. These measures improve the overall accuracy of the approach.
The third computational method reduces the problem complexity by restricting the processing to one (or a few) surface(s) in the environment. In the car domain, this could be the road surface. Only scene elements on this surface are considered. Based on this information, an envelope is built up, which is called the environmental envelope (see glossary), reducing the complexity of the problem and allowing efficient heuristics for post-processing.
Additionally, the computational methods can be combined, e.g., by 3D warping a number of 3D voxel clouds with the scene models and 3D warping the remaining voxels iconically. This also allows the amount of processed data to be controlled.
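The voxel-cloud idea can be illustrated with a very simple region-based step. This sketch is only a crude stand-in for the scene models mentioned above: it quantizes points into voxels of a hypothetical size and removes voxels that have no occupied neighbour in their 26-neighbourhood, i.e., it uses neighbouring voxels to correct outliers, but it does not fit geometric object models.

```python
import numpy as np
from itertools import product

def voxelize(points, voxel_size):
    """Return the set of occupied voxel indices (i, j, k) for N x 3 points [m]."""
    idx = np.floor(np.asarray(points, dtype=float) / voxel_size).astype(int)
    return {tuple(int(c) for c in row) for row in idx}

NEIGHBOUR_OFFSETS = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]

def remove_isolated_voxels(voxels, min_neighbours=1):
    """Keep voxels with at least min_neighbours occupied voxels in their 26-neighbourhood."""
    kept = set()
    for i, j, k in voxels:
        count = sum((i + di, j + dj, k + dk) in voxels for di, dj, dk in NEIGHBOUR_OFFSETS)
        if count >= min_neighbours:
            kept.add((i, j, k))
    return kept

cluster = list(product((0.1, 0.6), repeat=3))   # fills a 2 x 2 x 2 block of voxels
outlier = [(10.0, 10.0, 10.0)]                  # single spurious measurement
voxels = voxelize(np.array(cluster + outlier), voxel_size=0.5)
cleaned = remove_isolated_voxels(voxels)
print(len(voxels), len(cleaned))  # 9 8
```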
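One possible reading of the environmental envelope is a boundary on the road plane: for each viewing direction, the distance to the nearest scene element within a height band above the road. The sketch below is a hypothetical interpretation, not the definition from [11]: it uses X lateral, Y height, Z forward, keeps only points whose Y lies in an assumed band (so Y is treated as constant), bins them by azimuth, and compares a predicted (static-world) envelope with a measured one. All values are hypothetical.

```python
import numpy as np

def environmental_envelope(points, y_min, y_max, n_bins=36, fov=np.pi / 2, max_range=50.0):
    """Nearest distance per azimuth bin on the X-Z plane (X lateral, Y height, Z forward).

    Only points with y_min <= Y <= y_max are used, i.e. Y is treated as constant.
    Bins without any point keep max_range.
    """
    pts = np.asarray(points, dtype=float)
    band = pts[(pts[:, 1] >= y_min) & (pts[:, 1] <= y_max)]
    azimuth = np.arctan2(band[:, 0], band[:, 2])
    dist = np.hypot(band[:, 0], band[:, 2])
    inside = np.abs(azimuth) < fov / 2
    bins = ((azimuth[inside] + fov / 2) / fov * n_bins).astype(int)
    envelope = np.full(n_bins, max_range)
    np.minimum.at(envelope, bins, dist[inside])   # keep the nearest point per bin
    return envelope

predicted_pts = [[0.0, 0.5, 10.0], [0.0, 3.0, 5.0]]  # second point lies outside the height band
measured_pts = [[0.0, 0.5, 8.0], [0.0, 3.0, 5.0]]    # object came 2 m closer than predicted
predicted_env = environmental_envelope(predicted_pts, 0.0, 2.0)
measured_env = environmental_envelope(measured_pts, 0.0, 2.0)
residual = measured_env - predicted_env              # non-zero only where the envelope moved
print(np.flatnonzero(residual), residual[np.flatnonzero(residual)])
```

Working on a 1D envelope instead of a full 3D cloud is what makes this variant fast, at the cost of losing the height information.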
c) Computing Residuum
Computing the difference (residuum) between the measured cue and the predicted cue results in 3D residuum regions that contain non-zero values at positions where dynamic objects are present. Moreover, the relative motion of the dynamic object in 3D coordinates can be derived from the residuum regions. The residuum computation can be done with any distance metric in 3D space.
For the first and second computational methods, the residuum regions define image regions (by back-projecting with a pinhole camera model, i.e., a 3D-to-2D mapping) that hold dynamic objects, as well as the magnitude of the object motion in X, Y, and Z direction. For the third method (environmental envelope), the residual environmental envelope defines the motion of dynamic objects in X and Z direction only (the height Y is defined as constant over the whole environmental envelope). To determine the corresponding image position, all found dynamic objects are mapped from 3D (X, Y=const, Z) to the 2D image plane.
d) Post-Processing
To handle artifacts that the described procedure might produce, morphological operations are carried out on the residuum (see [10] for details on such morphological operations). This ensures that only larger residuum regions are interpreted as being dynamic. Furthermore, by including vehicle-domain-specific context (top-down knowledge), all objects that are known to be static (e.g., found road segments) can be sorted out, easing the 3D Warping procedure. Additionally, data from a satellite-based navigation system can be incorporated to provide further knowledge of the scene (e.g., the 3D GPS position of static scene content).
As described in [11], the computation methods mentioned in part “b) Computing the Predicted Cue” have different advantages and drawbacks, making them more or less applicable in different domains and applications, as summarized in the following Table 1. Table 2 summarizes the differences between existing pixel-based 2D Warping procedures and the 3D Warping approach on 3D coordinates.
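The morphological post-processing can be sketched as a binary opening (erosion followed by dilation) of the thresholded residuum. This minimal NumPy version assumes a 3x3 square structuring element and a hypothetical residuum threshold; scipy.ndimage.binary_opening offers the same operation if SciPy is available.

```python
import numpy as np

def _shifted_copies(mask, radius):
    """Stack of all shifts of `mask` within a (2r+1) x (2r+1) square window."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    k = 2 * radius + 1
    return np.stack([padded[i:i + h, j:j + w] for i in range(k) for j in range(k)])

def binary_opening(mask, radius=1):
    """Erosion followed by dilation with a square structuring element."""
    eroded = _shifted_copies(mask, radius).all(axis=0)
    return _shifted_copies(eroded, radius).any(axis=0)

residuum = np.zeros((20, 20))
residuum[5:10, 5:10] = 1.2      # compact region: plausible dynamic object
residuum[15, 15] = 3.0          # single-pixel artifact
mask = residuum > 0.5           # threshold is a hypothetical choice
opened = binary_opening(mask)
print(mask.sum(), opened.sum())  # 26 25
```

The compact region survives unchanged while the isolated artifact disappears, so only larger residuum regions are interpreted as dynamic.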
TABLE 1 Comparison of the proposed computation methods for detecting dynamic objects
Method: Optical-flow-based 2D Warping | Iconic 3D Warping | Voxel cloud 3D Warping | Region-based 3D Warping (environmental envelope)
Advantages: No camera parameters (calibration) necessary; no further sensors necessary | Accurate | Very accurate | Fast
Advantages common to all three 3D Warping methods: The relative motion of detected objects is determined in world coordinates; dynamic objects moving in the direction of the ego vehicle can be detected
Drawbacks: Only objects with horizontal motion components; slow; low accuracy | Slow | Slow | Lower accuracy
Drawbacks common to all three 3D Warping methods: Needs dense depth data (vision-based, PMD or laser scanner); the denseness and quality of such depth data could be improved by satellite-based navigation data (e.g., the 3D GPS position of static scene content)
TABLE 2 Conceptual differences between pixel-based 2D Warping with optical flow and 3D Warping on 3D coordinates
Criterion: Optical-flow-based 2D Warping (Proper Object Motion) | 3D Warping on 3D coordinates
Input data: Consecutive images of a monocular camera, dense 3D data | Dense 3D data
Computational level: Pixels on the image | 3D world positions
Detectable object motion: Lateral object motion (i.e., orthogonal to the motion of the ego camera vehicle) | Longitudinal and to a certain extent lateral object motion (i.e., motion in the direction of the ego camera vehicle and orthogonal to it)
Output data: Detected object motion in pixels on the image plane between two consecutive images | Detected object motion in 3D coordinates
Invention
It is an object of the invention to propose an improved method for object motion detection using 3D data and visual information.
This object is achieved by means of the features of the independent claims. The dependent claims further develop the central idea of the present invention.
According to a first aspect, a method for detecting dynamic objects in the (visually) sensed scene of a driver assistance system of a vehicle with ego-motion comprises the steps of:
1. providing visual signals of the environment of a vision sensor with ego-motion,
2. detecting the proper motion (see glossary) of objects on the input field of the vision sensor based on the detected optical flow,
3. detecting the motion of objects based on a 3D representation model (see glossary) of the environment and using a 3D Warping on the basis of predicted and sensed 3D data, the predicted 3D data being generated based on measured 3D data and data representing the ego-motion,
4. combining the 3D Warping-based and the optical-flow-based object motion detection, and
5. storing information on the detected dynamic objects and their measured motion parameters.
The information on the detected dynamic objects can be used for collision avoidance or path planning.
The 3D Warping-based and the optical-flow-based object motion detection may be combined such that each restricts the search space of the other and supplies detection parameters to it. A combination of both approaches is also possible by directly comparing and weighting the results of the two (see FIG. 5 for a system overview).
The 3D-Warping-based object motion detection may parameterize the optical-flow-based object motion recognition, e.g., depending on the motion direction and amplitude of objects.
More specifically, only those regions identified by the 3D-Warping-based object motion detection that indicate a lateral motion relative to the ego-motion may be processed selectively by the optical-flow-based object motion recognition.
Also, the optical-flow-based object motion detection can provide regions with possible longitudinal motion, to which the 3D Warping is applied. Longitudinal motion is indicated by an aura around a moving object caused by the change in size of the object (see the last column of FIG. 6).
The 3D Warping-based and the optical-flow-based object motion detection can be run in parallel and combined such that the detection results of one approach are refined and verified by the other detection method.
Information from a 3D depth sensor, such as a rotating laser scanner, can be used to generate sensor signals.
Information from a 2D depth sensor, such as a Photonic Mixer Device, may be used to generate sensor signals.
Information from a 1D depth sensor (e.g., laser scanner) can be used, together with 2D vision signals, to generate 3D data.
A dense depth sensor may provide 3D data and information on the ego-motion computed based on a single track model.
Input data for the single track model can come from additional sensors for vehicle velocity and/or yaw-rate.
Both methods may run in parallel.
A further aspect of the invention relates to a driver assistance computing unit, designed to carry out a method as explained above.
The invention also proposes a vehicle being equipped with such a driver assistance computing unit.
Another aspect relates to an autonomous robot being equipped with such a computing unit.
According to a further aspect, a driver assistance system comprises a driving path and/or surrounding model generation apparatus, which may comprise:
1. means for providing visual signals of the environment of a vision sensor with ego-motion,
2. computing means for detecting the proper motion of objects on the input field of the vision sensor based on the detected optical flow,
3. computing means for detecting the motion of objects based on a 3D representation model of the environment and using a 3D Warping on the basis of predicted and sensed 3D data, the predicted 3D data being generated based on measured 3D data and data representing the ego-motion,
4. computing means for combining the 3D Warping-based and the optical-flow-based object motion detection, and
5. means for storing information on the detected dynamic objects and their measured motion parameters.
The vision sensor may be a dense depth sensor providing 3D data and information on ego-motion.
The information on the ego-motion may be accessible on a CAN bus and/or provided by an additional sensor.
The invention combines the two approaches (3D Warping and Proper Object Motion, “POM”) in order to exploit their respective advantages while resolving their existing drawbacks. The combination of the two approaches is important when aiming at a real-time implementation of an ADAS in a vehicle. In general, there are five possible combinations. First, both approaches work in parallel and the results of one approach are verified and refined by the other (two combinations). Second, the 3D Warping defines regions where the POM is applied, and additionally the POM can be parameterized optimally to reduce the computational requirements (two combinations). Finally, the POM defines regions where the 3D Warping is applied; this combination is not treated in detail here but is mentioned for completeness.
The invention thereby improves the rather broad and generic results gathered by the 3D Warping approach by using a well-parameterized form of POM. On this basis, a refinement and verification of the 3D Warping detection results can be achieved (see FIG. 5 for a descriptive overview of the invention and Table 3 for combinations between 3D Warping (3DW) and optical-flow-based 2D Warping (POM)). In the following, the proposed combination approach is explained in more detail.
TABLE 3 Possible combinations between pixel-based 2D Warping with optical flow and 3D Warping on 3D coordinates
Criterion: Optical-flow-based POM in parallel with 3D Warping | POM results define regions for 3D Warping | 3D Warping results define regions for POM
Required computational power: Very high to high, due to parallel computation, but depends on optical flow parameters | Very high, due to optical flow computation on all scales, directions and ranges | Medium, optical flow is only computed on certain regions
Computational level: Pixels on the image and, in parallel, 3D world positions | First pixels on the image, second 3D world positions | First 3D world positions, second pixels on the image for defined areas
Detectable object motion: Lateral and longitudinal object motion | First, lateral object motion and possible areas of longitudinal motion; second, longitudinal and to a certain extent lateral object motion | First, longitudinal and to a certain extent lateral object motion; second, refinement of lateral object motion
Output data: Detected object motion in pixels on the image plane between two consecutive images and detected object motion in 3D coordinates | Detected object motion in pixels on the image plane between two consecutive images and, for refined areas, object motion in 3D coordinates | Detected object motion in 3D coordinates and, for refined areas, object motion in pixels on the image plane between two consecutive images
3D Warping Results Define Regions for POM Computation
In order to detect dynamic objects, the 3D Warping approach computes the residuum between predicted and measured 3D coordinates (please also refer to EPO patent application 09 150 710.3 (see [11]) for details). The residuum shows high values at scene points that contain dynamic objects. These high values are arranged in specific patterns depending on the type of object motion (lateral, longitudinal, mixed; see FIG. 6 for a visualization on the image plane).
While the trajectory of longitudinal object motion can be inferred rather directly from its specific longitudinal residuum pattern, lateral motion components need to be inferred indirectly from their specific lateral pattern. Here, the proposed combination of 3D Warping with POM has particular advantages. More specifically, patterns that indicate lateral motion can be post-processed by POM. For this purpose, the computationally intensive POM can be restricted to a certain image area based on the 3D Warping results. Furthermore, the POM is parameterized optimally based on these patterns (the direction and expected amplitude of the optical flow are heavily restricted). With POM, the correspondence problem of the 3D Warping residuum for lateral motion is solved and a 3D object trajectory can be computed.
Furthermore, the POM can be used to verify the noisy 3D Warping results. The noise is due to error-prone depth measurements that typically contain numerous artifacts, which result in false-positive object motion measurements. Additionally, the resolution of the residuum is typically rather low, which is sufficient for detecting a certain pattern but potentially insufficient for inferring the specific object motion trajectory. Here, POM can be used to refine the motion trajectory.
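Restricting and parameterizing the flow search can be sketched with simple block matching. This is a hypothetical illustration, not the POM implementation of [7]: it assumes a sum-of-absolute-differences matcher on one region of interest, with the ROI, the search direction and the search range standing in for the parameters that 3D Warping could supply. A 1D search over ±8 px tests 17 candidates instead of the 289 of a full 2D search over the same range.

```python
import numpy as np

def restricted_block_flow(prev, curr, roi, direction=(0, 1), max_disp=8):
    """SAD block matching for one ROI, searching only along a given image direction.

    roi = (r0, r1, c0, c1), upper bounds exclusive. direction and max_disp are
    hypothetical parameters that the 3D Warping results could provide.
    Returns the best (d_row, d_col) displacement and the number of candidates tested.
    """
    r0, r1, c0, c1 = roi
    template = prev[r0:r1, c0:c1].astype(float)
    h, w = curr.shape
    best, best_cost, tested = (0, 0), np.inf, 0
    for k in range(-max_disp, max_disp + 1):
        dr, dc = k * direction[0], k * direction[1]
        if r0 + dr < 0 or c0 + dc < 0 or r1 + dr > h or c1 + dc > w:
            continue  # candidate window would leave the image
        candidate = curr[r0 + dr:r1 + dr, c0 + dc:c1 + dc].astype(float)
        cost = np.abs(candidate - template).sum()
        tested += 1
        if cost < best_cost:
            best, best_cost = (dr, dc), cost
    return best, tested

rng = np.random.default_rng(2)
prev = rng.uniform(0, 255, size=(40, 60))
curr = np.roll(prev, shift=5, axis=1)   # content moves 5 px to the right (lateral motion)
flow, tested = restricted_block_flow(prev, curr, roi=(10, 20, 20, 30))
print(flow, tested)  # (0, 5) 17
```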
Summarizing, the described combination approach leads to a number of advantages that improve existing approaches in the following points:
1. The optical-flow-based POM approach is in general resource-demanding. The 3D Warping is used to preselect the parts of the scene on which the POM is computed. This reduces the computation time and increases the quality.
2. Furthermore, the POM is parameterized optimally based on the 3D Warping results, which lowers the computational demands of the POM-based verification process. More specifically, such parameters and optimizations can be a local restriction of the search direction of the optical flow, a restricted search range in pixels, restrictions of the used scales, restrictions regarding the temporal integration of flow vectors, restrictions regarding the spatial integration of flow vectors, and restrictions or relaxations of the confidence thresholds used to support flow values (see [17] for the existing parameters of a state-of-the-art, correlation-based optical flow computation method).
3. Different from pure optical-flow-based approaches that detect object motion on the image plane (i.e., in pixels in the perspective image), the 3D Warping approach delivers information on the object motion in 3D world coordinates (i.e., direction and magnitude in meters). However, the object motion detected in image pixels also delivers valuable information for typical image processing algorithms. The algorithm proposed here combines the prediction of the pixel-wise motion of image pixels (called image Warping) with the direct prediction of 3D world coordinates.
4. Optical-flow-based approaches are restricted to the detection of object motion orthogonal to the ego vehicle motion (lateral motion). In contrast, 3D Warping targets the detection of longitudinal object motion.
The proposed approach combines both, allowing the precise detection of object motion in lateral and longitudinal direction in an optimal way.
3D Warping and Proper Object Motion Run in Parallel
Two combinations are possible: first, the POM results are verified and refined by the 3D Warping results, and second, the other way around.
In the first combination, the POM results are verified and refined by the 3D Warping results (POM and 3D Warping run in parallel and are integrated at a later stage, which improves the conciseness of the results). Dense optical flow (i.e., optical flow computed for all image pixels) has the problem of being noisy and error-prone in image regions with little structure (the correspondence search becomes ambiguous). Although the confidence measure obtained as a by-product of the optical flow computation might be high, the flow vectors might still be wrong. In this case, a late combination with the results of the independently running 3D Warping can be used to verify the POM results. Furthermore, since the optical flow is resource-demanding and is hence computed at a low resolution, the 3D Warping results (usually available at a higher resolution) can be used to refine (i.e., improve the resolution of) the flow vectors.
In the second combination, the 3D Warping results are verified and refined by the POM results (POM and 3D Warping run in parallel and are integrated at a later stage, which improves the conciseness of the results). Following EP 09 15 0710 and the introductory part of invention EP 09 16 1520 (see Tab. 2), 3D Warping is especially suitable for the detection of longitudinal motion (the preferred motion type in the traffic domain). In case of lateral object motion or a combination of lateral and longitudinal motion, a correspondence problem might arise (which 3D clusters in the 3D residuum belong together). Here, the POM results can be helpful, since POM is most suitable for the detection of lateral object motion.
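A late fusion of this kind can be sketched as follows. It is a hypothetical illustration under stated assumptions: the POM flow is available at a low resolution, the 3D Warping residuum has already been back-projected into a boolean image mask at a resolution `scale` times higher, and the function name and the min_flow threshold are invented for the example. Flow vectors are kept only where the mask confirms motion, and the mask supplies the sharper object boundary.

```python
import numpy as np

def verify_and_refine(flow_lr, residuum_mask_hr, scale, min_flow=0.5):
    """Keep low-resolution POM flow only where a higher-resolution 3D Warping mask
    confirms motion (hypothetical late-fusion rule).

    flow_lr: (h, w, 2) flow in low-resolution pixels.
    residuum_mask_hr: (h * scale, w * scale) boolean mask from 3D Warping.
    Returns the flow at high resolution, zero where motion is not confirmed.
    """
    # Nearest-neighbour upsampling; vectors are rescaled to high-resolution pixels.
    flow_hr = np.repeat(np.repeat(flow_lr, scale, axis=0), scale, axis=1) * scale
    moving = np.linalg.norm(flow_hr, axis=2) > min_flow * scale
    confirmed = moving & residuum_mask_hr
    return np.where(confirmed[..., None], flow_hr, 0.0)

flow_lr = np.zeros((4, 4, 2))
flow_lr[0, 0] = [3.0, 0.0]     # spurious vector in a low-texture region
flow_lr[2, 2] = [1.0, 0.0]     # genuine lateral motion
mask_hr = np.zeros((8, 8), dtype=bool)
mask_hr[4:6, 4] = True         # 3D Warping confirms motion, with a sharper boundary
refined = verify_and_refine(flow_lr, mask_hr, scale=2)
print(refined[..., 0])
```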
It should be understood that the foregoing relates only to embodiments of the invention and that numerous changes and modifications made therein may be made without departing from the spirit and the scope of the invention as set forth in the following claims.