1. Technical Field
The present invention relates to processing image motion from a moving camera mounted in a host vehicle for detection of hazards.
2. Description of Related Art
During the last few years, camera-based driver assistance systems (DAS) have been entering the market, including lane departure warning (LDW), automatic high-beam control (AHC), traffic sign recognition (TSR), forward collision warning (FCW) and pedestrian detection.
A core technology behind forward collision warning (FCW) systems and headway distance monitoring is detection and class-based recognition including vehicles and pedestrians. A key component of a typical forward collision warning (FCW) algorithm is the estimation of distance from a camera and the estimation of scale change from the time-to-contact/collision (TTC) as disclosed for example in U.S. Pat. No. 7,113,867.
Reference is now made to FIGS. 1 and 2 which illustrate a driver assistance system (DAS) 16 including a camera or image sensor 12 mounted in a vehicle 18. Image sensor 12, imaging a field of view in the forward direction, provides image frames 15 in real time which are captured by an image processor 30. Processor 30 may be used to process image frames 15 simultaneously and/or in parallel to serve a number of driver assistance systems/applications. By way of example in FIG. 2, image frames 15 are used to serve pedestrian detection 20, lane departure warning (LDW) 21 and forward collision warning (FCW) 22. Processor 30 may be used to process image frames 15 to detect and recognize an image of an object, e.g. vehicle or pedestrian, in the forward field of view of camera 12. Lane departure warning (LDW) 21 may provide a warning in the case of unintentional lane departure. The warning may be given when the vehicle crosses or is about to cross the lane marker. Driver intention is determined based on use of turn signals, change in steering wheel angle, vehicle speed and brake activation. Driver assistance system (DAS) 16 also includes real time hazard detection 23 by sensing the vertical deviation of the contour of the road or deviation from the road plane, for example according to the teachings of US patent application publication US20150086080. Driver assistance system (DAS) 16 may be implemented using specific hardware circuitry (not shown) with on board software and/or software control algorithms in storage 13. Image sensor 12 may be monochrome or black-white, i.e. without color separation, or image sensor 12 may be color sensitive. In some cases, image frames 15 are partitioned between different driver assistance applications and in other cases the image frames 15 may be shared between the different driver assistance applications.
Reference is now made to FIG. 3 which illustrates camera or pinhole projection which relates a point P(X,Y,Z) in world space Cartesian coordinates to a point p(x,y) in image coordinates on image plane 15, where X is the horizontal Cartesian coordinate in world space, Y is the vertical Cartesian coordinate in world space and Z is the direction along the optical axis of camera 12. The origin O of camera projection is at the pinhole; image plane 15 is in reality behind the origin at focal length f with the image inverted. Image plane 15 is shown in the projection of FIG. 3 in a symmetric position with a non-inverted image in front of origin O at a distance of focal length f. The equations that follow approximate the relation between image coordinates x, y and world space coordinates X, Y, Z assuming camera or pinhole projection.
$$x = f\,\frac{X}{Z} \qquad (1)$$

$$y = f\,\frac{Y}{Z} \qquad (2)$$
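By way of a non-limiting illustration, equations (1) and (2) may be sketched in code as follows. The function name `project` and the numerical values are assumptions chosen for illustration only and do not appear in the referenced publications.

```python
def project(X, Y, Z, f):
    """Pinhole projection of world point (X, Y, Z) to image point (x, y).

    Implements x = f * X / Z and y = f * Y / Z, with f in pixels and
    Z measured along the optical axis.
    """
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return f * X / Z, f * Y / Z


# Illustrative values only: f = 950 pixels, a point 1.25 m off-axis in Y
# and 25 m ahead of the camera.
x, y = project(0.0, 1.25, 25.0, 950.0)
print(x, y)
```

The guard on Z reflects that the relation is only meaningful for points in front of the pinhole.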
The term “homography” as used herein refers to an invertible transformation from a projective space to itself that maps straight lines to straight lines. In the field of computer vision, two images of the same planar surface in space are related by a homography assuming the pinhole camera model.
Structure-from-Motion (SfM) refers to methods for recovering three-dimensional information of a scene that has been projected onto the back focal plane of a camera. The structural information derived from a SfM algorithm may take the form of a set of projection matrices, one projection matrix per image frame, representing the relationship between a specific two-dimensional point in the image plane and its corresponding three-dimensional point. SfM algorithms rely on tracking specific image features from image frame to image frame to determine structural information concerning the scene.
Reference is now made to FIG. 4, a flow chart showing details of processing of image motion or Structure-from-Motion (SfM) algorithm, according to US patent application publication US20150086080. US patent application publication US20150086080 is incorporated herein by reference as if fully set forth herein.
It is assumed that a road can be modeled as an almost planar surface. Thus imaged points of the road move in image space according to a homography.
In particular, by way of example, for a given camera 12 height (1.25 m), focal length (950 pixels) and vehicle motion between frames (1.58 m), it may be possible to predict the image motion of selected corresponding points on the road plane between the two image frames 15a and 15b respectively as host vehicle 18 moves forward. Using a model of the almost planar surface for the motion of the road points, it is possible to warp the second image 15b towards the first image 15a. Thus, in step 501, image frame 15b is initially warped into image frame 15a. (In a similar process, image frame 15a may be initially warped into image frame 15b).
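As a simplified sketch of such a prediction, the image row of a road point may be computed from the pinhole relation under the following assumptions, which are made here for illustration only: the road is perfectly flat, the optical axis is parallel to the road, the vertical image coordinate is measured downward from the horizon, and the vehicle moves straight ahead. The function names are hypothetical.

```python
def road_row(Z, f=950.0, H=1.25):
    """Image row (pixels below the horizon) of a road point Z metres ahead.

    Follows from y = f * Y / Z with Y equal to the camera height H.
    """
    if Z <= 0:
        raise ValueError("road point must be in front of the camera")
    return f * H / Z


def predicted_road_flow(Z, d=1.58, f=950.0, H=1.25):
    """Vertical image motion (pixels) of a road point initially Z metres
    ahead, after the vehicle moves forward by d metres."""
    return road_row(Z - d, f, H) - road_row(Z, f, H)


# A road point 15 m ahead moves down the image as the vehicle approaches it.
flow_15m = predicted_road_flow(15.0)
print(round(flow_15m, 2))
```

Under these assumptions, nearer road points move by more pixels than distant ones for the same vehicle motion, which is the behaviour the planar warp accounts for.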
Instead of trying to find feature points, which would invariably give a bias towards strong features such as lane marks and shadows, a fixed grid of points is used for tracking (step 507). A grid of points is selected (step 503) from a region, e.g. trapezoidal, that roughly maps up to 15 meters ahead and one lane in width. Points may be spaced every 20 pixels in the horizontal (x) direction and 10 pixels in the vertical (y) direction. An alternative would be to randomly select points according to a particular distribution.
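One possible way to generate such a fixed grid is sketched below. The trapezoid is described by hypothetical parameters (top and bottom rows, half-widths and a center column) that are assumptions for illustration; only the 20 pixel and 10 pixel spacings are taken from the description above.

```python
def trapezoid_grid(top_y, bottom_y, top_half_width, bottom_half_width,
                   center_x, step_x=20, step_y=10):
    """Return (x, y) grid points inside a symmetric trapezoid.

    Rows run from top_y to bottom_y every step_y pixels. Within each row,
    points are spaced step_x pixels apart and centered on center_x.
    """
    if bottom_y <= top_y:
        raise ValueError("bottom_y must be greater than top_y")
    points = []
    for y in range(top_y, bottom_y + 1, step_y):
        # Linearly interpolate the half-width between the top and bottom rows.
        t = (y - top_y) / (bottom_y - top_y)
        half_width = top_half_width + t * (bottom_half_width - top_half_width)
        n = int(half_width // step_x)
        for k in range(-n, n + 1):
            points.append((center_x + k * step_x, y))
    return points


# Hypothetical region of interest in a 640-pixel-wide image.
grid = trapezoid_grid(top_y=300, bottom_y=400, top_half_width=50,
                      bottom_half_width=250, center_x=320)
print(len(grid))
```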
Around each point in image 15a a patch is located (step 505). The patch may be 8 pixels in each direction centered around the point resulting in a 17×17 pixel square. The normalized correlation is then computed (e.g. Matlab™ function normxcorr2) for warped image 15b, where the patch center is shifted in the search region. In practical use system 16 may include a yaw sensor but no pitch sensor and so a tighter search region may be used in the x direction than in the y direction. A search region of (2×4+1) pixels in the x direction may be used and (2×10+1) pixels in the y direction.
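The patch search may be sketched as follows in plain Python. This is a minimal illustration and not the normxcorr2 implementation; the function names, the list-of-lists image representation and the synthetic test images are assumptions made for the example.

```python
import math
import random


def ncc(img_a, img_b, ax, ay, bx, by, r=8):
    """Normalized correlation of (2r+1)x(2r+1) patches centered at
    (ax, ay) in img_a and (bx, by) in img_b."""
    offsets = [(i, j) for j in range(-r, r + 1) for i in range(-r, r + 1)]
    a = [img_a[ay + j][ax + i] for i, j in offsets]
    b = [img_b[by + j][bx + i] for i, j in offsets]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) *
                    sum((q - mb) ** 2 for q in b))
    return num / den if den > 0 else 0.0


def best_shift(img_a, img_b, x, y, r=8, sx=4, sy=10):
    """Search (2*sx+1) x (2*sy+1) integer shifts; return (score, dx, dy)."""
    best = (-2.0, 0, 0)
    for dy in range(-sy, sy + 1):
        for dx in range(-sx, sx + 1):
            score = ncc(img_a, img_b, x, y, x + dx, y + dy, r)
            if score > best[0]:
                best = (score, dx, dy)
    return best


# Synthetic check: img_b is img_a shifted by a known amount (with wrap-around).
rng = random.Random(0)
size = 64
img_a = [[rng.random() for _ in range(size)] for _ in range(size)]
true_dx, true_dy = 2, -3
img_b = [[img_a[(yy - true_dy) % size][(xx - true_dx) % size]
          for xx in range(size)] for yy in range(size)]
score, dx, dy = best_shift(img_a, img_b, 32, 32)
print(dx, dy)
```

The asymmetric defaults `sx=4` and `sy=10` mirror the tighter horizontal search region described above.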
The shift which gives the maximum correlation score is found and may be followed by a refinement search around the best score position with a sub-pixel resolution of 0.1 pixels. Invalid tracks may be filtered out at the search stage by keeping only those points which have a score above a threshold (e.g. T=0.7) and for which reverse tracking from warped image 15b to image 15a gives a similar value in the opposite direction, leaving tracked points 509 as a result of tracking (step 507). Reverse tracking is similar to left-right validation in stereo.
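The validity test may be sketched as follows. The score threshold of 0.7 is taken from the example above, whereas the tolerance of 0.5 pixels used to decide that the reverse track is "similar" is an assumption for illustration, as is the function name.

```python
def is_valid_track(score, forward, backward, threshold=0.7, tolerance=0.5):
    """Accept a track if its correlation score exceeds the threshold and the
    reverse shift approximately cancels the forward shift.

    forward  : (dx, dy) found when tracking from image 15a to warped image 15b
    backward : (dx, dy) found when tracking from warped image 15b to image 15a
    tolerance: hypothetical maximum disagreement in pixels per axis
    """
    fx, fy = forward
    bx, by = backward
    consistent = abs(fx + bx) <= tolerance and abs(fy + by) <= tolerance
    return score > threshold and consistent


print(is_valid_track(0.9, (2.0, -3.0), (-2.1, 3.0)))
```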
Tracked points 509, as a result of tracking (step 507), are fit to a homography (step 511) using RANdom SAmple Consensus (RANSAC). A number, e.g. four, of points are chosen at random and used to compute the homography. Points 509 are then transformed using the homography and the number of transformed points which are closer than a threshold to their corresponding tracked positions is counted. Randomly choosing four points and counting the number of points which are closer than a threshold may be repeated many times, and the four points that gave the highest count are retained.
At the end of process 40, the four best points are used again (step 513) to transform the points, and all the points (inliers) that are closer than a (possibly different) threshold are used to compute a homography using least squares. The rest of the points that are not closer than a (possibly different) threshold are considered outliers.
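The RANSAC fit and least-squares refit described above may be sketched as follows. This is a simplified illustration and not the implementation of the referenced publication: the homography is parameterized with its last element fixed to 1 and solved through the normal equations, and all function names, thresholds, iteration counts and the synthetic data are assumptions. In practice coordinates would typically be normalized before solving.

```python
import random


def solve(A, b):
    """Gaussian elimination with partial pivoting; None if singular."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            return None
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            k = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= k * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def fit_homography(pairs):
    """Least-squares homography (9 values, last fixed to 1) from >= 4 pairs."""
    rows, rhs = [], []
    for (x, y), (u, v) in pairs:
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y]); rhs.append(v)
    # Normal equations A^T A h = A^T b (an exact fit for exactly 4 pairs).
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    Atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
    h = solve(AtA, Atb)
    return None if h is None else h + [1.0]


def apply_h(h, x, y):
    w = h[6] * x + h[7] * y + h[8]
    if abs(w) < 1e-12:
        return float("inf"), float("inf")
    return (h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w


def inliers(h, pairs, tol):
    """Pairs whose transformed source point lies within tol of its target."""
    keep = []
    for (x, y), (u, v) in pairs:
        px, py = apply_h(h, x, y)
        if (px - u) ** 2 + (py - v) ** 2 < tol ** 2:
            keep.append(((x, y), (u, v)))
    return keep


def ransac_homography(pairs, iters=200, tol=0.5, refit_tol=0.5, seed=0):
    rng = random.Random(seed)
    best_h, best_count = None, -1
    for _ in range(iters):
        h = fit_homography(rng.sample(pairs, 4))  # random 4-point hypothesis
        if h is None:
            continue
        count = len(inliers(h, pairs, tol))
        if count > best_count:
            best_h, best_count = h, count
    if best_h is None:
        return None, []
    keep = inliers(best_h, pairs, refit_tol)  # possibly different threshold
    return fit_homography(keep), keep         # least-squares refit on inliers


# Synthetic data: 20 points following a known homography plus 5 gross outliers.
true_h = [1.0, 0.02, 0.5, -0.01, 1.0, -0.3, 0.001, 0.002, 1.0]
src = [(float(x), float(y)) for x in range(0, 10, 2) for y in range(0, 8, 2)]
pairs = [(p, apply_h(true_h, *p)) for p in src]
pairs += [((1.0, 1.0), (6.0, -3.0)), ((3.0, 5.0), (-2.0, 9.0)),
          ((7.0, 3.0), (1.0, 1.0)), ((5.0, 1.0), (9.0, 8.0)),
          ((9.0, 5.0), (2.0, -4.0))]
h_fit, kept = ransac_homography(pairs)
print(len(kept), "inliers of", len(pairs))
```

The ratio of retained inliers to tracked points gives the kind of success indication referred to in the description.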
At this point in process 40, the number of inliers and their spread in the warped image give an indication of the success of finding the road plane model. It is usual to get over 80% inliers and a good fit. The homography can then be used to correct the initial alignment for warping (step 501). Correction of the initial alignment can be done by integrating the correction into the initial warp (step 501) or by performing two warps consecutively. The former is advantageous as it requires only one interpolation step and can optionally be performed by matrix multiplication of the two homography matrices.
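The equivalence between two consecutive warps and a single combined warp may be illustrated as follows. The sketch assumes that each homography is a 3×3 matrix acting on point coordinates and that the initial warp is applied first; the matrix values and function names are hypothetical.

```python
def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply_h(h, x, y):
    """Apply a 3x3 homography to the point (x, y)."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical initial warp and refinement (correction) homographies.
initial = [[1.0, 0.0, 3.0], [0.0, 1.0, -2.0], [0.0, 0.001, 1.0]]
correction = [[1.0, 0.01, 0.2], [0.0, 1.0, 0.1], [0.0, 0.0, 1.0]]

# Applying 'initial' and then 'correction' equals applying their product once.
combined = matmul3(correction, initial)

two_step = apply_h(correction, *apply_h(initial, 100.0, 200.0))
one_step = apply_h(combined, 100.0, 200.0)
print(two_step, one_step)
```

Because the combined matrix is applied once, the image is resampled only once, which corresponds to the single interpolation step noted above.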
After warping image 15b towards image 15a to give a warped image using the refined warp (step 513), the tracking of points (step 507) may be repeated using a finer grid (e.g. every 5th pixel on every 5th row) and over a wider region of the road. Since the road plane is very well aligned, a smaller region may be searched, such as 2 pixels in each direction, again with a subpixel search.
Using an image flow analysis between tracked image points 509 as described in US20150086080 or other optical flow analysis algorithms, points on the road have a characteristic positive image flow as host vehicle 18 moves forward. Positive image flow is defined as flow away from the focus of expansion (FOE) (generally speaking down and outwards in image frames 15).
Object points in world space above the road plane such as an elevated sidewalk have an image flow greater than the characteristic image flow of tracked points 509 of the road. Object points in world space below the road plane such as sunken manhole covers, have an image flow less than the characteristic image flow of tracked points 509 of the road.
Using the image flow analysis between tracked image points 509 as described in US20150086080, image flow of tracked points 509 is compared with the expected image flow of the modeled road plane and any differences or residual image flow are associated with vertical deviation in the road. Tracked image points 509 of objects above the road plane have residual image flow greater than zero and objects below the road plane have residual image flow below zero.
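The sign of the residual image flow may be illustrated with the pinhole model under simplifying assumptions made here for the example only: a flat road, an optical axis parallel to the road, pure forward motion, and image rows measured downward from the horizon. The camera height, focal length and vehicle motion reuse the example values given earlier; the function names and the dead band `eps` are hypothetical.

```python
def vertical_flow(y, height, f=950.0, H=1.25, d=1.58):
    """Vertical flow (pixels) of a point seen at image row y (pixels below
    the horizon) that lies 'height' metres above the road plane."""
    Z = f * (H - height) / y          # distance implied by row and height
    if Z <= d:
        raise ValueError("point would be passed during the motion")
    return y * d / (Z - d)            # row after motion minus row before


def residual_flow(y, height, f=950.0, H=1.25, d=1.58):
    """Measured flow minus the flow expected for a road-plane point."""
    return vertical_flow(y, height, f, H, d) - vertical_flow(y, 0.0, f, H, d)


def classify(residual, eps=0.1):
    """Label a point by the sign of its residual flow (eps is a dead band)."""
    if residual > eps:
        return "above road plane"
    if residual < -eps:
        return "below road plane"
    return "on road plane"


curb = residual_flow(100.0, 0.15)      # e.g. an elevated sidewalk
manhole = residual_flow(100.0, -0.05)  # e.g. a sunken manhole cover
print(classify(curb), classify(manhole))
```

At a given image row, a point above the road is closer to the camera than the road point imaged there, so it moves faster in the image; a point below the road is farther and moves more slowly.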