Automotive accidents are a major cause of loss of life and property. It is estimated that over ten million people are involved in traffic accidents annually worldwide and that of this number, about three million people are severely injured and about four hundred thousand are killed. A report “The Economic Cost of Motor Vehicle Crashes 1994” by Lawrence J. Blincoe published by the United States National Highway Traffic Safety Administration estimates that motor vehicle crashes in the U.S. in 1994 caused about 5.2 million nonfatal injuries, 40,000 fatal injuries and generated a total economic cost of about $150 billion.
To cope with automotive accidents and their high cost in lives and property, several technologies have been developed. One camera technology used in vehicles is a visible light (VIS) camera (either CMOS or CCD), the camera being mounted inside the cabin, typically near the rearview mirror and looking forward onto the road. VIS light cameras are used in systems for Lane Departure Warning (LDW), vehicle detection for accident avoidance, pedestrian detection and many other applications.
Before describing prior art vehicular systems based on VIS cameras and prior art vehicular systems based on FIR cameras, the following definitions are put forward. Reference is made to FIG. 1 (prior art). FIG. 1 is a schematic view of a road scene which shows a vehicle 50 having a system 100 including a VIS camera 110 and a processing unit 130. Vehicle 50 is disposed on road surface 20, which is assumed to be level.
The term “following vehicle” is used herein to refer to vehicle 50 equipped with camera 110. When an obstacle of interest is another vehicle, typically traveling in substantially the same direction, then the term “lead vehicle” or “leading vehicle” is used herein to refer to the obstacle. The term “back” of the obstacle is defined herein to refer to the end of the obstacle nearest to the following vehicle 50, typically the rear end of the lead vehicle, while both vehicles are traveling forward in the same direction. The term “back”, in rear facing applications means the front of the obstacle behind host vehicle 50.
The term “ground plane” is used herein to refer to the plane best representing the road surface segment between following vehicle 50 and obstacle 10. The term “ground plane constraint” is used herein to refer to the assumption of a planar ground plane.
The terms “object” and “obstacle” are used herein interchangeably.
The terms “upper”, “lower”, “below”, “bottom”, “top” and like terms as used herein are in the frame of reference of the object, not in the frame of reference of the image. Although real images are typically inverted, the wheels of the leading vehicle are considered to be at the bottom of the image of the vehicle.
The term “bottom” as used herein refers to the image of the bottom of the obstacle, defined by the image of the intersection between a portion of a vertical plane tangent to the “back” of the obstacle with the road surface; hence the term “bottom” is defined herein as image of a line segment (at location 12) which is located on the road surface and is transverse to the direction of the road at the back of the obstacle.
The term “behind” as used herein refers to an object being behind another object, relative to the road longitudinal axis. A VIS camera, for example, is typically 1 to 2 meters behind the front bumper of host vehicle 50.
The term “range” is used herein to refer to the instantaneous distance D from the “bottom” of the obstacle to the front, e.g. front bumper, of following vehicle 50.
The term “Infrared” (IR) as used herein is an electromagnetic radiation of a wavelength longer than that of visible light. The term “far infrared” (FIR) as used herein is a part of the IR spectrum with wavelength between 8 and 12 micrometers. The term “near infrared” (NIR) as used herein is a part of the IR spectrum with wavelength between 0.7 and 2 micrometers.
The term “Field Of View” (FOV; also known as field of vision) as used herein is the angular extent of a given scene, delineated by the angle of a three dimensional cone that is imaged onto an image sensor of a camera, the camera being the vertex of the three dimensional cone. The FOV of a camera at particular distances is determined by the focal length of the lens: the longer the focal length, the narrower the field of view.
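By way of illustration only, the relation between focal length and field of view may be sketched with a simple pinhole model, in which the full angle is twice the arctangent of half the sensor width divided by the focal length. The function name and the sensor dimensions below are hypothetical and are not taken from any system described herein.

```python
import math


def horizontal_fov_deg(sensor_width_mm, focal_length_mm):
    """Full horizontal field of view (degrees) of an ideal pinhole camera.

    Assumes a distortion-free lens focused at infinity.
    """
    if sensor_width_mm <= 0 or focal_length_mm <= 0:
        raise ValueError("sensor width and focal length must be positive")
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))


# Hypothetical 4.8 mm wide sensor with three candidate lenses.
for focal_mm in (4.0, 8.0, 16.0):
    print(f"f = {focal_mm:4.1f} mm -> FOV = {horizontal_fov_deg(4.8, focal_mm):.1f} deg")
```

In this sketch, each longer focal length yields a narrower field of view for the same sensor.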
The term “Focus Of Expansion” (FOE) is the intersection of the translation vector of the camera with the image plane. The term FOE is commonly used for the point in the image that represents the direction of motion of the camera. The point appears stationary while all other feature points appear to flow out from that point. FIG. 4a (prior art) illustrates the focus of expansion, which is the image point towards which the camera is moving. With a positive component of velocity along the optic axis, image features will appear to move away from the FOE and expand, with those closer to the FOE moving slowly and those further away moving more rapidly.
The term “baseline” as used herein is the distance between a pair of stereo cameras used for measuring distance from an object. In the case of obstacle avoidance in automotive applications, to get accurate distance estimates from a camera pair a “wide baseline” of at least 50 centimeters is needed, which would be ideal from a theoretical point of view, but is not practical since the resulting unit is very bulky.
The world coordinate system of a camera 110 mounted on a vehicle 50 as used herein is defined to be aligned with the camera and illustrated in FIG. 3 (prior art). It is assumed that the optical axis of the camera is aligned with the forward axis of the vehicle, which is denoted as axis Z, meaning that the world coordinate system of the vehicle is parallel to the world coordinate system of the camera. Axis X of the world coordinates is to the left and axis Y is upwards. All axes are perpendicular to each other. Axes Z and X are assumed to be parallel to road surface 20.
The terms “lateral” and “lateral motion” are used herein to refer to a direction along the X axis of an object world coordinate system.
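By way of illustration only, the outward flow from the FOE may be sketched for a static scene point under pure camera translation and a pinhole model, in which the image velocity of a point is its offset from the FOE multiplied by the ratio of forward speed to depth. The function name, pixel coordinates, speed and depth below are hypothetical.

```python
import math


def flow_from_translation(x, y, foe_x, foe_y, forward_speed_m_s, depth_m):
    """Image velocity (pixels per second) of a static point imaged at (x, y).

    Assumes pure translation of the camera (no rotation), a pinhole model,
    and a positive velocity component along the optic axis.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    rate = forward_speed_m_s / depth_m  # inverse of the time to contact
    return (x - foe_x) * rate, (y - foe_y) * rate


# Hypothetical FOE at pixel (320, 240); two points at the same 20 m depth.
foe = (320.0, 240.0)
near_foe = flow_from_translation(330.0, 240.0, *foe, forward_speed_m_s=15.0, depth_m=20.0)
far_from_foe = flow_from_translation(520.0, 240.0, *foe, forward_speed_m_s=15.0, depth_m=20.0)
print(math.hypot(*near_foe), math.hypot(*far_from_foe))
```

In this sketch the flow vanishes at the FOE itself and, for points at equal depth, its magnitude grows in proportion to the image distance from the FOE.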
The term “scale change” as used herein is the change of image size of the target due to the change in distance.
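By way of illustration only, scale change may be expressed as a ratio of image sizes between two frames. Under a pinhole model the image width of a target of fixed physical width is inversely proportional to its distance, so the ratio of image widths equals the inverse ratio of distances. Treating scale change as a ratio, as well as the function names and numeric values below, are assumptions made for this sketch.

```python
def image_width_px(f_pixels, target_width_m, distance_m):
    """Image width (pixels) of a target of given physical width (pinhole model)."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return f_pixels * target_width_m / distance_m


def scale_change(distance_prev_m, distance_curr_m):
    """Ratio of current to previous image size for a rigid target."""
    return distance_prev_m / distance_curr_m


# Hypothetical values: 740-pixel focal length, 1.8 m wide lead vehicle
# closing from 40 m to 36 m between two frames.
w_prev = image_width_px(740.0, 1.8, 40.0)
w_curr = image_width_px(740.0, 1.8, 36.0)
print(w_curr / w_prev, scale_change(40.0, 36.0))
```

A ratio above one indicates that the target is growing in the image, that is, the distance is decreasing.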
“Epipolar geometry” refers to the geometry of stereo vision. Referring to FIG. 4b, the two upright planes 80 and 82 represent the image planes of the two cameras that jointly form the stereo vision system. OL and OR represent the focal points of the two cameras given a pinhole camera representation of the cameras. P represents a point of interest seen by both cameras. pL and pR represent where point P is projected onto the image planes. All epipolar lines in an image go through the epipole of that image, which is the projection onto that image plane of the projection center of the other camera; the epipoles are denoted by EL and ER. The plane formed by the focal points OL and OR and the point P is the epipolar plane. The epipolar line is the line where the epipolar plane intersects the image plane.
Vehicular Systems Based on Visible Light (VIS) Cameras:
Systems based on visible light (VIS) cameras for detecting the road and lane structure, as well as the lanes' vanishing point, are known in the art. Such a system is described in U.S. Pat. No. 7,151,996 given to Stein et al, the disclosure of which is incorporated herein by reference for all purposes as if entirely set forth herein. Road geometry and triangulation computation of the road structure are described in patent '996. The use of road geometry works well for some applications, such as Forward Collision Warning (FCW) systems based on scale change computations, and other applications such as headway monitoring, Adaptive Cruise Control (ACC) which require knowing the actual distance to the vehicle ahead, and Lane Change Assist (LCA), where a camera is attached to or integrated into the side mirror, facing backwards. In the LCA application, a following vehicle is detected when entering a zone at a specific distance (e.g. 17 meters), and thus a decision is made as to whether it is safe to change lanes.
Systems and methods for obstacle detection and distance estimations, using visible light (VIS) cameras, are well known in the automotive industry. A system for detecting obstacles to a vehicle motion is described in U.S. Pat. No. 7,113,867 given to Stein et al, the disclosure of which is included herein by reference for all purposes as if entirely set forth herein.
A pedestrian detection system is described in U.S. application Ser. No. 10/599,635 by Shashua et al, the disclosure of which is included herein by reference for all purposes as if entirely set forth herein. U.S. application Ser. No. 11/599,635 provides a system mounted on a host vehicle and methods for detecting pedestrians in a VIS image frame.
A distance measurement from a VIS camera image frame is described in “Vision-based ACC with a Single Camera: Bounds on Range and Range Rate Accuracy” by Stein et al., presented at the IEEE Intelligent Vehicles Symposium (IV2003), the disclosure of which is incorporated herein by reference for all purposes as if entirely set forth herein. Distance measurement is further discussed in U.S. application Ser. No. 11/554,048 by Stein et al, the disclosure of which is included herein by reference for all purposes as if entirely set forth herein. U.S. application Ser. No. 11/554,048 provides methods for refining distance measurements from the “front” of a host vehicle to an obstacle.
Referring back to FIG. 1 (prior art), a road scene with vehicle 50 having a distance measuring apparatus 100 is illustrated, apparatus 100 including a VIS camera 110 and a processing unit 130. VIS camera 110 has an optical axis 113 which is preferably calibrated to be generally parallel to the surface of road 20 and hence, assuming the surface of road 20 is level (i.e., complying with the “ground plane constraint”), optical axis 113 points to the horizon. The horizon is shown as a line perpendicular to the plane of FIG. 1, shown at point 118, parallel to road 20 and located at a height Hcam of optical center 116 of camera 110 above road surface 20.
The distance D to location 12 on the road 20 may be calculated, given the camera optic center 116 height Hcam, camera focal length f and assuming a planar road surface:
$$ D = \frac{f \, H_{cam}}{y_{bot}} \qquad (1) $$
The distance is measured, for example, to location 12, which corresponds to the “bottom edge” of pedestrian 10, where a vertical plane tangent to the back of pedestrian 10 meets road surface 20.
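By way of illustration only, equation 1 may be sketched as follows. The sketch assumes that the focal length f is expressed in pixels and that ybot is the image height, in pixels, of the bottom of the obstacle measured downward from the horizon row; the function name and numeric values are hypothetical.

```python
def range_from_ground_plane(f_pixels, h_cam_m, y_bot_pixels):
    """Range D per equation 1, D = f * Hcam / ybot, under the ground plane constraint.

    y_bot_pixels is assumed to be measured from the horizon row, so it must be
    positive for a point on the road in front of the camera.
    """
    if y_bot_pixels <= 0:
        raise ValueError("y_bot must lie below the horizon (positive value)")
    return f_pixels * h_cam_m / y_bot_pixels


# Hypothetical values: 740-pixel focal length, camera 1.2 m above the road,
# bottom of the obstacle imaged 37 pixels below the horizon (about 24 m).
d = range_from_ground_plane(f_pixels=740.0, h_cam_m=1.2, y_bot_pixels=37.0)
print(f"D = {d:.1f} m")
```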
Referring to equation 1, the error in measuring distance D is directly dependent on the height of the camera, Hcam. FIG. 2 graphically demonstrates the error in distance, as a result of just two pixels error in the horizon location estimate, for three different camera heights. It can be seen that the lower the camera is the more sensitive is the distance estimation to errors.
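By way of illustration only, the sensitivity of equation 1 to an error in the horizon location may be sketched as follows. The three camera heights, the focal length and the 30 meter range are hypothetical values chosen for the sketch and are not the values plotted in FIG. 2.

```python
def range_error_fraction(distance_m, h_cam_m, f_pixels, horizon_error_px):
    """Relative range error caused by misplacing the horizon by a few pixels.

    Uses equation 1 (D = f * Hcam / ybot): the horizon error shifts the
    measured ybot, and the range is recomputed from the shifted value.
    """
    y_true = f_pixels * h_cam_m / distance_m
    y_measured = y_true - horizon_error_px
    if y_measured <= 0:
        return float("inf")  # obstacle bottom would appear at or above the horizon
    d_estimated = f_pixels * h_cam_m / y_measured
    return abs(d_estimated - distance_m) / distance_m


# Hypothetical set-up: 740-pixel focal length, obstacle at 30 m, 2-pixel error.
errors = [
    range_error_fraction(30.0, h, f_pixels=740.0, horizon_error_px=2.0)
    for h in (1.2, 0.7, 0.35)
]
for h, e in zip((1.2, 0.7, 0.35), errors):
    print(f"Hcam = {h:.2f} m -> range error = {100 * e:.0f}%")
```

In this sketch the relative error grows as the camera height decreases, because ybot shrinks in proportion to Hcam while the pixel error stays the same.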
Height Hcam in a typical passenger car 50 is typically between 1.2 meters and 1.4 meters. Camera 110 is mounted very close to the vehicle center in the lateral dimension. In OEM (Original Equipment Manufacturing) designs, VIS camera 110 is often in the rearview mirror fixture. The mounting of a VIS camera 110 at a height between 1.2 meters and 1.4 meters, and very close to the vehicle center in the lateral dimension, is quite optimal since VIS camera 110 is protected behind the windshield (within the area cleaned by the wipers), good visibility of the road is available and good triangulation is possible to correctly estimate distances to the lane boundaries and other obstacles on the road plane 20. Camera 110 is compact and can easily be hidden out of sight behind the rearview mirror and thus does not obstruct the driver's vision.
A VIS camera 110 gives good daytime pictures. However, due to the limited sensitivity of low cost automotive qualified sensors, VIS camera 110 cannot detect pedestrians at night unless the pedestrians are in the area illuminated by the host vehicle 50 headlights. When using low beams, the whole body of a pedestrian is visible up to a distance of 10 meters and the feet up to a distance of about 25 meters.
Vehicular Systems Based on Far Infrared (FIR) Cameras:
Far infrared (FIR) cameras are used in prior art automotive applications, such as the night vision system on the Cadillac DeVille introduced by GM in 2000, to provide better night vision capabilities to drivers. Typically, a FIR image is projected onto the host vehicle 50 windshield and is used to provide improved visibility of pedestrians, large animals and other warm obstacles. Using computer vision techniques, important obstacles such as pedestrians can be detected and highlighted in the image. Since FIR does not penetrate glass, the FIR camera is mounted outside the windshield and typically, in front of the engine so that the image is not masked by the engine heat. The mounting height is therefore typically in the range of 30 centimeters to 70 centimeters from the ground, and the camera is preferably mounted a bit shifted from center to be better aligned with the viewpoint of the vehicle driver.
Reference is made to FIG. 7a which exemplifies a situation where a pedestrian 10 is on the side-walk 30, which is not on the ground plane 20 of the host vehicle 50. Reference is also made to FIG. 7b which depicts an example of severe vertical curves in road 20 that place the feet of the pedestrian 10 below the road plane 20, as defined by host vehicle 50 wheels.
It should be noted that a detected pedestrian 10 may be on sidewalk 30, the upper surface of which is typically higher than the surface of road 20, for example by 15 centimeters, as illustrated in FIG. 7a. With a camera height of 70 centimeters, the difference in height between sidewalk 30 upper surface and road 20 surface introduces an additional error in distance estimation, typically of about 15% to 20%. The error is doubled for a camera at 35 centimeters height. An error in distance estimation also occurs on vertically curved roads, where a pedestrian 10 may be below the ground plane 20 of the host vehicle 50, as exemplified by FIG. 7b. In other situations a pedestrian 10 may appear above the ground plane 20 of the host vehicle 50. In conclusion, the errors in the range estimation, when using road geometry constraints, can be very large for a camera height of 70 centimeters, and the value obtained is virtually useless.
With a FIR system, the target bottom 12 can still be determined accurately; however, it is difficult to determine the exact horizon position, since road features such as lane markings are not visible. Thus, in effect, the error in ybot can typically be 8-10 pixels, especially when driving on a bumpy road, or when the road curves up or down, or when the host vehicle is accelerating or decelerating. The percentage error in the range estimate even for a camera height of 70 centimeters can be large (often over 50%). FIG. 2 graphically demonstrates the error in distance, as a result of just 2 pixels error in the horizon location estimate, for three different camera heights. It can be seen that the lower the camera is, the more sensitive the distance estimation is to errors.
In contrast with VIS cameras, FIR cameras are very good at detecting pedestrians in most night scenes. There are certain weather conditions in which detecting pedestrians is more difficult. Since the main purpose of the FIR camera is to enhance the driver's night vision, the FIR camera is designed to allow visibility of targets at more than 100 meters, resulting in a FIR camera design with a narrow Field Of View (FOV). Since the camera mounting is very low, range estimation using the ground plane constraint and triangulation is not possible. The FIR camera is typically of low resolution (320×240). FIR cameras are quite expensive, so a stereo system with two FIR cameras is not a commercially viable option.
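By way of illustration only, the effect of a raised sidewalk on the ground plane range estimate may be sketched as follows. If the feet of the pedestrian are at height h above the road and at true range D, a pinhole model gives ybot = f (Hcam - h) / D, so that equation 1 yields an estimate of D Hcam / (Hcam - h). The function name is hypothetical, and expressing the error as a fraction of the estimated range is an assumption made for this sketch.

```python
def sidewalk_range_error(h_cam_m, sidewalk_height_m):
    """Range errors when a raised target is assumed to stand on the road plane.

    Returns (error as a fraction of the estimated range,
             error as a fraction of the true range).
    """
    if not 0 <= sidewalk_height_m < h_cam_m:
        raise ValueError("sidewalk height must be non-negative and below the camera")
    fraction_of_estimate = sidewalk_height_m / h_cam_m
    fraction_of_true = sidewalk_height_m / (h_cam_m - sidewalk_height_m)
    return fraction_of_estimate, fraction_of_true


# 15 cm sidewalk seen from the two camera heights mentioned in the text.
for h_cam in (0.70, 0.35):
    of_estimate, of_true = sidewalk_range_error(h_cam, 0.15)
    print(f"Hcam = {h_cam:.2f} m: {100 * of_estimate:.0f}% of estimate, "
          f"{100 * of_true:.0f}% of true range")
```

Under these assumptions the error as a fraction of the estimated range is h / Hcam, which is of the order of the figure quoted above for a 70 centimeter camera height and doubles when the camera height is halved to 35 centimeters.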
Stereo Based Vehicular Systems:
Stereo cameras have been proposed for obstacle avoidance in automotive applications and have been implemented in a few test vehicles. Since visible light cameras are quite cheap, it appears a reasonable approach to mount a pair of cameras at the top of the windshield. The problem is that to get accurate distance estimates from a camera pair, a “wide baseline” is needed, but a “wide baseline” results in a bulky unit which cannot be discreetly installed. A “wide baseline” of 50 centimeters or more would be ideal from a theoretical point of view, but is not practical.
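By way of illustration only, the dependence of stereo range accuracy on baseline may be sketched with the standard rectified stereo relation Z = f B / d, where d is the disparity in pixels, from which a first-order range error of Z² / (f B) per pixel of disparity error follows. The function name, focal length, range and disparity error below are hypothetical.

```python
def stereo_range_error_m(range_m, baseline_m, f_pixels, disparity_error_px):
    """First-order range error of a rectified stereo pair.

    From Z = f * B / d it follows that |dZ| is approximately
    Z**2 / (f * B) * |dd| for a small disparity error dd.
    """
    if baseline_m <= 0 or f_pixels <= 0:
        raise ValueError("baseline and focal length must be positive")
    return range_m ** 2 / (f_pixels * baseline_m) * disparity_error_px


# Hypothetical set-up: 740-pixel focal length, target at 50 m,
# half a pixel of disparity error, wide versus narrow baseline.
wide = stereo_range_error_m(50.0, baseline_m=0.50, f_pixels=740.0, disparity_error_px=0.5)
narrow = stereo_range_error_m(50.0, baseline_m=0.10, f_pixels=740.0, disparity_error_px=0.5)
print(f"50 cm baseline: +/- {wide:.1f} m, 10 cm baseline: +/- {narrow:.1f} m")
```

In this sketch the range error is inversely proportional to the baseline, which is why a compact unit with a narrow baseline gives markedly poorer distance estimates at long range.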
Definitions
The term “vehicle environment” is used herein to refer to the outside scene surrounding a vehicle in depth of up to a few hundreds of meters as viewed by a camera and which is within a field of view of the camera.
The term “patch” as used herein refers to a portion of an image having any shape and dimensions smaller than or equal to the image dimensions.
The term “centroid” of an object in three-dimensional space is the intersection of all planes that divide the object into two equal spaces. Informally, it is the “average” of all points of the object.