1. Technical Field
The present disclosure relates to a method for calibrating stereo cameras, and in particular stereo cameras for use inside a vehicle as part of a driver assistance system.
2. Description of Related Art
Stereo vision is the process of recovering depth from camera images by comparing two or more views of the same scene. Binocular stereo uses two images, taken with cameras that are separated by a horizontal distance known as the “baseline”. Calibrating the stereo camera system allows computation of three-dimensional world points in actual units, e.g. millimeters, relative to the cameras based on the image coordinates.
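As an illustration of how a calibrated stereo pair yields coordinates in actual units, the following is a minimal sketch, not the method of the present disclosure. It assumes an idealized rectified pair (parallel optical axes, identical focal length, no distortion) with the left camera as the reference. The function and parameter names (`depth_from_disparity`, `triangulate_rectified`, `focal_px`, `baseline_mm`) are hypothetical and chosen for illustration only.

```python
def depth_from_disparity(focal_px, baseline_mm, disparity_px):
    """Depth Z in millimeters for a rectified stereo pair: Z = f * B / d.

    focal_px     -- focal length in pixels (assumed equal for both cameras)
    baseline_mm  -- horizontal distance between the cameras, in millimeters
    disparity_px -- horizontal shift of the point between the two images, in pixels
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_mm / disparity_px


def triangulate_rectified(x_left, y_left, x_right, focal_px, cx, cy, baseline_mm):
    """Return (X, Y, Z) in millimeters relative to the left camera.

    (x_left, y_left) is the image point in the left image, x_right is the
    column of the same point in the right image, and (cx, cy) is the
    principal point in pixels.
    """
    disparity = x_left - x_right
    z = depth_from_disparity(focal_px, baseline_mm, disparity)
    # Invert the pinhole projection at the recovered depth.
    x = (x_left - cx) * z / focal_px
    y = (y_left - cy) * z / focal_px
    return (x, y, z)
```

For example, with an assumed focal length of 1000 pixels and a 120 mm baseline, a disparity of 10 pixels corresponds to a depth of 12000 mm. Because depth is inversely proportional to disparity, small calibration errors have a growing effect on distant points.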
Calibration of a stereo camera system involves the estimation of extrinsic parameters which describe translation and rotation of the second camera relative to the first camera and intrinsic parameters of each camera. Intrinsic parameters include focal lengths, principal points and other parameters which describe camera image distortion. Image distortion means that image points are displaced from the position predicted by an ideal pinhole projection model. The most common form of distortion is radial distortion, which is inherent in all single-element lenses. Under radial distortion, e.g. pincushion distortion and/or barrel distortion, image points are displaced in a radial direction from the image center.
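Radial displacement of image points can be sketched with a commonly used polynomial model. The text above does not specify a particular distortion model, so the two-coefficient form and the names `apply_radial_distortion`, `k1` and `k2` below are assumptions made for illustration. Coordinates are assumed to be normalized, i.e. measured from the image center and divided by the focal length.

```python
def apply_radial_distortion(x, y, k1, k2=0.0):
    """Displace an ideal pinhole image point radially from the image center.

    (x, y) are normalized coordinates relative to the image center.
    k1 and k2 are assumed polynomial radial distortion coefficients.
    Under this convention, k1 < 0 pulls points toward the center
    (barrel distortion) and k1 > 0 pushes them outward (pincushion).
    """
    r2 = x * x + y * y                       # squared distance from the center
    factor = 1.0 + k1 * r2 + k2 * r2 * r2    # radial scaling at this radius
    return (x * factor, y * factor)
```

The point at the image center is never displaced, and the displacement grows with distance from the center, which is why distortion is most visible near the image borders.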
Different sources of information can be used to obtain camera calibration. One approach (sometimes called “off-line” calibration) is to use a known target where the three-dimensional world coordinates (or locations in three-dimensional space) of respective multiple points are known. One such option may use a checkerboard with known square size at a known location in world coordinates. Such calibration techniques require special equipment and/or a special procedure that is time consuming and costly.
Cameras for use in driver assistance and/or driving control may be mounted viewing in the forward direction inside a vehicle behind the windshield. Stereo calibration for stereo cameras mounted behind the windshield is thus further complicated; since the windshield distorts the perspective or camera projection, the calibration may be performed only after installing the cameras in the host vehicle. Cameras are generally modelled using the pinhole camera model using perspective projection. This model is a good approximation to the behavior of most real cameras, although in some cases it can be improved by taking non-linear effects (such as radial distortion) into account.
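The pinhole model with perspective projection may be sketched as follows. This is an illustrative sketch under stated assumptions: the 3D point is already expressed in the camera coordinate system with Z along the optical axis, non-linear effects are ignored, and the names `project_pinhole`, `fx`, `fy`, `cx` and `cy` are hypothetical.

```python
def project_pinhole(point_cam, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates onto the image plane.

    point_cam -- (X, Y, Z) in the camera frame, Z along the optical axis
    fx, fy    -- focal lengths in pixels
    cx, cy    -- principal point in pixels
    Returns the image point (u, v) in pixels.
    """
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    # Perspective division by depth, then scaling and shifting into pixels.
    u = fx * X / Z + cx
    v = fy * Y / Z + cy
    return (u, v)
```

A point on the optical axis projects to the principal point regardless of its depth, and doubling the depth of any other point halves its offset from the principal point.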
Auto-calibration or self-calibration refers to a technique in which the camera parameters are updated “on-line” by processing images being captured during motion of the vehicle. In automotive applications, auto-calibration may ensure maintenance-free long-term operation, since camera parameters may be subject to drift due to mechanical vibrations or large temperature variations that are commonly encountered in automotive applications. Additionally, reliable auto-calibration techniques may render initial off-line calibration obsolete, thus reducing time and cost in the production line.
Thus there is a need for, and it would be advantageous to have, a method for auto-calibration of stereo cameras suitable for driver assistance and/or driving control applications in automobiles.
Structure-from-Motion (SfM) refers to methods for recovering three-dimensional information of a scene that has been projected onto the back focal plane of a camera. The structural information derived from a SfM algorithm may take the form of a set of projection matrices, one projection matrix per image frame, representing the relationship between a specific two-dimensional point in the image plane of the camera and its corresponding three-dimensional point in world space. Alternatively, the structure information is the depth or distance to the three-dimensional (3D) point P=(X,Y,Z) which projects onto the image plane at the two-dimensional (2D) point p=(x,y). SfM algorithms rely on tracking specific image features from image frame to image frame to determine structural information concerning the scene. Structure-from-Motion (SfM) techniques useful in driver assistance applications have been previously disclosed by the present Applicant in US patent application publication 2014/0160244 entitled: Monocular Cued Detection of three-dimensional Structures from Depth Images, which is included herein by reference. US patent application publication 2014/0160244 discloses a system mountable in a host vehicle including a camera connectible to a processor. Multiple image frames are captured in the field of view of the camera. In the image frames, an imaged feature is detected of an object in the environment of the vehicle. The image frames are portioned locally around the imaged feature to produce imaged portions of the image frames including the imaged feature. The image frames are processed to compute a depth map locally around the detected imaged feature in the image portions. The depth map may be represented by an image of the feature with a color or grayscale coordinate related to a function of distance from the camera to the object.
Using the camera projection and known camera intrinsic and extrinsic parameters relative to a world coordinate system, the depth map is sufficient to provide the three-dimensional world coordinates of the imaged feature.
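The recovery of world coordinates from a depth map value may be sketched as follows. This is a minimal illustration rather than the procedure of the cited publication. It assumes that the depth value is the Z coordinate along the optical axis, that distortion has already been removed, and that the extrinsic parameters are given as a camera-to-world rotation and translation; the names `backproject`, `camera_to_world`, `rotation` and `translation` are hypothetical.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Return the camera-frame point (X, Y, Z) seen at pixel (u, v).

    depth is assumed to be the Z coordinate along the optical axis;
    fx, fy, cx, cy are the intrinsic parameters in pixels.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def camera_to_world(point_cam, rotation, translation):
    """Apply extrinsic parameters: P_world = R * P_cam + t.

    rotation    -- 3x3 matrix as a nested list (assumed camera-to-world)
    translation -- camera position in world coordinates, length 3
    """
    return tuple(
        sum(rotation[i][j] * point_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
```

For example, with an identity rotation and a camera placed 1500 units along the world Z axis, a point back-projected to (720, 480, 12000) in the camera frame lies at (720, 480, 13500) in world coordinates.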
The computation of depth maps from multiple images, whether from a motion time sequence, from multiple cameras, or both, is the subject of extensive research, and numerous systems have been demonstrated.