Various techniques are available for obtaining three-dimensional (3D) information from two-dimensional (2D) images captured sequentially by a single camera as it moves. One such technique is described in the paper “Shape and motion from image streams under orthography—a factorization method”, by C. Tomasi and T. Kanade, Technical Report TR-92-1270, Cornell University, March 1992. This technique involves identifying points in the captured images that appear to represent the same real world features. Changes in the location of the points between respective images can then be used to deduce the motion of the camera and to triangulate the position of the real world features in 3D space. Although computationally expensive, the technique generally works well. However, one limitation is that the scale of the generated 3D structure cannot be determined without knowledge of the size of at least one real world feature for comparison purposes.
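The scale limitation can be illustrated with a minimal sketch. It assumes a simple pinhole camera with no rotation, which is not the orthographic model of the cited paper, and the names `project`, `scale` and `images_match` are hypothetical. Under these assumptions, scaling both the scene and the camera path by the same factor leaves every image point unchanged, so the images alone cannot reveal the true scale.

```python
import math


def project(point, camera_position, focal_length=1.0):
    """Project a 3D point into a camera located at camera_position.

    Assumption: pinhole camera looking along +Z with no rotation.
    """
    x = point[0] - camera_position[0]
    y = point[1] - camera_position[1]
    z = point[2] - camera_position[2]
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    return (focal_length * x / z, focal_length * y / z)


def scale(vec, s):
    """Multiply every coordinate of a 3D vector by s."""
    return tuple(s * c for c in vec)


def images_match(points, camera_positions, s, tol=1e-9):
    """Return True if scaling scene and camera path by s leaves all image points unchanged."""
    for cam in camera_positions:
        for p in points:
            original = project(p, cam)
            rescaled = project(scale(p, s), scale(cam, s))
            if not all(
                math.isclose(a, b, abs_tol=tol) for a, b in zip(original, rescaled)
            ):
                return False
    return True
```

For instance, `images_match(points, camera_positions, 3.0)` would be expected to hold for any points in front of the camera, whereas scaling the scene alone, without the camera path, generally changes the projections.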
This sort of technique has been applied in the field of vehicle camera systems to obtain information about a vehicle's surroundings. For example, in U.S. patent publication 2009/0243889, a free parking space detection system is described, in which a 3D structure generated from images captured by a single camera mounted on a vehicle is used to estimate positions of adjacent vehicles. In U.S. publication 2009/0243889, the scale of the generated 3D structure is determined by detecting a ground plane in the 3D structure, estimating the height of the camera above the ground plane within the 3D structure, and comparing this height with the height of the camera in the real world, which is assumed to be fixed, to calculate a “camera height ratio” between the generated 3D structure and the real world. The camera height ratio represents the scale of the 3D structure.
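The scale calculation described above can be sketched as follows. This is an illustrative reading rather than the implementation of the cited publication: the function names are hypothetical, and the ratio is assumed to be expressed as real world height divided by height within the 3D structure (the source does not state the direction of the ratio).

```python
def camera_height_ratio(camera_y, ground_y, real_camera_height):
    """Return the assumed scale factor: real world units per structure unit.

    camera_y and ground_y are Y coordinates within the generated 3D structure;
    real_camera_height is the known, fixed mounting height of the camera.
    """
    height_in_structure = abs(camera_y - ground_y)
    if height_in_structure == 0:
        raise ValueError("camera lies on the estimated ground plane")
    return real_camera_height / height_in_structure


def to_real_world(point, ratio):
    """Scale a point of the 3D structure into real world units."""
    return tuple(ratio * c for c in point)


# Example: the camera sits 0.5 structure units above the ground plane,
# and is known to be mounted 1.0 m above the road.
ratio = camera_height_ratio(camera_y=0.0, ground_y=-0.5, real_camera_height=1.0)
feature = to_real_world((1.0, -0.5, 3.0), ratio)
```

Every distance in the structure is multiplied by the same ratio, so any error in the estimated ground plane height propagates directly into every scaled distance.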
The ground plane is identified by first ensuring that the 3D structure is properly oriented in an XYZ coordinate system, with Y representing the vertical. The ground plane is then estimated to be at the location along the Y axis having the highest density of identified real world features. A RANdom SAmple Consensus (RANSAC) method is then applied to refine the location of the ground plane. The RANSAC method iteratively selects a subset of the identified real world features near the estimated ground plane for inclusion in determination of a refined ground plane, with features in the selected subset generally being known as inliers and features not in the selected subset being known as outliers.
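The two-stage estimate described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of the cited publication: the bin width, band, tolerance and iteration count are arbitrary, the function names are hypothetical, and the refinement is a generic three-point plane RANSAC.

```python
import random
from collections import Counter


def densest_height(points, bin_width=0.1):
    """Return the centre of the Y-axis histogram bin holding the most points."""
    counts = Counter(int(p[1] // bin_width) for p in points)
    best_bin = counts.most_common(1)[0][0]
    return (best_bin + 0.5) * bin_width


def plane_from_points(a, b, c):
    """Return (unit normal n, offset d) with n.p + d = 0, or None if collinear."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    n = [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]
    norm = sum(x * x for x in n) ** 0.5
    if norm < 1e-12:
        return None
    n = [x / norm for x in n]
    d = -sum(n[i] * a[i] for i in range(3))
    return n, d


def distance_to_plane(p, plane):
    """Return the perpendicular distance from point p to the plane."""
    n, d = plane
    return abs(sum(n[i] * p[i] for i in range(3)) + d)


def ransac_ground_plane(points, bin_width=0.1, band=0.2, tol=0.02,
                        iterations=200, seed=0):
    """Estimate a ground plane from (x, y, z) points, with Y as the vertical."""
    rng = random.Random(seed)
    # Stage 1: initial estimate at the height with the highest feature density.
    y0 = densest_height(points, bin_width)
    candidates = [p for p in points if abs(p[1] - y0) <= band]
    if len(candidates) < 3:
        raise ValueError("too few features near the estimated ground height")
    # Stage 2: keep the plane hypothesis supported by the most inliers.
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(candidates, 3))
        if plane is None:
            continue  # degenerate (collinear) sample
        inliers = [p for p in candidates if distance_to_plane(p, plane) <= tol]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers
```

The function returns the best plane found together with its inliers; candidates outside the tolerance are the outliers. Note that the refinement can only search near the initial estimate, so a wrong first stage is not corrected by the second.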
One problem with the technique described in U.S. publication 2009/0243889 is that the surface on which a vehicle stands, e.g. the surface of a road, is generally fairly uniform and featureless. It is also likely that other horizontal surfaces, such as a pavement or the bodywork of another vehicle, may be present in the images at different heights from the surface on which the vehicle stands. This means that the assumption that the ground plane is located along the Y axis where the density of identified features is at a maximum may often be wrong, and other surfaces may be falsely identified as the surface on which the vehicle is standing. If the ground plane is identified incorrectly, the calculated camera height ratio will be incorrect and the scale of the generated 3D structure inaccurate. In the free parking space detection method described in U.S. publication 2009/0243889, this may lead to distances to other objects within the generated 3D structure being wrong, with the result that parking spaces may be misidentified and, in the absence of other safeguards, vehicle collisions may occur.
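The failure mode can be made concrete with a small worked sketch. All values are invented for illustration: a sparsely textured road at about Y = 0, a more richly textured pavement at about Y = 0.13, a camera at Y = 1.0 in the structure, and a structure that happens to already be in metres, so that the correct camera height ratio would be 1.0. The helper `densest_height` is hypothetical.

```python
from collections import Counter


def densest_height(y_values, bin_width=0.05):
    """Return the centre of the Y-axis histogram bin holding the most values."""
    counts = Counter(int(y // bin_width) for y in y_values)
    best_bin = counts.most_common(1)[0][0]
    return (best_bin + 0.5) * bin_width


# Invented feature heights: few features on the road, many on the pavement.
road_y = [0.00, 0.01, 0.02, 0.01]
pavement_y = [0.12, 0.13, 0.12, 0.14, 0.13, 0.12, 0.13, 0.14, 0.12, 0.13]

camera_y = 1.0            # camera height within the structure (assumed)
real_camera_height = 1.0  # known mounting height in metres (assumed)

estimated_ground_y = densest_height(road_y + pavement_y)
estimated_ratio = real_camera_height / (camera_y - estimated_ground_y)
```

Here the densest bin falls on the pavement rather than the road, so the estimated camera height is too small and the resulting ratio is greater than the correct value of 1.0. Every distance scaled by that ratio would be overestimated by the same proportion.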