Structure-from-Motion (SfM) is a collection of methods for recovering 3D information about a scene that has been projected onto the planar 2D film back plane of a camera. The structural information derived from an SfM algorithm typically takes the form of a set of projection matrices, one projection matrix per image frame, representing the relationship between a specific 2D point in the image plane and its corresponding 3D point. SfM algorithms rely on tracking specific image features to determine such structural information concerning the scene. Generally speaking, only a small percentage of an image can be accurately tracked. These points usually lie on edges and corners, where sharp intensity discontinuities provide unambiguous tracking cues.
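By way of illustration only, the relationship expressed by a projection matrix can be sketched as follows. The sketch assumes a conventional 3x4 pinhole projection matrix acting on homogeneous coordinates; the function name `project` and the numerical camera values are hypothetical and are not taken from any particular SfM implementation.

```python
def project(P, point_3d):
    """Map a 3D point to 2D image coordinates using a 3x4 projection matrix P.

    Assumption: P is given as three rows of four numbers and follows the
    usual pinhole convention (homogeneous coordinates).
    """
    x, y, z = point_3d
    homogeneous = (x, y, z, 1.0)
    # Multiply P by the homogeneous 3D point to get (u*w, v*w, w).
    u, v, w = (sum(p * c for p, c in zip(row, homogeneous)) for row in P)
    if w == 0:
        raise ValueError("point lies in the camera's principal plane")
    # Divide by w to obtain the 2D point in the image plane.
    return (u / w, v / w)


# Hypothetical projection matrix for one frame: focal length 500 pixels,
# principal point (320, 240), camera at the origin looking along +Z.
P = [
    [500.0, 0.0, 320.0, 0.0],
    [0.0, 500.0, 240.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

uv = project(P, (0.2, -0.1, 2.0))
```

An SfM algorithm would supply one such matrix per image frame; recovering a 3D point from tracked 2D points is the inverse problem and requires observations in more than one frame.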
Similarly, stereo or multi-ocular disparity analysis may be used to determine 3D points from 2D images. As with SfM analysis, 3D points can only be established for a small percentage of an image at locations where there is sufficient contrast to unambiguously determine correspondences with a second image.
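As a minimal sketch of how disparity yields depth, the following assumes a rectified two-camera arrangement with parallel optical axes, for which depth equals focal length multiplied by baseline and divided by disparity. That arrangement, the function name `depth_from_disparity` and the example values are assumptions made for illustration and are not specified in the text above.

```python
def depth_from_disparity(focal_length_px, baseline, disparity_px):
    """Return depth for one matched point, or None if no match was found.

    Assumption: rectified stereo pair with parallel optical axes, so that
    depth = focal_length * baseline / disparity. The depth is expressed in
    the same units as the baseline.
    """
    if disparity_px is None or disparity_px <= 0:
        # No unambiguous correspondence (for example a low-contrast region),
        # so no 3D point can be established for this pixel.
        return None
    return focal_length_px * baseline / disparity_px


# Hypothetical values: 800 pixel focal length, 0.1 m baseline.
# Only some pixels have a disparity, which is why the result is sparse.
disparities = [16.0, None, 8.0, 0.0]
depths = [depth_from_disparity(800.0, 0.1, d) for d in disparities]
```

The `None` entries correspond to image locations where contrast is insufficient to determine a correspondence, which is the source of the sparseness described above.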
In many applications, including but not limited to stereoscopic image rendering, robotic navigation and special effects animation, such sparse 3D points are insufficient. Such applications require a dense depth map in which each 2D point in an image is associated with a 3D point.
Prior art for conversion of sparse 3D points to dense depth maps relies on either spatial interpolation of the sparse 3D data or hypothesise-and-test approaches such as the RANSAC algorithm. Both of these approaches use only the sparse 3D point data available at each individual image frame. This leads to two major shortcomings: first, the number of sparse points available in any single image may not be sufficient to accurately derive a dense depth map, and second, the consistency of the depth maps from one frame to the next may be poor. The present invention discloses a method for deriving dense depth maps from sparse 3D data that addresses these shortcomings.
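For illustration, one simple form of per-frame spatial interpolation is inverse-distance weighting, sketched below. This is an assumed example of the general class of technique and is not asserted to be the specific prior art method; the function name `idw_dense_depth` and the sample points are hypothetical.

```python
def idw_dense_depth(width, height, sparse_points, power=2.0):
    """Fill a dense depth map from sparse (x, y, depth) samples of ONE frame.

    Each pixel receives a weighted average of the sparse depths, with
    weights proportional to 1 / distance**power.
    """
    if not sparse_points:
        raise ValueError("at least one sparse point is required")
    dense = []
    for y in range(height):
        row = []
        for x in range(width):
            weighted_sum = 0.0
            weight_total = 0.0
            exact = None
            for sx, sy, depth in sparse_points:
                dist_sq = (x - sx) ** 2 + (y - sy) ** 2
                if dist_sq == 0:
                    # Pixel coincides with a sparse sample: keep its depth.
                    exact = depth
                    break
                weight = 1.0 / dist_sq ** (power / 2.0)
                weighted_sum += weight * depth
                weight_total += weight
            row.append(exact if exact is not None else weighted_sum / weight_total)
        dense.append(row)
    return dense


# Hypothetical sparse data for a 5 x 1 pixel frame.
sparse_points = [(0, 0, 1.0), (4, 0, 3.0)]
depth_map = idw_dense_depth(5, 1, sparse_points)
```

Because the function sees only the sparse points of a single frame, it exhibits both shortcomings noted above: too few points give a poor map, and nothing ties the result for one frame to that of the next.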
The applicants have disclosed in co-pending PCT application number PCT/AU01/00975, the contents of which are incorporated herein by reference, a method for generating depth maps from one or more images. This method involved a two-phase process. In the first phase, sparse depth data associated with a single image was used to generate a depth map for that image. In the second phase, depth maps for each image in an image sequence were generated using the results of phase one. Whilst this method works in ideal situations, there are many limitations to the process. In the applicants' prior application it was necessary to select a number of key frames in an image sequence. For each of these key frames it was necessary to know the depth data for a sufficient number of pixels within that key frame such that an equation to generate a corresponding depth map could be generated. That is, given the depth for a sufficient number of pixels within the key frame, a function could be derived such that the depth for every other pixel could be determined. Once these functions were generated for the key frames, they could in turn be used to generate functions for the remaining frames.
One of the limitations of the applicants' prior process is the necessity for two phases. It will be appreciated that if an error is introduced in the first phase for whatever reason, then this error is propagated throughout the second phase. In such a situation the resultant depth maps may not be satisfactory.
Of greater concern is that for phase one to be completed satisfactorily, it is necessary to know the depth for a sufficient number of pixels within a key frame, in order to solve an equation to generate the depth map for that key frame. For example, if a key frame has 350,000 pixels then ideally the depth for 17,500 pixels (or 5% of the total number of pixels) would be known so as to enable a function for the depth map to be generated. If the number of pixels for which the depth is known is not sufficient, the quality of the resulting depth map will not be adequate. If an accurate depth map cannot be generated for a key frame, then it is unlikely that phase two can be completed successfully. There is therefore a need for a simplified process for the generation of depth maps.