Computers conventionally display images in a standard two-dimensional format, much like photographic images. However, computer programmers have developed techniques over the years to create three-dimensional representations of objects for display on the computer. Unfortunately, these techniques tend to be cumbersome, complicated, and difficult to implement on a routine basis.
One approach to creating a three-dimensional model is to start with two-dimensional images. The basic problem with creating a three-dimensional model in this manner is that of extracting the three-dimensional shape of the objects appearing in a sequence of two-dimensional images. The crux of the problem is that each two-dimensional image contains only a two-dimensional projection of the actual three-dimensional object and, in fact, may contain only a portion of the object. Several methodologies have been developed to address this problem; each has significant drawbacks.
Photogrammetry has been used to create three-dimensional models from two-dimensional images. This methodology has at least two formulations. The first formulation uses a pair of cameras locked together a fixed distance apart. The second one uses a single camera along with a position or orientation sensor. In the second case, the camera position must be known at all times, and is not derived from the images. This method requires that a human user identify a set number of points along the outline and shape of the same object appearing throughout multiple images. Once this is done, the program reconstructs a three-dimensional wire frame model of the object by calculating the three-dimensional locations of the points that the user selected and then mapping the two-dimensional image textures of the object onto that wire frame. This texture mapping introduces inherent distortions in the image.
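By way of illustration only, the first formulation (two cameras a fixed distance apart) can be sketched as a simple triangulation. The sketch assumes an idealized, rectified pair of pinhole cameras with parallel optical axes and image coordinates measured from the image center; the function names and the numeric values are hypothetical and are not taken from any particular photogrammetry system.

```python
def depth_from_disparity(focal_px, baseline_m, x_left_px, x_right_px):
    """Depth of a point matched in both images of a rectified camera pair.

    Assumes parallel pinhole cameras separated by baseline_m, with the
    same focal length (in pixels) and x measured from the image center.
    """
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("a point in front of the cameras has positive disparity")
    return focal_px * baseline_m / disparity


def triangulate(focal_px, baseline_m, x_left_px, x_right_px, y_px):
    """Return (X, Y, Z) of the matched point in the left camera's frame."""
    z = depth_from_disparity(focal_px, baseline_m, x_left_px, x_right_px)
    return (x_left_px * z / focal_px, y_px * z / focal_px, z)


# Hypothetical user-selected point, matched by hand in the left and right images.
vertex = triangulate(focal_px=800.0, baseline_m=0.5,
                     x_left_px=40.0, x_right_px=20.0, y_px=-16.0)
print(vertex)
```

Each point the user identifies yields one vertex of the wire frame in this way, which is why the method depends on the user selecting the same points in multiple images.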
Another methodology for creating a three-dimensional model from two-dimensional images is referred to as "optic flow." This methodology is based on the property that, due to perspective, when a viewer moves relative to stationary objects, objects closer to the viewer appear to move more in the viewer's field of view than objects far away. The method estimates depth (the third dimension) from the relative motion of identified objects in the two-dimensional image sequence. This method works because the distance to an object and the object's perceived shape are inherently linked due to perspective. For example, the far side of a building looks smaller than the near side. This method requires that the objects in an image be identified and tracked from frame to frame. It suffers greatly from occlusions and incomplete data because both make tracking difficult. In addition, it is very sensitive to noise because errors in size or motion measurement are magnified greatly when estimating relative depth. Finally, it requires a known camera position.
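The relationship between apparent motion and depth, and the way measurement noise is magnified, can be illustrated with a minimal sketch. It assumes a pinhole camera that translates sideways by a known amount between two frames; the function names, the focal length, and the half-pixel noise figure are hypothetical values chosen only for illustration.

```python
def image_shift_px(focal_px, camera_shift_m, depth_m):
    """Apparent image motion of a stationary point when the camera
    translates sideways by camera_shift_m (pinhole model)."""
    return focal_px * camera_shift_m / depth_m


def depth_from_shift(focal_px, camera_shift_m, shift_px):
    """Invert the relation above; the camera motion must be known."""
    if shift_px <= 0:
        raise ValueError("image shift must be positive")
    return focal_px * camera_shift_m / shift_px


def depth_error(focal_px, camera_shift_m, depth_m, noise_px):
    """Depth error caused by under-measuring the image shift by noise_px."""
    measured = image_shift_px(focal_px, camera_shift_m, depth_m) - noise_px
    return depth_from_shift(focal_px, camera_shift_m, measured) - depth_m


FOCAL_PX = 1000.0       # assumed focal length in pixels
CAMERA_SHIFT_M = 0.1    # assumed known sideways camera motion

for depth in (2.0, 50.0):
    shift = image_shift_px(FOCAL_PX, CAMERA_SHIFT_M, depth)
    error = depth_error(FOCAL_PX, CAMERA_SHIFT_M, depth, noise_px=0.5)
    print(f"depth {depth} m: shift {shift:.2f} px, error {error:.2f} m")
```

Under these assumptions the nearby point moves many pixels and tolerates the half-pixel error well, while the distant point moves only a few pixels, so the same half-pixel error produces a depth error of many meters.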
Another methodology for creating a three-dimensional model from two-dimensional images is known as "shape from motion." This methodology formulates the problem in linear algebra. In most implementations, all of the images in the sequence are used at once in a single calculation that yields the desired output (a closed-form solution). That output is either the shape of the object given the camera motion, or the camera motion given the shape of the objects, or both shape and motion.
In order to find both shape and motion, it is necessary to make them independent of one another. This is a problem because, in a perspective projection, they are related as described above. Therefore, under this formulation, it becomes necessary to assume an orthographic projection. (This means that all lines of vision are parallel and thus objects do not get smaller with distance.) This has the severe disadvantage that it introduces distortion. For example, this method would assume that the far side of a building in an image is, in fact, smaller than the near side, and thus model the building in three dimensions with one side shorter than the other. There are at least two formulations of this shape from motion methodology: one that uses an iterative method, and one that uses a closed-form method.
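The distortion introduced by the orthographic assumption can be illustrated with a short sketch. The building dimensions, the focal length, and the function names below are hypothetical; the sketch simply images two equally tall building edges with a perspective (pinhole) projection and then interprets the result as if it had been formed orthographically.

```python
def project_perspective(point, focal_px):
    """Pinhole projection: image size shrinks with distance z."""
    x, y, z = point
    return (focal_px * x / z, focal_px * y / z)


def project_orthographic(point, scale_px_per_m):
    """Parallel lines of vision: distance z has no effect on the image."""
    x, y, _z = point
    return (scale_px_per_m * x, scale_px_per_m * y)


def edge_height_px(project, bottom, top, param):
    """Image height of a vertical edge under the given projection."""
    return project(top, param)[1] - project(bottom, param)[1]


# Two vertical edges of a building, both 10 m tall (assumed values).
near_edge = ((-5.0, 0.0, 20.0), (-5.0, 10.0, 20.0))   # 20 m from the camera
far_edge = ((5.0, 0.0, 40.0), (5.0, 10.0, 40.0))      # 40 m from the camera

FOCAL_PX = 1000.0
near_px = edge_height_px(project_perspective, *near_edge, FOCAL_PX)
far_px = edge_height_px(project_perspective, *far_edge, FOCAL_PX)

# An orthographic model has one scale for the whole image. Calibrating it
# on the near edge and applying it to the far edge misjudges the far edge.
scale = near_px / 10.0
recovered_far_height_m = far_px / scale

print(near_px, far_px, recovered_far_height_m)
```

In this example the far edge occupies half as many pixels as the near edge, so a model built on the orthographic assumption reconstructs it at half its true height.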
In addition, there are drawbacks to closed-form solutions. They require all of the data at once and cannot calculate an answer as images are acquired, and thus cannot be made into real-time solutions. Also, they put all of the input data into a number of large matrices and operate on those matrices to find the solution. As a result, any missing matrix values cause serious problems, including making the problem unsolvable unless those values are filled in with guesses, which introduces large errors. In addition, this method tracks a relatively small number of points and texture maps onto those points, thus introducing texture warping.
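The effect of a missing value can be illustrated with a minimal sketch of one possible matrix layout. The layout shown (one row of horizontal coordinates and one row of vertical coordinates per frame, one column per tracked point), the orthographic camera rotating about a vertical axis, and all names and values are assumptions made for illustration; actual closed-form implementations may arrange their data differently.

```python
import math


def camera_axes(angle_rad):
    """Image x and y axes of a camera rotated about the vertical (Y) axis."""
    i_axis = (math.cos(angle_rad), 0.0, -math.sin(angle_rad))
    j_axis = (0.0, 1.0, 0.0)
    return i_axis, j_axis


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_measurement_matrix(points, angles, occluded):
    """Stack every observation into one 2F x P matrix (F frames, P points).

    Rows 0..F-1 hold horizontal coordinates and rows F..2F-1 hold vertical
    coordinates, using an orthographic projection. `occluded` is a set of
    (frame, point) pairs with no measurement; those entries are None.
    """
    u_rows, v_rows = [], []
    for f, angle in enumerate(angles):
        i_axis, j_axis = camera_axes(angle)
        u_rows.append([None if (f, p) in occluded else dot(i_axis, pt)
                       for p, pt in enumerate(points)])
        v_rows.append([None if (f, p) in occluded else dot(j_axis, pt)
                       for p, pt in enumerate(points)])
    return u_rows + v_rows


def complete_columns(matrix):
    """Indices of points that were measured in every frame."""
    n_cols = len(matrix[0])
    return [c for c in range(n_cols)
            if all(row[c] is not None for row in matrix)]


# Hypothetical scene: four tracked points, three frames, and one point
# (index 2) hidden in the middle frame.
points = [(1.0, 0.0, 1.0), (-1.0, 0.0, 1.0), (1.0, 2.0, -1.0), (-1.0, 2.0, -1.0)]
angles = [0.0, math.radians(10.0), math.radians(20.0)]
W = build_measurement_matrix(points, angles, occluded={(1, 2)})

print(len(W), "rows x", len(W[0]), "columns")
print("points usable without guessing:", complete_columns(W))
```

A single occlusion in a single frame leaves gaps in that point's column. A calculation that operates on the whole matrix at once must either discard the point entirely or fill the gaps with guessed values.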
For the reasons stated above, and for other reasons stated below which will become apparent to those skilled in the art upon reading and understanding the present specification, there is a need in the art for an improved technique for developing a three-dimensional model from two-dimensional images.