The analysis of three-dimensional scenes from image sequences has a number of goals. These include, but are not limited to: (i) the recovery of 3D scene structure, (ii) the detection of moving objects in the presence of camera-induced motion, and (iii) the synthesis of new camera views from a given set of views.
The traditional approach to these problems has been to first recover the epipolar geometry between pairs of frames and then apply that information to achieve the goals listed above. However, this approach is plagued by the difficulties of recovering the epipolar geometry itself.
Recent approaches to 3D scene analysis have attempted to overcome some of the difficulties in recovering the epipolar geometry by decomposing the image motion into a planar homography and a residual parallax. The residual parallax motion depends on the projective structure and on the translation between the camera origins. While these methods remove some of the ambiguities in estimating camera rotation, they still require explicit estimation of the epipole, which can be difficult under many circumstances. In particular, epipole estimation is ill-conditioned when the epipole lies far from the center of the image and the parallax motion vectors are nearly parallel to each other. Also, when there are only a small number of parallax vectors and the scene contains moving objects, the vectors on those objects bias the estimate of the epipole.
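To make the conditioning claim concrete, the following is a minimal Python sketch, not the procedure used by any of the approaches discussed here. It assumes that the planar homography has already been compensated, so that each residual parallax vector lies on a line through the epipole, and it estimates the epipole as the least-squares intersection of those lines. The function names `estimate_epipole` and `synthetic_parallax`, the sample points, and the parallax scale are hypothetical choices made purely for illustration.

```python
import numpy as np


def estimate_epipole(points, parallax):
    """Least-squares intersection of the lines spanned by parallax vectors.

    Assumption: the planar homography has already been removed, so each
    residual parallax vector at `points[i]` lies on a line through the epipole.
    Returns the estimated epipole and the condition number of the linear system.
    """
    points = np.asarray(points, dtype=float)
    parallax = np.asarray(parallax, dtype=float)

    # Unit direction of each parallax vector.
    directions = parallax / np.linalg.norm(parallax, axis=1, keepdims=True)

    # Normal of each line: the direction rotated by 90 degrees.
    normals = np.stack([-directions[:, 1], directions[:, 0]], axis=1)

    # Each line gives one equation: n_i . e = n_i . p_i
    rhs = np.sum(normals * points, axis=1)
    epipole, *_ = np.linalg.lstsq(normals, rhs, rcond=None)

    return epipole, np.linalg.cond(normals)


def synthetic_parallax(points, epipole, scale=0.05):
    """Hypothetical noise-free parallax: vectors along the lines to the epipole.

    Real parallax magnitudes vary per point with scene structure; a constant
    scale is used here only to keep the example small.
    """
    points = np.asarray(points, dtype=float)
    return scale * (points - np.asarray(epipole, dtype=float))


if __name__ == "__main__":
    # Five image points in normalized coordinates (illustrative values).
    pts = np.array([[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0], [0.0, 0.5]])

    for true_epipole in ([0.1, 0.2], [100.0, 0.0]):
        est, cond = estimate_epipole(pts, synthetic_parallax(pts, true_epipole))
        print("true:", true_epipole, "estimated:", est, "condition number:", cond)
```

As the epipole moves away from the image, the line normals become nearly parallel and the condition number of the system grows, so small errors in the parallax directions translate into large errors in the estimated epipole.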
In general, the treatment of multipoint geometry assumes that the scene is static and relies on almost all points selected for shape estimation being known to belong to a single rigid body. In its current form, this class of methods has drawbacks. For example, these methods do not address the problem of shape recovery in dynamic scenes, in particular when the amount of image motion due to independently moving objects is not negligible.