The invention relates to an image processing apparatus and, more particularly, to a method and apparatus for performing three dimensional scene estimation of camera pose and scene geometry, and for authentically inserting a synthetic object into a real scene using the information provided by the scene estimation routine.
Seamless three dimensional insertion of synthetic objects into real scene images requires tools that allow a user to situate synthetic objects with respect to real surfaces within a scene. To create a realistic image, the synthetic objects must be projected into every given camera viewpoint of the real scene. The current methodology for inserting a synthetic object into a real scene involves the cumbersome task of tracking and recording the camera pose and calibration for each frame of a sequence, so that the geometry and orientation of the synthetic object can be matched to the camera pose and calibration data of each individual frame. This matching is repeated frame by frame to maintain a realistic view of the inserted object throughout the sequence. In current practice, pose estimation is accomplished by modeling the three dimensional background scene before the pose is computed, which is a tedious process.
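The per-frame matching described above amounts to reprojecting the synthetic object's geometry with each frame's camera pose and calibration. The following is a minimal sketch, assuming a standard pinhole camera whose pose maps world points into camera coordinates as X_cam = R·X_world + t and whose calibration is reduced to focal lengths (fx, fy) and a principal point (cx, cy). The function name `project_points` and this parameterization are illustrative assumptions, not part of the described apparatus.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def project_points(
    points_world: Sequence[Vec3],
    rotation: Mat3,
    translation: Vec3,
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Tuple[float, float]]:
    """Project world points into one frame with a pinhole model (hypothetical helper).

    Assumes the pose maps world to camera coordinates: X_cam = R @ X_world + t.
    """
    projected = []
    for X in points_world:
        # Rigid (rotation + translation) transform into the camera frame.
        xc = [sum(rotation[i][j] * X[j] for j in range(3)) + translation[i] for i in range(3)]
        if xc[2] <= 0:
            raise ValueError("point is behind the camera")
        # Perspective division, then the intrinsic calibration.
        u = fx * xc[0] / xc[2] + cx
        v = fy * xc[1] / xc[2] + cy
        projected.append((u, v))
    return projected
```

Repeating this call once per frame, each time with that frame's (R, t), reproduces the frame-by-frame matching. For example, with an identity rotation and fx = fy = 500, cx = 320, cy = 240, the point (0, 0, 5) lands at (320, 240) when t = (0, 0, 0). It lands at (220, 240) when t = (-1, 0, 0).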
To automate the insertion process, object insertion should be performed in as few frames as possible, preferably one, with all other views of the object created automatically. To place the object with respect to the real scene, accurate albeit limited three dimensional geometry is required; for instance, estimates of local surface patches may suffice. For the object's three dimensional appearance to change stably across the given camera positions, a reliable three dimensional camera pose computation is required. Furthermore, since graphics objects are typically created using Euclidean geometry, it is strongly desirable that the real scene and its associated camera poses be represented in Euclidean coordinates. Stability of the pose computation over extended image sequences is required to avoid jitter and drift in the location and appearance of synthetic objects with respect to the real scene.
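To show how a local surface patch in Euclidean coordinates can anchor a synthetic object, the sketch below takes one possible approach. It assumes the patch is approximated by the plane through three reconstructed scene points, builds an orthonormal frame on that plane (origin at the points' centroid, normal from a cross product), and maps object-local coordinates into scene coordinates. The helper names `patch_frame` and `place_on_patch` and the three-point plane model are hypothetical simplifications; a real system would more likely fit a plane to many points.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Frame = Tuple[Vec3, Vec3, Vec3, Vec3]  # origin, x axis, y axis, normal


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _normalize(a: Vec3) -> Vec3:
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    if n == 0.0:
        raise ValueError("degenerate surface patch")
    return (a[0] / n, a[1] / n, a[2] / n)


def patch_frame(p0: Vec3, p1: Vec3, p2: Vec3) -> Frame:
    """Euclidean frame on the plane through three scene points (hypothetical helper)."""
    e1 = _sub(p1, p0)
    e2 = _sub(p2, p0)
    normal = _normalize(_cross(e1, e2))  # fails if the points are collinear
    x_axis = _normalize(e1)
    y_axis = _cross(normal, x_axis)  # unit length because normal is orthogonal to x_axis
    origin = tuple((p0[i] + p1[i] + p2[i]) / 3.0 for i in range(3))
    return origin, x_axis, y_axis, normal


def place_on_patch(local_points: Sequence[Vec3], frame: Frame) -> List[Vec3]:
    """Map object-local coordinates (local z = height above the patch) into the scene."""
    origin, ex, ey, ez = frame
    return [
        tuple(origin[i] + p[0] * ex[i] + p[1] * ey[i] + p[2] * ez[i] for i in range(3))
        for p in local_points
    ]
```

Because the frame is orthonormal, the mapping is a rigid Euclidean transform. Object dimensions authored in a graphics package are therefore preserved when the object is placed on the patch.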
Therefore, a need exists in the art for a method and apparatus for estimating three dimensional pose (rotation and translation) and three dimensional structure of unmodeled scenes to facilitate the authentic insertion of synthetic objects into a real scene view.
The present invention provides an apparatus and method for estimating pose and scene structure in extended scene sequences while allowing for the insertion and authentic projection of three dimensional synthetic objects into real views. Generally, given a video sequence of N frames, the invention computes the camera pose (the rotation and translation) without knowledge of a three dimensional model representing the scene. The inventive apparatus executes a multi-view three dimensional pose and structure estimation routine comprising the steps of feature tracking, pairwise camera pose estimation, computing camera pose for overlapping sequences, and performing a global block adjustment that provides camera pose and scene geometric information for each frame of a video sequence. The pairwise camera pose estimation may alternatively be performed between “key” frames rather than between every frame of the sequence. The “key” frames are selected from within a sequence of frames and are: frames with sufficient parallax motion between them; frames that transition between overlapping sets of correspondences; or frames that are regularly sampled if motion within the frame sequence is smooth. A “Match Move” routine may be used to insert a synthetic object into one frame of a video sequence based on the pose and geometric information of that frame, and to calculate all other required views of the synthetic object for the remaining frames using the pose and geometric information produced by the multi-view three dimensional estimation routine. As such, the synthetic object is inserted into the scene and appears as a “real” object within the imaged scene.
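The three key-frame criteria listed above can be expressed as a simple heuristic over per-frame feature tracks. The sketch below makes several assumptions. Tracks are given as one dictionary per frame mapping a track id to its (x, y) image position. "Parallax" is approximated by the median image displacement of shared tracks relative to the last key frame, which also includes rotation-induced motion and so is only a proxy. Loss of correspondences is measured as the fraction of the last key frame's tracks still visible. The function `select_key_frames` and its thresholds `min_parallax`, `min_overlap` and `max_gap` are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Tuple

Tracks = Dict[int, Tuple[float, float]]  # track id -> (x, y) image position


def select_key_frames(
    frames: List[Tracks],
    min_parallax: float = 20.0,  # pixels of median displacement (assumed threshold)
    min_overlap: float = 0.5,    # fraction of last key frame's tracks still visible
    max_gap: int = 10,           # regular sampling interval for smooth motion
) -> List[int]:
    """Pick key-frame indices from feature tracks (hypothetical heuristic)."""
    if not frames:
        return []
    keys = [0]
    for i in range(1, len(frames)):
        last = frames[keys[-1]]
        shared = last.keys() & frames[i].keys()
        overlap = len(shared) / max(len(last), 1)
        if overlap < min_overlap:
            # The set of correspondences is changing: promote the previous frame
            # (which still overlaps the last key frame) or, failing that, this one.
            keys.append(i - 1 if i - 1 > keys[-1] else i)
            continue
        # Median displacement of shared tracks as a proxy for parallax motion.
        displacements = [math.dist(last[t], frames[i][t]) for t in shared]
        parallax = statistics.median(displacements) if displacements else 0.0
        if parallax >= min_parallax or i - keys[-1] >= max_gap:
            keys.append(i)
    return keys
```

Under these assumptions, a sequence whose tracks drift by 5 and then 25 pixels yields key frames [0, 2]. Twelve motionless frames with `max_gap=5` yield [0, 5, 10], which is the regular-sampling case.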