This disclosure relates to recovering 3D structure and/or unknown camera motions.
In computer vision, structure from motion refers to a process of estimating camera motion and 3D structure by analyzing the apparent motion in a 2D image plane caused by the moving camera. The theory that underpins such a process is that a feature seen by the camera at a particular point in the 2D image plane actually lies along a particular ray beginning at the camera and extending out to infinity. When the same feature is seen in two different images, the camera motion with respect to that feature can be resolved. Using this process, any point seen in at least two images may also be located in 3D using triangulation.
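By way of illustration only, the triangulation step described above can be sketched as follows. The sketch assumes a pinhole camera model with 3x4 projection matrices that are already known for both images, and uses a linear (direct linear transformation) solve; the function names, intrinsic parameters, camera poses, and the 3D point are hypothetical values chosen for the example and are not taken from this disclosure.

```python
import numpy as np


def project(P, X):
    """Project a 3D point X (length 3) into an image using a 3x4 matrix P."""
    x = P @ np.append(X, 1.0)      # homogeneous image coordinates
    return x[:2] / x[2]            # pixel coordinates


def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    Each observation constrains the point to lie on a ray from its camera;
    stacking the constraints from both views gives a homogeneous system
    A @ X = 0 whose least-squares solution is the last right singular vector.
    """
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]            # convert from homogeneous coordinates


# Hypothetical intrinsics (focal length and principal point in pixels).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# First camera at the origin; second camera moved 1 unit along +x
# (so the translation column is -R @ C = [-1, 0, 0] with R = identity).
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

# Hypothetical 3D feature, observed in both images without noise.
X_true = np.array([0.5, -0.2, 4.0])
x1 = project(P1, X_true)
x2 = project(P2, X_true)

X_est = triangulate_point(P1, P2, x1, x2)
print(X_est)
```

With noise-free observations, as in this sketch, the recovered point matches the original up to numerical precision. With real tracked features, the observations are noisy and the linear estimate is typically only a starting point for further refinement.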
However, conventional feature-based camera motion estimation algorithms typically require at least one identifiable feature to exist in two images so that the feature can be tracked across the images. This is a limitation in that geometry information of a fixed feature needs to be known for those algorithms to work well. Some of those conventional algorithms also require that information about the camera, such as aspect ratio or field of view, be known.
For example, Blender® is a tool that can be used to estimate camera motion and reconstruct a scene in 3D virtual space. Specifically, Blender® can let the user specify, or can automatically specify, one or more tracking points for certain identifiable features in a series of images extracted from video footage by marking those points in the video footage. The positions of these features are then tracked throughout the images. A user can obtain camera motion for those images by providing the tracked positions of these features to a solver provided by Blender®. Through the solver, Blender® can then compute the camera motion using the positions of these features in the images. The underlying theory of the solver is that the appearance of any of these features in two adjacent image frames can indicate a motion of the camera.
However, to capture a scene, a director may shoot extreme close-ups with very little image surrounding a subject. For example, an extreme close-up of a portion of a room can leave very few objects or features in the close-up to be tracked. Thus, calculating the camera motion using the conventional feature-based camera motion estimation algorithms, such as that employed by Blender®, can be difficult for an extreme close-up scene. In the aforementioned example, all that is left in the background may be an edge of a window, a top of a wall, or a corner of the room. In that case, there is not enough geometry information for the conventional feature-based camera motion estimation algorithms to track a feature to compute a single pattern in the images.