Field of the Invention
Embodiments of the present invention generally relate to estimating the current pose of a camera as the camera is moved through space.
Description of the Related Art
Many interactive, camera-based applications rely on the estimation of camera pose with respect to a reference coordinate system. A classic example of such an application is augmented reality (AR), in which the estimated camera pose determines the perspective rendering of a virtual object. In general, AR is a live, direct or indirect, view of a physical, real-world environment which is augmented (or supplemented) by computer-generated sensory input such as sound, video, graphics or GPS data in order to enhance the user's perception of reality. The augmentation is conventionally performed in real-time and in semantic context with environmental elements, e.g., sports scores on TV during a sporting event.
In many AR scenarios, there is constant relative motion between the camera and the scene. In order to insert a virtual object such that the object appears geometrically consistent with the scene, the application determines the relative rotation and translation of the camera with respect to the scene, i.e., the camera pose.
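By way of illustration only, the role of the camera pose in placing a virtual object can be sketched as follows. The sketch assumes a simple pinhole camera model, which the description above does not specify. The function names, the focal length, and the principal point (cx, cy) are hypothetical values chosen for the example.

```python
import math


def rotation_about_y(angle_rad):
    """Return a 3x3 rotation matrix for a rotation about the camera's y axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def world_to_camera(point, rotation, translation):
    """Apply the pose: X_cam = R * X_scene + t."""
    return [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]


def project(point_cam, focal_px, cx, cy):
    """Project a camera-frame point to pixel coordinates (assumed pinhole model)."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (focal_px * x / z + cx, focal_px * y / z + cy)


# Hypothetical pose: no rotation, scene origin 5 units in front of the camera.
R = rotation_about_y(0.0)
t = [0.0, 0.0, 5.0]
corner = [1.0, 0.0, 0.0]  # one vertex of a virtual object, in scene coordinates
pixel = project(world_to_camera(corner, R, t), focal_px=500.0, cx=320.0, cy=240.0)
print(pixel)
```

If the rotation or translation is in error, every projected vertex shifts accordingly, which is why the virtual object appears geometrically inconsistent with the scene when the pose estimate is poor.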
Typically, once a starting pose estimate for a camera is computed, instantaneous image measurements are fused with past temporal information to continually update the camera pose. However, factors such as occlusion, motion blur, etc., can lead to noisy image measurements or discontinuities in temporal information that can render this pose update process unreliable or unstable. Under such circumstances, the camera pose estimate may need to be recovered.
There are two common approaches used for initializing and recovering a camera pose estimate. In one approach, the camera pose estimation algorithm has a priori knowledge of the background scene. In this approach, warped versions of the background scene are generated in an offline phase, each corresponding to a known camera pose. Thus, to initialize or recover the camera pose estimate, the algorithm can compare input images against the pre-generated warped images to estimate the pose.
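A minimal sketch of this first approach is given below. It assumes that each pre-generated warped image is stored together with the pose used to generate it, and that images are compared by a sum of squared pixel differences. Both the storage layout and the similarity measure are assumptions made for illustration. The tiny 2x2 images and the pose labels are hypothetical.

```python
def sum_squared_difference(image_a, image_b):
    """Sum of squared pixel differences between two equally sized images."""
    total = 0
    for row_a, row_b in zip(image_a, image_b):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
    return total


def estimate_pose_from_templates(input_image, templates):
    """Return the pose label of the warped template closest to the input.

    templates is a list of (pose_label, warped_image) pairs built offline.
    """
    if not templates:
        raise ValueError("no pre-generated warped images available")
    best_pose, _ = min(
        templates,
        key=lambda entry: sum_squared_difference(input_image, entry[1]),
    )
    return best_pose


# Hypothetical offline phase: two warped views of a known background scene.
templates = [
    ("frontal", [[10, 20], [30, 40]]),
    ("tilted_left", [[12, 25], [28, 35]]),
]

# Online phase: a noisy input image that resembles the tilted view.
input_image = [[13, 24], [27, 36]]
print(estimate_pose_from_templates(input_image, templates))
```

The cost of the online comparison grows with the number of warped images stored, so the density of the offline sampling trades accuracy against run time.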
In another approach, pose-invariant feature descriptors are used. In this approach, the features, F, computed from an image are invariant to changes in camera pose. Thus, even as the camera pose changes from the first image I0 to image It at time t, the algorithm can establish sufficient matches between F0 and Ft to recover the camera pose at time t. While pose-invariant features are powerful, their use is very computationally intensive, and hence they are currently not widely deployed for embedded real-time use.
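The matching step between F0 and Ft can be sketched, under simplifying assumptions, as a brute-force nearest-neighbor search over descriptor vectors. The Euclidean distance measure, the acceptance threshold max_distance, and the two-dimensional descriptors below are hypothetical choices made for illustration and are not taken from any particular descriptor scheme.

```python
import math


def match_features(f0, ft, max_distance):
    """Match each descriptor in f0 to its nearest descriptor in ft.

    Returns a list of (index_in_f0, index_in_ft) pairs whose descriptor
    distance does not exceed max_distance.
    """
    matches = []
    for i, d0 in enumerate(f0):
        best_j, best_dist = None, float("inf")
        for j, dt in enumerate(ft):
            dist = math.dist(d0, dt)  # Euclidean distance between descriptors
            if dist < best_dist:
                best_j, best_dist = j, dist
        if best_j is not None and best_dist <= max_distance:
            matches.append((i, best_j))
    return matches


# Hypothetical descriptors from the first image I0 and from image It.
f0 = [[0.0, 0.0], [10.0, 10.0]]
ft = [[10.1, 9.9], [0.1, 0.0], [50.0, 50.0]]
matches = match_features(f0, ft, max_distance=1.0)
print(matches)
```

Even this simple search compares every descriptor in F0 against every descriptor in Ft, which gives some indication of why such approaches are costly on embedded hardware.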