Vision-based tracking techniques use images captured by a mobile platform to determine the position and orientation (pose) of the mobile platform with respect to an object in the environment. Tracking is useful for many applications, such as navigation and augmented reality, in which virtual objects are inserted into a user's view of the real world.
One type of vision-based tracking initializes a reference patch by detecting a planar surface in the environment. The surface is typically detected using multiple images of the surface: the homography between two of the images is computed and used to estimate 3D locations for the points detected on the surface. Any two camera images of the same planar surface are related by a 3×3 homography matrix h. The homography h can be decomposed into the rotation R and translation t between the two images. The pose information [R|t] may then be used for navigation, augmented reality or other such applications.
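The relationship between a planar surface, the camera motion [R|t] and the homography can be illustrated with a minimal sketch. It assumes normalized image coordinates (identity camera intrinsics), a plane written as n·X = d in the first camera frame, and camera motion X2 = R·X1 + t; under those assumptions the plane induces the homography R + t·nᵀ/d. The function names and numeric values are hypothetical and chosen only for illustration.

```python
import math

def matvec(M, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def plane_homography(R, t, n, d):
    """Homography induced by the plane n . X = d (first-camera frame).

    Assumes normalized image coordinates and motion X2 = R X1 + t.
    """
    return [[R[i][j] + t[i] * n[j] / d for j in range(3)] for i in range(3)]

def project(X):
    """Pinhole projection of a 3D point to normalized image coordinates."""
    return (X[0] / X[2], X[1] / X[2])

def rot_y(angle):
    """Rotation about the camera Y axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

# Illustrative (assumed) motion and plane: the plane Z = 2 in the first view.
R = rot_y(math.radians(10.0))
t = [0.2, 0.0, 0.05]
n = [0.0, 0.0, 1.0]
d = 2.0
H = plane_homography(R, t, n, d)

# A 3D point on the plane, seen in both views.
X1 = [0.3, -0.4, 2.0]
X2 = [a + b for a, b in zip(matvec(R, X1), t)]

x1 = project(X1)
x2_direct = project(X2)                               # project the moved point
x2_via_h = project(matvec(H, [x1[0], x1[1], 1.0]))    # map the image point with H
```

Both routes give the same image point in the second view for any point on the plane, which is why the homography carries information about R, t and the plane normal.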
In most cases, however, the decomposition of homography h yields multiple possible solutions, only one of which represents the actual planar surface. Thus, there is an ambiguity in the decomposition of homography h that must be resolved. Known methods of resolving homography decomposition ambiguity require the use of extra information to select the correct solution, such as additional images or prior knowledge of the planar surface.
By way of example, tracking technologies such as that described by Georg Klein and David Murray, “Parallel Tracking and Mapping on a Camera Phone”, In Proc. International Symposium on Mixed and Augmented Reality (ISMAR), 4 pages, 2009 (“PTAM”), suffer from ambiguity in the pose selection after homography decomposition. PTAM requires additional video frames, i.e., images, to resolve the ambiguity. For each possible solution, PTAM computes the 3D camera pose and compares the pose reprojection error over a number of subsequent frames. When the average reprojection error for one solution exceeds that of another by a given factor, such as two times, the solution with the greater error is eliminated. Using pose reprojection to resolve the ambiguity, however, takes a long time to converge and sometimes yields incorrect results.
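The elimination rule described above can be sketched as follows. This is not PTAM's code; it is a minimal illustration that assumes per-frame reprojection errors have already been computed for each candidate solution. The function name, the candidate labels and the error values are hypothetical.

```python
def prune_by_reprojection_error(errors_by_candidate, ratio=2.0):
    """Return the candidate labels that survive the error-ratio test.

    errors_by_candidate maps a candidate label to a list of per-frame
    reprojection errors. A candidate is eliminated when its average error
    is more than `ratio` times the average error of another candidate,
    which is equivalent to comparing against the lowest average.
    """
    averages = {
        label: sum(errs) / len(errs)
        for label, errs in errors_by_candidate.items()
    }
    best = min(averages.values())
    return [label for label, avg in averages.items() if avg <= ratio * best]

# After only one frame the two candidates cannot be separated.
early = prune_by_reprojection_error({"A": [1.0], "B": [1.5]})

# After several frames candidate B's average exceeds twice that of A.
later = prune_by_reprojection_error(
    {"A": [0.8, 1.1, 0.9], "B": [2.5, 3.0, 2.9]}
)
```

In the first call both candidates survive, and in the second only "A" remains. The need to accumulate frames before the averages separate is consistent with the slow convergence noted above.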
Another approach used to resolve the ambiguity is to choose the homography solution whose plane normal is closest to the initial orientation of the camera. This approach, however, restricts the user to always begin close to a head-on orientation and move the camera away from that position.
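A minimal sketch of this selection rule is given below. It assumes that each candidate solution supplies a plane normal expressed in the initial camera frame, that normals are oriented toward the camera, and that the camera looks along +Z, so a head-on plane has a normal near (0, 0, -1). These conventions, the function names and the candidate values are assumptions made for illustration only.

```python
import math

def angle_between(u, v):
    """Angle in radians between two non-zero 3-vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # guard rounding
    return math.acos(cos_angle)

def pick_closest_normal(normals_by_candidate, reference=(0.0, 0.0, -1.0)):
    """Return the label whose plane normal is closest to `reference`."""
    return min(
        normals_by_candidate,
        key=lambda label: angle_between(normals_by_candidate[label], reference),
    )

# Two hypothetical decomposition results with different plane normals.
candidates = {
    "sol_1": [0.1, 0.0, -0.99],
    "sol_2": [0.6, 0.2, -0.77],
}
chosen = pick_closest_normal(candidates)
```

Here "sol_1" is chosen because its normal is nearly head-on. If the user starts at an oblique angle, the correct normal may be the one farther from the reference direction, which is the restriction described above.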
In an approach described by D. Santosh Kumar and C. V. Jawahar, “Robust Homography-Based Control for Camera Positioning in Piecewise Planar Environments”, Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), 906-918 (2006), either another planar surface in space or prior knowledge about the plane is required to select the correct solution. Thus, this approach has limited practical application.