The present system relates to systems and methods for tracking planar shapes for augmented-reality (AR) applications.
Augmented reality is a technology in which a user's perception of the real world is enhanced by rendering information generated using computerized virtual content, or a virtual scene, on top of the real world. The virtual content may include labels, 3D models, shading, and illumination. In order for the view of the real world and the virtual scene to align properly (i.e. to be properly registered), the pose (i.e. 3D position and orientation) and other properties of the real and virtual cameras must be the same.
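The registration requirement above can be illustrated with a minimal sketch, assuming a simple pinhole camera model. The function name `project`, the focal length, the principal point, and the sample point are hypothetical values chosen for illustration and are not taken from any particular AR system. The sketch shows that a virtual camera sharing the pose of the real camera projects a world point to the same pixel, whereas a small pose error shifts the rendered content.

```python
def project(point_world, rotation, translation, focal, cx, cy):
    """Project a 3D world point to pixel coordinates with a pinhole camera."""
    # Camera-frame coordinates: X_cam = R * X_world + t
    x_cam = [
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if x_cam[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division followed by the intrinsic mapping to pixels
    return (focal * x_cam[0] / x_cam[2] + cx, focal * x_cam[1] / x_cam[2] + cy)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
corner = (1.0, 0.0, 0.0)  # a point on a real object, in world units (assumed)

# Real camera and a virtual camera with the same pose and intrinsics
real_px = project(corner, IDENTITY, (0.0, 0.0, 5.0), 800.0, 320.0, 240.0)
virtual_px = project(corner, IDENTITY, (0.0, 0.0, 5.0), 800.0, 320.0, 240.0)

# Virtual camera whose estimated pose is off by 0.1 units along x
drifted_px = project(corner, IDENTITY, (0.1, 0.0, 5.0), 800.0, 320.0, 240.0)
misregistration = drifted_px[0] - real_px[0]  # horizontal offset in pixels
```

With these assumed values, the identical poses yield identical pixels, while the 0.1-unit pose error moves the virtual point by roughly 16 pixels, which is the kind of misalignment that accurate pose estimation is meant to prevent.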
Estimating the pose of a camera relative to the real world, or objects therein, is a task of an AR system. It should be noted that the virtual-reality (VR) and AR research communities often use the term “tracking” to describe a concept different from the one to which the computer-vision community applies that term. While tracking in VR and AR may generally refer to determining the pose of a camera and/or user relative to the world, tracking in computer vision may refer to data association (also called matching or correspondence) between different visual entities in consecutive frames of an image sequence.
Many different AR tracking methods and systems are available nowadays, including mechanical, magnetic, ultrasonic, inertial, and vision-based, as well as hybrid methods and systems, which combine the advantages of two or more technologies. The availability of powerful processors and fast frame-grabbers has made vision-based tracking methods desirable for various purposes due to their accuracy, flexibility, and ease of use.
Vision-based tracking using fiducials is popular in AR applications due to the simplicity and robustness that such tracking offers. In the prior art, fiducials are physical objects of predefined shape (and possibly size), and are usually integrated with an identification mechanism for uniquely recognizing individual fiducials. Fiducials are placed in a scene and the camera position is calculated according to their locations in the images.
Since fiducials can be held and manipulated by a user in front of a camera, or mounted to different physical objects to be tracked, fiducials have become very useful for producing tangible interaction techniques, which in turn make better user interfaces. However, the obtrusive and monotonous appearance of fiducials of predefined shape often renders such fiducials unattractive for use in AR applications, since such fiducials require the application developer to “engineer the scene.” This means that the application developer must somehow design a scene in which an obtrusive or monotonous fiducial is present, so that tracking may be accomplished using the fiducial.
In response, Natural-Feature Tracking (NFT) methods are becoming more common. NFT methods rely on certain features found in the real world. However, the natural features that can be used should have some easily identified and somewhat unique characteristics. Thus, NFT methods limit tracking to highly-textured objects or environments in which prominent scene features can be robustly and quickly located in each frame. NFT methods usually exhibit increased computational complexity compared with fiducial-based methods, as well as reduced accuracy, since little is assumed about the environment to be tracked. NFT methods are less obtrusive and can provide more natural experiences. Nevertheless, such methods are difficult to use for creating natural user-interfaces.
Furthermore, in the prior art, recognition of general planar shapes (without any specific relation to AR) has been addressed from various directions. One of the approaches is based on the concept known in the computer-vision community as “geometric projective invariance.”
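As a brief illustration of what a projective invariant is, the following sketch uses the cross-ratio of four collinear points, a classic textbook example. The passage above does not state which invariants the prior-art approaches employ, so the choice of the cross-ratio, the function names, and the sample coefficients are assumptions made purely for illustration.

```python
from fractions import Fraction


def cross_ratio(x1, x2, x3, x4):
    """Cross-ratio (x1, x2; x3, x4) of four distinct collinear points,
    given by their 1D coordinates along the line."""
    return Fraction((x3 - x1) * (x4 - x2)) / Fraction((x3 - x2) * (x4 - x1))


def projective_map(x, a, b, c, d):
    """1D projective transformation x -> (a*x + b) / (c*x + d), a*d - b*c != 0."""
    return Fraction(a * x + b, c * x + d)


points = [0, 1, 2, 3]
# An arbitrary (assumed) projective transformation of the line
mapped = [projective_map(x, 2, 1, 1, 3) for x in points]

before = cross_ratio(*points)
after = cross_ratio(*mapped)

# A plain ratio of segment lengths, by contrast, is not preserved
ratio_before = Fraction(points[1] - points[0], points[2] - points[1])
ratio_after = (mapped[1] - mapped[0]) / (mapped[2] - mapped[1])
```

Exact rational arithmetic is used so that the equality of `before` and `after` can be checked without floating-point tolerance, while `ratio_before` and `ratio_after` differ.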
Planar shapes have also been used for tracking in the prior art. Ruiz et al. (hereinafter referred to as Ruiz 2006) (Alberto Ruiz, Pedro E. López de Teruel and Lorenzo Fernández, “Robust Homography Estimation from Planar Contours Based on Convexity”, European Conference on Computer Vision, pp. 107-120, 2006) proposed a projective approach for estimating the 3D pose of shape contours. An invariant-based frame construction is used for extracting projective invariant features from an imaged contour. The features are used for constructing a linear system of equations in homogeneous coordinates that yields the camera pose. Although theoretically general, the construction proposed in Ruiz 2006 limits the scope of usable shapes by several assumptions on shape concavities, and thus limits the use of the method in AR applications. In addition, only sparse features are used in Ruiz 2006 for pose estimation, with no error minimization step for increasing the accuracy of the estimated pose.
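To make the idea of a linear system in homogeneous coordinates concrete, the following is a minimal sketch of the standard four-point homography estimate. It is not the invariant-based construction of Ruiz 2006; it merely assumes that four point correspondences between a planar model and its image are already available, and that the homography can be normalized so that its bottom-right entry equals 1. All function names are hypothetical. Recovering a camera pose from the resulting homography additionally requires the camera intrinsics and is not shown.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def homography_from_4_points(src, dst):
    """Estimate H (3x3, H[2][2] = 1) such that dst ~ H * src."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear equations in the 8 unknowns
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(float(u))
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(float(v))
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, p):
    """Map a 2D point through H, including the homogeneous division."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Assumed ground-truth homography, used only to synthesize test correspondences
H_true = [[1.0, 0.2, 3.0], [0.1, 1.5, -2.0], [0.001, 0.002, 1.0]]
model = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
image = [apply_homography(H_true, p) for p in model]
H_est = homography_from_4_points(model, image)
```

Because only four correspondences are used and no error is minimized, such an estimate is sensitive to noise in the measured points, which is the limitation of sparse features noted above.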
Iterative optimization has been shown to be useful for tracking, as well as for refining given pose estimates. Fitzgibbon (hereinafter referred to as Fitzgibbon 2001) (Andrew W. Fitzgibbon, “Robust registration of 2D and 3D point sets”, In Proc. British Machine Vision Conference, volume II, pp. 411-420, 2001) proposed a 2D registration method for point sets based on the Levenberg-Marquardt nonlinear optimizer. As pointed out in Fitzgibbon 2001, direct nonlinear optimization on point sets can be easily extended to incorporate a robust estimator, such as a Huber kernel, which leads to more robust tracking. Such a method can also account for curves as sets of points, although the method makes no use of the connectivity information offered by such curves.
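The effect of a Huber kernel can be sketched as follows. This is not the Levenberg-Marquardt formulation of Fitzgibbon 2001: for brevity, the sketch assumes that point correspondences are already known, restricts the transformation to a 2D rigid motion, and substitutes iteratively reweighted least squares with a closed-form weighted fit for the nonlinear optimizer. The function names, the threshold `delta`, and the synthetic data are hypothetical.

```python
import math


def transform(p, theta, tx, ty):
    """Apply a 2D rigid motion (rotation by theta, then translation)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)


def weighted_rigid_fit(src, dst, weights):
    """Closed-form weighted least-squares rigid fit of src onto dst."""
    total = sum(weights)
    pcx = sum(w * p[0] for w, p in zip(weights, src)) / total
    pcy = sum(w * p[1] for w, p in zip(weights, src)) / total
    qcx = sum(w * q[0] for w, q in zip(weights, dst)) / total
    qcy = sum(w * q[1] for w, q in zip(weights, dst)) / total
    a = b = 0.0
    for w, p, q in zip(weights, src, dst):
        px, py = p[0] - pcx, p[1] - pcy
        qx, qy = q[0] - qcx, q[1] - qcy
        a += w * (px * qx + py * qy)
        b += w * (px * qy - py * qx)
    theta = math.atan2(b, a)
    c, s = math.cos(theta), math.sin(theta)
    return theta, qcx - (c * pcx - s * pcy), qcy - (s * pcx + c * pcy)


def huber_weight(residual, delta):
    """Weight 1 inside the threshold, decaying as delta / residual outside it."""
    return 1.0 if residual <= delta else delta / residual


def robust_rigid_registration(src, dst, delta=0.1, iterations=30):
    """Iteratively reweighted least squares with Huber weights."""
    weights = [1.0] * len(src)
    params = weighted_rigid_fit(src, dst, weights)
    for _ in range(iterations):
        residuals = []
        for p, q in zip(src, dst):
            ex, ey = transform(p, *params)
            residuals.append(math.hypot(ex - q[0], ey - q[1]))
        weights = [huber_weight(r, delta) for r in residuals]
        params = weighted_rigid_fit(src, dst, weights)
    return params


# Synthetic data: a 3x3 grid moved by an assumed motion, with one gross outlier
src = [(float(x), float(y)) for x in range(3) for y in range(3)]
dst = [transform(p, 0.3, 1.0, -2.0) for p in src]
dst[8] = (dst[8][0] + 5.0, dst[8][1] + 5.0)

plain = weighted_rigid_fit(src, dst, [1.0] * len(src))
robust = robust_rigid_registration(src, dst)
```

In this synthetic setting, the plain least-squares estimate `plain` is pulled noticeably toward the outlier, whereas the reweighted estimate `robust` stays close to the assumed motion, because the Huber weights bound the influence of the large residual.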
A shape footprint, originally proposed by Lamdan et al. (hereinafter referred to as Lamdan 1988) (Lamdan, Y., Schwartz, J. T., and Wolfson, H. J., “Object Recognition by Affine Invariant Matching”, Computer Vision and Pattern Recognition, pp. 335-344, 1988), is a construction that can be used for calculating a signature for a shape. Shape footprints have been proposed for the recognition of flat and rigid objects undergoing affine transformations.
Therefore, there is a need for tracking methods that are unobtrusive for various AR applications and that remove the need to engineer the scene, while still maintaining the high levels of accuracy and robustness offered by fiducial-based tracking methods, as well as the user-interaction opportunities inherent to fiducials.