A wide range of electronic devices, including mobile wireless communication devices, personal digital assistants (PDAs), laptop computers, desktop computers, digital cameras, digital recording devices, and the like, employ computer vision techniques to provide versatile imaging capabilities. These capabilities may include functions that assist users in recognizing landmarks, identifying friends and/or strangers, and a variety of other tasks.
A challenge to enabling Augmented Reality (AR) on mobile phones or other mobile platforms is the problem of estimating camera poses in real time. Pose estimation for AR applications has very demanding requirements: it must deliver full six degrees of freedom, give absolute measurements with respect to a given coordinate system, be very robust, and run in real time. Of interest are methods to compute camera pose using computer vision (CV) based approaches.
Some conventional augmented reality systems attempt to determine pose through visual motion estimation, that is, by tracking the motion of features extracted from image pixel data. There are, however, a number of issues, such as computational complexity, that make it difficult to implement known visual-motion techniques on a mobile platform.
Visual motion estimation may be described as determining how a camera or a set of points has moved from one position to another. A key component typically present in conventional visual-motion methods is obtaining matches between features extracted from successive or subsequent images. However, feature matching is a computationally expensive and error-prone procedure, often addressed through complex feature descriptors (e.g., SIFT or SURF) combined with RANSAC. Such conventional methods typically sample (i.e., try) many possible feature matches until, if lucky, a sufficiently good set of matches is found.
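As an illustration only, the sample-and-test behavior of such conventional methods may be sketched as follows. The sketch is a simplified assumption, not an implementation of any particular system: it estimates a pure 2D image translation rather than a full six-degree-of-freedom pose, and the function name `ransac_translation`, its parameters, and the example data are hypothetical.

```python
import random

def ransac_translation(matches, iterations=100, tol=1.0, seed=0):
    """Hypothetical RANSAC-style loop over candidate feature matches.

    matches: list of ((x1, y1), (x2, y2)) pairs, each a candidate match
    between a feature in one image and a feature in the next image.
    Returns the best 2D shift found and the matches that agree with it.
    """
    rng = random.Random(seed)
    best_shift, best_inliers = None, []
    for _ in range(iterations):
        # Sample one candidate match; one pair suffices for a translation.
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        # Test the hypothesis: count matches consistent with this shift.
        inliers = [m for m in matches
                   if abs(m[1][0] - m[0][0] - dx) <= tol
                   and abs(m[1][1] - m[0][1] - dy) <= tol]
        if len(inliers) > len(best_inliers):
            best_shift, best_inliers = (dx, dy), inliers
    return best_shift, best_inliers

# Six consistent matches (shift of +5, -2) plus two wrong matches.
good = [((x, y), (x + 5.0, y - 2.0))
        for x, y in [(0, 0), (10, 3), (4, 8), (7, 1), (2, 9), (6, 6)]]
bad = [((1, 1), (40.0, 40.0)), ((3, 2), (-20.0, 15.0))]
shift, inliers = ransac_translation(good + bad)
```

Even in this reduced form, every iteration re-tests all candidate matches, which suggests why the cost grows quickly when the motion model requires larger samples and the proportion of wrong matches is high.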