An augmented reality system can insert virtual objects into a user's view of the real world. One key requirement of a successful augmented reality system is a tracking system that can accurately estimate the user's pose relative to a reference, such as a 3D model. This allows the virtual augmentation to be tightly registered to the real-world environment.
Tracking systems for augmented reality need to acquire a reference, which may be a 3D model of the environment, artificial markers placed in the environment, or a frontal-view image of a planar surface in the environment. However, it is not always convenient or possible to obtain the reference before performing augmented reality. This dependency on prior knowledge of the environment greatly limits the usage of augmented reality technology. Thus, it is desirable to generate a reference of an environment on the fly.
An example of a known tracking technology that does not need prior knowledge of the environment is described by Georg Klein and David Murray, “Parallel Tracking and Mapping on a Camera Phone”, 8th IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pp. 83-86, 19-22 Oct. 2009 (“PTAM”). The PTAM method initializes a reference patch by detecting a planar surface in the environment. This method requires that the surface be detected in two images; the homography between the two images is computed and used to estimate the 3D locations of the points detected on the surface. Thus, the PTAM method requires two images to generate the reference patch, while the present invention requires only one. Another example of tracking technology, sometimes referred to as a point-and-shoot method, is described in W. Lee, Y. Park, V. Lepetit, W. Woo, “Point-and-Shoot for Ubiquitous Tagging on Mobile Phones”, 2010 9th IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pp. 57-64, 13-16 Oct. 2010, in which the camera orientation is estimated by accelerometers. An image is warped to the frontal view and a set of “mean patches” is generated. Each mean patch is computed as the average of the patches over a limited range of viewpoints, and the ranges over all the mean patches cover all possible views. The point-and-shoot method thus relies on sensors to generate the reference patch. Moreover, the point-and-shoot method requires the planar object to be in a vertical or horizontal position. Another method, such as that used by ARToolKit, tracks pre-generated high-contrast squares that are printed on the surface of the environment to be tracked. Thus, improvements are desirable.