The first stage in creating 3D models from photographs or other images is to estimate the 3D position and orientation of the camera or other imaging device used to capture each input image. Similarly, in Augmented Reality (AR) applications, the position and orientation of a virtual camera are required to overlay 3D graphical elements onto live video. In previous methods, such as that used in 3D Software Object Modeller (3DSOM) Pro produced by Creative Dimension Software Ltd, a single, known and fairly complex planar calibration pattern (“mat”) is placed under the object. However, for large objects it is not always practical to produce a suitably large calibration mat to place under the object.
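By way of illustration only, the following minimal sketch shows one textbook way in which a camera position and orientation may be recovered from a single known planar pattern. It is not asserted to be the method used in 3DSOM Pro. The sketch assumes that the camera intrinsic matrix `K` is already known, that the mat lies in the plane Z = 0 of the world coordinate system, and that at least four feature correspondences between mat coordinates and pixel coordinates are available. The function names `estimate_homography` and `pose_from_homography` are hypothetical and are introduced here for illustration.

```python
import numpy as np


def estimate_homography(mat_xy, image_uv):
    """Direct linear transform: homography mapping mat-plane points to pixels.

    mat_xy   -- iterable of (x, y) feature positions on the mat (plane Z = 0)
    image_uv -- iterable of matching (u, v) pixel positions in the photograph
    """
    rows = []
    for (x, y), (u, v) in zip(mat_xy, image_uv):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    # The right singular vector of the smallest singular value solves the
    # system; H is therefore recovered only up to scale and sign.
    return vt[-1].reshape(3, 3)


def pose_from_homography(H, K):
    """Recover rotation R and translation t from H ~ K [r1 r2 t]."""
    B = np.linalg.inv(K) @ H
    if B[2, 2] < 0:
        # Choose the sign that places the mat in front of the camera.
        B = -B
    scale = 1.0 / np.linalg.norm(B[:, 0])  # r1 must have unit length
    r1, r2, t = scale * B[:, 0], scale * B[:, 1], scale * B[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    # Project onto the nearest orthonormal matrix to absorb noise.
    U, _, Vt = np.linalg.svd(R)
    return U @ Vt, t
```

In use, the two functions would be chained: `H = estimate_homography(mat_xy, image_uv)` followed by `R, t = pose_from_homography(H, K)`. Because the pose is derived entirely from features on one plane, the pattern must remain visible and sufficiently large in every image, which is the practical limitation noted above for large objects.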
Conventional photogrammetry (e.g., Microsoft® PhotoSynth® software) uses image scene structure to automatically estimate all camera parameters (orientation, position, focal length) from a large set of photographs of a scene, typically outdoors. However, this approach has several drawbacks: it is computationally complex, and it requires a large number of overlapping photographs containing suitable “natural” features that can be matched automatically. In practice, users may wish to model a large object in a less cluttered environment, where there are fewer reliable features, and using fewer images.
In AR and mobile sensing, Simultaneous Localization and Mapping (SLAM) techniques are used by robots and autonomous vehicles to build up a map within an unknown environment (without a priori knowledge), or to update a map within a known environment (with a priori knowledge from a given map), while at the same time keeping track of their current location. Visual SLAM (VSLAM) applies the same techniques to video images. These techniques do not require a prior target or map, but they require considerable processing power and may not be reliable enough for real-world applications. In particular, video tracking approaches can suffer from accumulation of tracking error as the camera is moved around the scene.
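The accumulation of tracking error may be illustrated by a deliberately simplified sketch, which is not a model of any particular SLAM system. It assumes that the camera heading is obtained by chaining frame-to-frame heading changes, and that each measured change carries the same small bias. The function name `accumulated_heading_error` and the bias value are hypothetical and chosen for illustration only.

```python
import math


def accumulated_heading_error(true_steps, per_frame_bias):
    """Chain frame-to-frame heading changes, each measured with a small bias.

    Returns the absolute heading error (radians) after each frame.
    """
    true_heading = 0.0
    estimated_heading = 0.0
    errors = []
    for step in true_steps:
        true_heading += step
        # Assumption: every measurement is off by the same small amount.
        estimated_heading += step + per_frame_bias
        errors.append(abs(estimated_heading - true_heading))
    return errors


# One full turn around the scene in 360 frames of 1 degree each,
# with an assumed bias of 0.01 degree per frame.
steps = [math.radians(1.0)] * 360
errors = accumulated_heading_error(steps, math.radians(0.01))
```

Under these assumptions the error grows in proportion to the number of frames, because no frame is ever re-registered against a fixed reference. A known target visible in the scene provides such a reference, so the error in any one frame does not depend on the frames before it.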
Gilles Simon, Andrew W. Fitzgibbon and Andrew Zisserman published a paper entitled “Markerless Tracking using Planar Structures in the Scene” (http://www.robots.ox.ac.uk/~vgg/publications/papers/simon00.pdf), which describes the use of one or more planes for camera tracking. However, the approach essentially tracks a single plane at a time, with a “hand-off” between tracking one plane and the next. The paper does not address the problem of reliably estimating the relationship between a plurality of planar targets. Moreover, the targets are not known a priori (i.e., the positions of features on the target planes are not known in advance), which makes the process less robust.