A challenge to enabling Augmented Reality (AR) on mobile phones or other mobile platforms is the problem of detecting and tracking objects in real-time. Object detection for AR applications has very demanding requirements: it must deliver full six degrees of freedom, give absolute measurements with respect to a given coordinate system, be very robust and run in real-time. Of interest are methods to compute camera pose using computer vision (CV) based approaches, which rely on first detecting and, subsequently, tracking objects within the camera view. In one aspect, the detection operation includes detecting a set of features contained within the digital image. A feature may refer to a region in the digital image that differs in properties, such as brightness or color, compared to areas surrounding that region. In one aspect, a feature is a region of a digital image in which some properties are constant or vary within a prescribed range of values.
The detected features are then compared to known features contained in a feature database in order to determine whether a real-world object is present in the image. Thus, an important element in the operation of a vision-based AR system is the composition of the feature database. In some systems, the feature database is built pre-runtime by taking multiple sample images of known target objects from a variety of known viewpoints. Features are then extracted from these sample images and added to the feature database.
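By way of illustration only, the comparison of detected features against a feature database may be sketched as a nearest-neighbor search over descriptor vectors. The following is a minimal sketch under stated assumptions, not the matching procedure of any particular system: the three-element descriptors, the labels in `database`, the function name `match_features`, and the ratio-test threshold of 0.8 are all hypothetical and chosen for readability.

```python
import math

def match_features(query, database, ratio=0.8):
    """Return (query_index, database_label) pairs that pass a ratio test.

    `database` is a list of (label, descriptor) pairs. The ratio test is an
    assumption of this sketch; the source text does not name a matching rule.
    """
    matches = []
    for qi, q in enumerate(query):
        # Euclidean distance from this query descriptor to every stored one.
        scored = sorted((math.dist(q, d), label) for label, d in database)
        best, second = scored[0], scored[1]
        # Accept only if the best match is clearly closer than the runner-up.
        if best[0] < ratio * second[0]:
            matches.append((qi, best[1]))
    return matches

# Hypothetical database built pre-runtime from sample images of known objects.
database = [
    ("mug/handle", (0.9, 0.1, 0.0)),
    ("mug/rim", (0.1, 0.8, 0.1)),
    ("box/corner", (0.0, 0.1, 0.9)),
]

# Descriptors detected in the current image: one distinctive, one ambiguous.
query = [(0.88, 0.12, 0.02), (0.4, 0.4, 0.4)]
matches = match_features(query, database)
print(matches)
```

In this sketch the first query descriptor is matched to a stored feature, while the second lies roughly equidistant from several database entries and is rejected as ambiguous. An object could then be declared present when enough of its stored features are matched.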
Recently, augmented reality systems have turned to model-based tracking algorithms or Simultaneous Localization And Mapping (SLAM) algorithms that are based on color or grayscale image data captured by a camera. SLAM algorithms reconstruct three-dimensional (3D) points from incoming image sequences captured by a camera, and these points are used to build a 3D map of a scene (i.e., a SLAM map) in real-time. From the reconstructed map, it is possible to localize the camera's six-degree-of-freedom (6DoF) pose in a current image frame.
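One way to picture localization against a reconstructed map is to project the 3D map points into the image under a candidate camera pose and measure how far the projections fall from the observed image points. The following is a minimal sketch under stated assumptions: the pinhole intrinsics (`f`, `cx`, `cy`), the four map points, the yaw-only rotations, and the three candidate poses are hypothetical. Practical systems typically estimate the pose by nonlinear optimization rather than by scoring a fixed list of candidates.

```python
import math

def project(point, R, t, f=500.0, cx=320.0, cy=240.0):
    """Project a 3D map point into the image for camera pose (R, t)."""
    # Map coordinates to camera coordinates: Xc = R * X + t.
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    # Pinhole projection onto the image plane (assumed intrinsics).
    return (f * xc[0] / xc[2] + cx, f * xc[1] / xc[2] + cy)

def reprojection_error(points, pixels, R, t):
    """Mean pixel distance between observed and projected map points."""
    total = sum(math.dist(project(p, R, t), uv) for p, uv in zip(points, pixels))
    return total / len(points)

def yaw(angle):
    """Rotation matrix about the camera's vertical (y) axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

# Hypothetical SLAM map points (meters) and the pose that produced the frame.
map_points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.1, 0.1, 0.05)]
true_R, true_t = yaw(0.1), [0.02, -0.01, 1.0]
observed = [project(p, true_R, true_t) for p in map_points]

# Score a few candidate poses; the lowest reprojection error wins.
candidates = {
    "true": (true_R, true_t),
    "shifted": (true_R, [0.05, -0.01, 1.0]),
    "rotated": (yaw(0.3), true_t),
}
errors = {name: reprojection_error(map_points, observed, R, t)
          for name, (R, t) in candidates.items()}
best = min(errors, key=errors.get)
print(best, errors)
```

The rotation and translation together carry the six degrees of freedom; the sketch varies only a subset of them to keep the candidates easy to read.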
In some systems, SLAM maps of a target object are generated pre-runtime and at close distance from the object. At runtime, the pre-runtime generated SLAM maps of the object are used to estimate the 6DoF pose of the camera, relative to the object, from incoming video frames. When SLAM maps built only from the target object are used, tracking of the target object becomes relatively unstable as the distance between the camera and the object increases. This is because the imaged object undergoes large scale changes, and these scale changes cause tracking of points on the object surface to fail: the feature descriptors extracted under such scale and lighting conditions are quite different from those stored in the previously generated SLAM maps.
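The size of the scale change can be estimated from the pinhole camera model, under which the imaged size of an object is inversely proportional to its distance from the camera. The following is a minimal sketch; the focal length of 500 pixels, the 0.2 m object, and the two distances are hypothetical values chosen for illustration.

```python
def apparent_size_px(object_size_m, distance_m, focal_px=500.0):
    """Imaged size in pixels under a pinhole model: focal * size / distance."""
    return focal_px * object_size_m / distance_m

# Hypothetical 0.2 m object: mapped at close range, later tracked from afar.
near = apparent_size_px(0.2, 0.5)
far = apparent_size_px(0.2, 4.0)

# Factor by which the object shrinks in the image between the two views.
scale_ratio = near / far
print(near, far, scale_ratio)
```

With these assumed values the object appears eight times smaller in the distant view, so a surface patch that spanned many pixels when the map was built covers only a few pixels at runtime, and a descriptor computed from it may no longer resemble the stored one.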
Tracking of the target object may also become unstable because of physical changes in the target object that have occurred after the previously generated SLAM maps were built. Physical changes in the target object may cause the descriptor of a 3D point to change during runtime, making it more difficult to detect and/or track the target object.