A challenge to enabling Augmented Reality (AR) on mobile phones or other mobile platforms is the problem of detecting and tracking objects in real-time. Object detection for AR applications has very demanding requirements: it must deliver full six degrees of freedom, give absolute measurements with respect to a given coordinate system, be very robust, and run in real-time. Of interest are methods to compute camera position and orientation (pose) using computer vision (CV) based approaches, which rely on first detecting and, subsequently, tracking objects within the camera view. In one aspect, the detection operation includes detecting a set of features contained within a digital image so that those features can be compared against a database of known features corresponding to real-world objects. A feature may refer to a region in the digital image that differs in properties, such as brightness or color, compared to areas surrounding that region. In one aspect, a feature is a region of a digital image in which some properties are constant or vary within a prescribed range of values.
A feature may be regarded as either blob-like or edge-like, based, in part, on its shape. Blob-like features may be highly localized on an object, which makes correlating features easier, whereas edge-like features are not necessarily localized. Some types of feature detection algorithms attempt to filter out features that are deemed too edge-like so as to reduce processing times. For example, the Scale-Invariant Feature Transform (SIFT) algorithm calculates the eigenvalues of the Hessian of each feature. The ratio of the eigenvalues of each feature is then compared against a fixed threshold. If the ratio is higher than the fixed threshold, the feature is deemed too edge-like and is discarded.
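The edge-likeness test described above can be sketched as follows. This is a minimal illustration, not the implementation of any particular detector: the function names `hessian_eigenvalues` and `is_too_edge_like` are hypothetical, the Hessian is assumed to be the symmetric 2x2 matrix of second derivatives at the feature location, and the default threshold of 10 is an assumed value chosen only for illustration. Treating a feature whose eigenvalues differ in sign (or include zero) as edge-like is likewise an assumption of this sketch. Practical implementations may evaluate the same ratio from the trace and determinant of the Hessian rather than from explicit eigenvalues.

```python
import math


def hessian_eigenvalues(dxx, dyy, dxy):
    """Return both eigenvalues of the symmetric Hessian [[dxx, dxy], [dxy, dyy]]."""
    mean = (dxx + dyy) / 2.0
    spread = math.hypot((dxx - dyy) / 2.0, dxy)
    return mean + spread, mean - spread


def is_too_edge_like(dxx, dyy, dxy, ratio_threshold=10.0):
    """Return True if the feature should be discarded as too edge-like.

    ratio_threshold is a fixed, assumed value (hypothetical default).
    """
    lam1, lam2 = hessian_eigenvalues(dxx, dyy, dxy)

    # Assumption: eigenvalues of opposite sign (or a zero eigenvalue)
    # do not describe a blob, so the feature is rejected outright.
    if lam1 * lam2 <= 0:
        return True

    # Ratio of the larger-magnitude eigenvalue to the smaller one.
    larger = max(abs(lam1), abs(lam2))
    smaller = min(abs(lam1), abs(lam2))
    return larger / smaller > ratio_threshold


if __name__ == "__main__":
    # Similar curvature in both directions: blob-like, kept.
    print(is_too_edge_like(4.0, 4.0, 0.0))
    # Strong curvature in one direction only: edge-like, discarded.
    print(is_too_edge_like(20.0, 1.0, 0.0))
```

Raising `ratio_threshold` admits more edge-like features, which is the "relaxation" whose cost is discussed next.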
However, problems arise when trying to use the same detector to detect both objects that are feature-rich and objects that are not. Objects that are not feature-rich, such as logos, include mostly edge-like features and very few, if any, blob-like features. This is because most logos are man-made and purposely avoid sharp corners and non-smooth blobs. If the feature detector is “relaxed” to allow in more edge-like features, then a subsequent feature-rich target object may yield more features than can reasonably be processed. That is, the limited computational capabilities of the mobile phone CPU make it difficult, if not impossible, to detect an object in an image that includes too many features.