This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
Image content recognition and retrieval from a database may be desirable in certain situations. For example, a mobile device can be used to take pictures of products, objects, buildings, etc., and the content of each image may then be determined. Pictures with similar content may also be searched for in a database. To do this, some form of content recognition is performed.
This may also be applicable to other devices, such as set-top boxes and other computing devices.
For any object in an image there may be many features, that is, interesting points on the object. These interesting points can be extracted to provide a feature description of the object, which may be used when attempting to locate the object in an image possibly containing many other objects. For image feature generation, some approaches take an image and transform it into a large collection of local feature vectors. Each of these feature vectors may be invariant to scaling, rotation or translation of the image.
Image content description is used in a wide range of applications, including hand-held product recognition, museum guides, pedestrian navigation, set-top box video content detection, web-scale image search, and augmented reality. Many such applications are constrained by the computational power of their platforms. Even in unconstrained cases, such as web-scale image search, processing millions of images can lead to a computational bottleneck. Therefore, algorithms with low computational complexity are always desirable. Augmented reality applications may be further constrained because the resources of mobile devices are shared between camera pose tracking and image content recognition. These two tasks may usually be decoupled from each other. Technologies that are fast enough for real-time tracking may not perform well at recognition from large-scale databases. Conversely, algorithms that perform well at recognition may not be fast enough for real-time tracking on mobile devices.
In addition to compatibility, a compact descriptor for a visual search algorithm should be small and efficient to compute in hardware or software. Smaller descriptors may use memory and storage more efficiently, and may be faster to transmit over a network and to retrieve from a database. Low-complexity descriptors may enable applications on low-power mobile devices, as well as extend the capabilities of large-scale database processing.
Mobile augmented reality systems overlay virtual content on a live video stream of real-world content. These systems rely on content recognition and tracking to generate this overlay.
To perform well on large-scale retrieval tasks, interest points (also known as features) that can be localized in both location and scale may be helpful. Interest points such as corners, edges, etc. can be searched for in an image using different algorithms, such as the Accelerated Segment Test. One image can include a huge number of interest points, depending on the contents of the image. Some images may include dozens of interest points, whereas other images may include hundreds or even thousands of interest points. Moreover, an image can be rescaled to provide representations of the image at different scales. Interest point detectors may then use pixels from different scales to determine whether there exists an interest point near a current pixel.
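As an illustration only, the following minimal Python sketch shows one way a segment-test corner check and a simple image pyramid might be written. It is not the method of any particular implementation. The function names, the threshold of 20, the arc length of 9, and the 2x2 averaging used for downscaling are all assumptions chosen for the example, and refinements such as non-maximum suppression are omitted.

```python
# Offsets (dx, dy) of the 16-pixel circle of radius 3 used by the segment test.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_corner(img, x, y, threshold=20, arc=9):
    """Return True if at least `arc` contiguous circle pixels are all brighter
    than center + threshold or all darker than center - threshold.
    The default threshold and arc values are illustrative assumptions."""
    center = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):  # +1 tests "brighter", -1 tests "darker"
        flags = [sign * (p - center) > threshold for p in ring]
        run = 0
        for flag in flags + flags:  # doubled list handles wrap-around runs
            run = run + 1 if flag else 0
            if run >= arc:
                return True
    return False


def detect_corners(img, threshold=20, arc=9):
    """Test every pixel that lies at least 3 pixels away from the border."""
    height, width = len(img), len(img[0])
    return [(x, y)
            for y in range(3, height - 3)
            for x in range(3, width - 3)
            if is_corner(img, x, y, threshold, arc)]


def downsample(img):
    """Halve the resolution by averaging each 2x2 block (an assumed scheme)."""
    height, width = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1] +
              img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) // 4
             for x in range(width)]
            for y in range(height)]


def pyramid(img, levels):
    """Return a list of images, each half the resolution of the previous one."""
    out = [img]
    for _ in range(levels - 1):
        out.append(downsample(out[-1]))
    return out


# Usage: a bright square in the lower-right quadrant of a dark 16x16 image.
image = [[200 if x >= 8 and y >= 8 else 0 for x in range(16)] for y in range(16)]
corners_per_level = [detect_corners(level) for level in pyramid(image, 3)]
```

Running the same detector on every pyramid level, as in the usage lines, is what allows corners to be reported at several scales. The detector itself has no notion of scale.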
Though Features from Accelerated Segment Test (FAST) corners can be detected at different scales, they are inherently insensitive to scale changes. Also, replicating them at many scales may create an excessively large database and unwanted redundancy. Conversely, blob detectors such as Laplacian of Gaussian (LoG), Difference of Gaussians (DoG), Determinant of Hessian (DoH), and Difference of Boxes (DoB) are all sensitive to scale variation and can thus be localized in scale space.
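The scale sensitivity of a blob detector can be illustrated with a small one-dimensional Difference of Gaussians sketch in Python. This is a simplified example under stated assumptions, not a description of any specific detector. The function names, the box-shaped test blobs, the kernel truncation at three standard deviations, and the set of scales spaced by a factor of the square root of two are all hypothetical choices made for the illustration.

```python
import math


def gaussian_kernel(sigma):
    """Sampled Gaussian truncated at 3 sigma and normalized to sum to 1."""
    radius = int(math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve a 1-D signal with a Gaussian, replicating border samples."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def dog_stack(signal, sigmas):
    """Differences between adjacent smoothing levels (one row per scale)."""
    blurred = [smooth(signal, s) for s in sigmas]
    return [[b - a for a, b in zip(blurred[i], blurred[i + 1])]
            for i in range(len(sigmas) - 1)]


def characteristic_scale(signal, position, sigmas):
    """Return the sigma whose DoG response magnitude is largest at `position`."""
    stack = dog_stack(signal, sigmas)
    best = max(range(len(stack)), key=lambda i: abs(stack[i][position]))
    return sigmas[best]


def box_blob(length, center, half_width):
    """A hypothetical test signal: a unit-height box centered at `center`."""
    return [1.0 if abs(i - center) <= half_width else 0.0 for i in range(length)]


# Scales spaced by a constant ratio of sqrt(2) (an assumed choice).
SIGMAS = [2 ** (i / 2) for i in range(8)]

# Usage: a wider blob responds most strongly at a larger scale.
narrow_scale = characteristic_scale(box_blob(101, 50, 2), 50, SIGMAS)
wide_scale = characteristic_scale(box_blob(101, 50, 6), 50, SIGMAS)
```

Because the strongest response moves to a larger sigma as the blob widens, the response can be used to localize the blob in scale as well as in position.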