The present invention relates to image retrieval.
As the well-known saying “A picture is worth a thousand words” suggests, images generally convey a large amount of information. This leads to a fundamental challenge for content-based image retrieval: the retrieval algorithm has no indication of which subset of the “thousand words” in a query the user is searching for. For instance, if the query in FIG. 4 shows a rocky coast, is the user searching for the exact location, rocks of similar shapes, or any coast scene?
Both large-scale object recognition and near-duplicate image retrieval have achieved significant advances in recent years, yet they remain independent efforts because of their different focuses on recognition accuracy and retrieval scalability, respectively. Conventional recognition approaches generally require substantial computation, which is hardly affordable in online image retrieval.
Previous work on image retrieval uses either local invariant features only or semantic attributes only.
Local invariant image features robustly delineate low-level image content and are capable of finding near-duplicate images in the database, i.e., images that include the same object or scene but have undergone some changes in lighting, scale, and view angle. In contrast, classification scores from large-scale object recognition may reveal the semantic meaning of images, but their heavy computational requirements make them hard to apply to online image retrieval.