1. Field
The present application relates generally to image processing and more particularly to machine-learning-based techniques for image search.
2. Related Art
Searching for images by keyword is known in the art. Internet search engines such as Yahoo®, Google®, and others provide image search features for finding graphical images on Internet web servers. In most existing image search engines, the search query is specified as text, which includes one or more keywords, and the search engine identifies images that are related to the keywords.
Most existing search engines do not analyze actual image data when determining if an image is related to the search keywords, but instead compare the search keywords to metadata keywords, such as tags, previously associated with the image by some other actor such as a human or a camera. A tag, also known as a “label”, may be any word or phrase. Tags are ordinarily considered accurate if they have meanings related to the content of the image as perceived by a human. Conversely, tags that have very little or no relation to the image's content are considered inaccurate. A tag may be the name of a particular object in the image, e.g., “Eiffel Tower”, or may be the name of a category of objects related to an object in the image, e.g., “Landmarks”. Other metadata associated with images may be automatically generated, e.g., by a camera when a picture is taken. A camera may, for example, tag a picture with the time and date at which the picture was taken.
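By way of illustration only, the metadata-based matching described above may be sketched as follows. The record type `ImageRecord`, the function `search_by_tags`, and the sample file names are hypothetical and are not taken from any particular search engine. The sketch assumes that a query matches an image when every query keyword appears among the words of the image's tags.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    """Hypothetical record: a file name plus tags supplied by a human or a camera."""
    filename: str
    tags: set = field(default_factory=set)


def search_by_tags(records, query):
    """Return the records whose tag words include every keyword in the query.

    Only the metadata is compared; the pixel data is never examined.
    """
    keywords = {word.lower() for word in query.split()}
    results = []
    for record in records:
        # Split multi-word tags such as "Eiffel Tower" into individual words.
        tag_words = {word.lower() for tag in record.tags for word in tag.split()}
        if keywords <= tag_words:
            results.append(record)
    return results


# Illustrative data (assumed): the second record carries an inaccurate tag.
records = [
    ImageRecord("tower.jpg", {"Eiffel Tower", "Landmarks"}),
    ImageRecord("cat.jpg", {"boat"}),
    ImageRecord("untagged.jpg"),
]
matches = [r.filename for r in search_by_tags(records, "boat")]
```

In this sketch the inaccurately tagged image is returned for the query "boat" and the untagged image is never returned for any keyword, because the image content plays no part in the comparison.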
Relying on metadata keywords for image search has a number of limitations. There is no guarantee that any metadata tags are present on an image or that the tags are accurate, and there is no way to determine the accuracy level of tags that are present. Even if the tags are accurate, they may be incomplete, e.g., describing the image at a very broad or very narrow level, or describing some features of the image but not others. In experiments, more than half of the images with a category tag were found not to be representative of the object category specified by the tag, either because they did not depict the category or because the depiction was extremely abstract. Even among those images that do show the object category, many are poor examples of the category. A large collection of images often contains some quite good representative images of categories or objects. The problem is identifying the images that are good representatives among the “noise” of poorly representative images.
Category labels are also often ambiguous. “Beetle” may refer to a car or an insect, for example. Previous work has proposed clustering (especially on text) as a tool for identifying multiple meanings and separating search results into groups according to the different meanings. Clustering may be applied to a set of images to partition the set into subsets that correspond to image features, where each subset includes images that have similar features. However, applying clustering directly to image appearance is often not feasible. The millions of images returned by a search for a particular tag overwhelm many clustering approaches, because feature computation and comparison are more expensive for images than for text. In addition, even when clustering is computationally feasible, noise in the labeling causes many clusters to be meaningless or to contain incorrect results.
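By way of illustration only, partitioning a set of images into subsets having similar features may be sketched as follows. The sketch uses k-means, chosen here merely as one well-known clustering algorithm; the background above does not name a specific algorithm. It further assumes that each image has already been reduced to a short numeric feature vector and that the first k vectors serve as the initial cluster centers. The function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of feature vectors."""
    count = len(vectors)
    return tuple(sum(component) / count for component in zip(*vectors))


def kmeans(features, k, iterations=20):
    """Partition feature vectors into k clusters.

    Returns a list giving the cluster index assigned to each input vector.
    """
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and the number of feature vectors")
    # Assumption: naive deterministic initialization from the first k vectors.
    centroids = [tuple(f) for f in features[:k]]
    assignments = [0] * len(features)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(f, centroids[c]))
            for f in features
        ]
        # Move each centroid to the mean of the vectors assigned to it.
        for c in range(k):
            members = [f for f, a in zip(features, assignments) if a == c]
            if members:
                centroids[c] = mean_vector(members)
    return assignments


# Illustrative two-dimensional feature vectors (assumed values).
features = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9)]
labels = kmeans(features, k=2)
```

Each iteration of this sketch compares every feature vector with every cluster center, which suggests why such approaches become costly when a tag search returns millions of images.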
Many images available online, e.g., on the Flickr® photo sharing site or on other sites, are not reliably labeled with the name of the object represented in the iconic image. The images may be incorrectly tagged, or may not be true iconic images. For example, the images may be partial views of an object or views of the object surrounded by other objects. As stated above, although existing text-based search engines provide image search features, most such search engines do not actually search the content of images, but instead search text data associated with images, such as metadata and web page text. For example, an image search for the term “boat” may return an image that is labeled with the text “boat”, but the image search does not analyze the graphical features of the image itself to determine whether the image depicts a boat. Searching for images of a particular object using existing image search engines may therefore yield images that are poor representations of, or are unrelated to, the search query. It would be desirable, therefore, for image searches to find images that accurately depict the object named in a search query.