This specification relates to identifying similar images.
Search engines aim to identify resources (e.g., images, audio, video, web pages, text, documents) that are relevant to a user's needs and to present information about the resources in a manner that is most useful to the user. Search engines return a set of search results in response to a user-submitted text query. For example, in response to an image search text query (i.e., a query to identify image resources), the search engine returns a set of search results identifying image resources responsive to the query (e.g., as a group of thumbnail representations of the image resources).
However, in some cases users may want to enter a query that is not textual. For example, a user who has an image may wish to search for similar or related images. Additionally, a user may be interested in refining the results of a previous image search to identify images similar to an image in the presented search results.
Some conventional techniques for learning image similarity rely on human raters who determine the relative similarity of image pairs. For example, a human rater can be presented with several object pairs and asked to select the pair that is most similar. Relative similarity can also be identified from labels that images have in common, or from images that are provided in response to a common query.
Learning semantic similarity for images becomes difficult as the number of images increases. For example, learning pairwise similarity for an image set including billions of images results in a quadratic number of pairs to compute, which is typically time- and resource-prohibitive using conventional techniques.
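The quadratic growth noted above can be made concrete with a short calculation. The following is an illustrative sketch only, not part of any described technique; the function name count_unordered_pairs and the sample set sizes are assumptions chosen for the example. It counts the distinct unordered pairs among n images, which is n(n - 1)/2.

```python
def count_unordered_pairs(n: int) -> int:
    """Return the number of distinct unordered pairs among n items.

    Hypothetical helper for illustration: the count is n * (n - 1) / 2,
    which grows quadratically with n.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Sample set sizes are arbitrary and chosen only to show the growth rate.
    for n in (1_000, 1_000_000, 1_000_000_000):
        print(f"{n:>13,} images -> {count_unordered_pairs(n):,} pairs")
```

Under these assumptions, a set of one billion images yields roughly 5 x 10^17 pairs, which suggests why exhaustively evaluating every pair is impractical at that scale.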