This specification relates to digital information retrieval, and particularly to processing search results.
The Internet provides access to a wide variety of resources such as video or audio files, web pages on particular subjects, book articles, or news articles. A search system can identify resources in response to a text query that includes one or more search terms or phrases. The search system ranks the resources based on their relevance to the query and on measures of the quality of the resources, and provides search results that link to the identified resources. The search results are typically ordered for viewing according to the rank.
To search image resources, a search system can determine the relevance of an image to a text query based on the textual content of the resource in which the image is located and also based on relevance feedback associated with the image. For example, an information retrieval score measuring the relevance of a text query to the content of a web page can be combined with a click-through rate of an image presented on that web page to generate an overall search result score for the image.
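The specification does not say how the information retrieval score and the click-through rate are combined. The following is a minimal sketch assuming a simple linear blend: `ImageCandidate`, `overall_score`, `rank_images`, and the `ctr_weight` parameter are hypothetical names chosen for illustration, and the IR score is assumed to be normalized to [0, 1].

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImageCandidate:
    image_id: str
    ir_score: float   # relevance of the query to the hosting page's text (assumed in [0, 1])
    clicks: int       # times the image was clicked when shown for this query
    impressions: int  # times the image was shown for this query


def click_through_rate(clicks: int, impressions: int) -> float:
    """Fraction of impressions that produced a click; 0.0 if the image was never shown."""
    return clicks / impressions if impressions > 0 else 0.0


def overall_score(c: ImageCandidate, ctr_weight: float = 0.5) -> float:
    """Hypothetical linear blend of page relevance and click feedback."""
    ctr = click_through_rate(c.clicks, c.impressions)
    return (1.0 - ctr_weight) * c.ir_score + ctr_weight * ctr


def rank_images(candidates: List[ImageCandidate], ctr_weight: float = 0.5) -> List[ImageCandidate]:
    """Order candidates by descending overall score."""
    return sorted(candidates, key=lambda c: overall_score(c, ctr_weight), reverse=True)


if __name__ == "__main__":
    images = [
        ImageCandidate("a", ir_score=0.9, clicks=1, impressions=100),
        ImageCandidate("b", ir_score=0.6, clicks=40, impressions=100),
    ]
    for img in rank_images(images):
        print(img.image_id, round(overall_score(img), 3))
```

With equal weighting, image `b` ranks ahead of image `a` even though its page text is less relevant, because users click it far more often; setting `ctr_weight=0.0` falls back to ranking on page text alone.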
Textual content associated with an image is often a reliable indicator of the topic or subject matter to which the image relates. However, if the textual content mischaracterizes the content of the image or is otherwise unrelated to it, a text query can return images that are unrelated to the topic the query specifies.
Some search systems search image resources by using "query images" as input. A query image is an image, such as a JPEG file, that a search engine uses as input to a search processing operation. Related images can be found by processing other images and identifying those that are similar in visual appearance to the query image. However, viewers interpret images far more subjectively than text. Thus, while the identified images may be similar in appearance to the query image, many of them may not be of interest to the viewer.
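The specification does not define how visual similarity is measured. As a minimal sketch, assume each image has already been reduced to a numeric feature vector (for example, a color histogram) and that similarity is cosine similarity between vectors; `similar_images` and the feature values are hypothetical and stand in for whatever visual features a real system would compute.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_images(
    query_features: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k indexed images whose (hypothetical) features are closest to the query image's."""
    scored = [(image_id, cosine_similarity(query_features, feats)) for image_id, feats in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical three-bin color features: (red, green, blue) proportions.
    index = {
        "sunset_beach": [0.8, 0.3, 0.1],
        "red_car": [0.9, 0.1, 0.05],
        "forest": [0.1, 0.8, 0.2],
    }
    for image_id, score in similar_images([0.85, 0.2, 0.1], index, k=2):
        print(image_id, round(score, 3))
```

The sketch also illustrates the limitation described above: two images with similar low-level features, such as a red car and a sunset, can score as near neighbors even though a viewer would consider them unrelated.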