This specification relates to digital information retrieval, and particularly to processing search results.
The Internet enables access to a wide variety of resources, such as video or audio files, web pages for particular subjects, book articles, or news articles. A search system can identify resources in response to a user query that includes one or more search terms or phrases. The search system ranks the resources based on their relevance to the query and on their importance, and provides search results that link to the identified resources. The search results are typically ordered according to the rank.
The relevance of a resource to a user query can be determined, in part, based on the textual content of the resource or textual content associated with the resource. For example, text included in the content of a resource can be compared to the query to determine whether the resource is relevant to the query. In turn, the resources can be ordered, in part, based on the comparison of the textual content and the query.
While textual features associated with a resource can provide information by which a search system can determine the relevance of the resource to the query, some resources contain or are associated with textual content that causes the resource to be improperly identified as relevant to queries. For example, an image of a cake that is associated with a sentence describing the image, such as “Johnny got a birthday cake,” may be identified as a relevant image for the query “Johnny” even though Johnny does not appear in the image. Thus, search results for images that are selected based solely on textual content associated with the images may include images that are not relevant to the query.
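The text-only matching described above can be illustrated with a minimal sketch. The scoring function, the image identifiers, and the captions other than the birthday cake sentence are hypothetical and chosen only for illustration; an actual search system may compare text and queries in a different and more elaborate way.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def text_relevance(query, associated_text):
    """Return the fraction of query terms found in the associated text.

    Hypothetical scoring function: it considers only text and has no
    knowledge of what the image actually depicts.
    """
    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    matched = query_terms & tokenize(associated_text)
    return len(matched) / len(query_terms)


# Hypothetical image records: (image identifier, associated text).
images = [
    ("cake.jpg", "Johnny got a birthday cake"),
    ("portrait.jpg", "A portrait of Johnny"),
    ("beach.jpg", "Sunset at the beach"),
]

# Score every image against the query "Johnny" using text alone.
scores = {
    image_id: text_relevance("Johnny", text) for image_id, text in images
}

for image_id, score in scores.items():
    print(image_id, score)
```

In this sketch the cake image receives the same score as the portrait, because the score depends only on the associated text and not on whether Johnny appears in the image.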