An abundance of information is available via the Internet. Users can direct web browser applications, such as Mozilla Firefox, to various Uniform Resource Locators (URLs) in order to view content that is associated with those URLs. In order to assist users in locating certain kinds of content for which the users do not know the associated URLs, various Internet search engines have emerged. Yahoo! Inc. is the owner and operator of one of these Internet search engines.
A user can enter a set of query terms into an Internet search engine's user interface. The Internet search engine receives the query terms and searches an index for known content items that are associated with the query terms. The Internet search engine creates a list of content items that are relevant to the submitted query terms. The Internet search engine returns the list to the user.
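The query flow described above can be illustrated with a minimal sketch. It assumes a simple in-memory inverted index and whitespace tokenization; the function names `build_index` and `search` and the sample page identifiers are hypothetical and are not taken from any particular search engine.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # Start from the first term's postings and intersect with the rest.
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return sorted(matches)


# Hypothetical content items, used only for illustration.
documents = {
    "page1": "snow on the mountain",
    "page2": "theme park rides",
    "page3": "mountain hiking trails",
}
index = build_index(documents)
results = search(index, "mountain")
```

In this sketch a content item is returned only if it contains all of the query terms; a production search engine would typically also rank the matching items.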
When the first Internet search engines emerged, the results returned by the search engines were hyperlinks to web pages that those search engines deemed to be relevant to the user-supplied query terms. Typically, a web page that contained the query terms was deemed to be relevant. Along with those hyperlinks, some search engines returned textual abstracts or blurbs that gave the user who submitted the query terms a hint as to the content of the page, and the context in which the query terms were used in the page.
Later, more advanced Internet search engines allowed a user to indicate that the search was to be performed specifically relative to images that were available on the Internet. These search engines indexed the images that they found (via web crawling) on the Internet. A user who wanted to search for images pertaining to a certain phrase could supply that phrase as a set of query terms to the search engine, and instruct the search engine to return images. In other words, the user would instruct the search engine to search the “image vertical”—a subset of the entire Internet—rather than the Internet generally. Alternatively, the image search engine might be part of an application with the purpose of organizing and sharing images. One example of an application that includes an image search engine is Flickr™, which is owned by Yahoo! Inc.
The task of an image search engine is to determine which of the available images are most relevant to the user-submitted query terms. Search engines use metadata associated with the images to identify image content and to evaluate the relevance of an image to the user-submitted query terms. Some metadata is inherent to the image, and other metadata is provided by users to explain the content of the image. For example, FIG. 1 shows an image 100 of a mountain. However, if the image 100 is not tagged as a “mountain,” or associated with the metadata “mountain” in any way, then a search engine that relies on metadata to gauge the content of image 100 will not associate image 100 with a mountain.
Therefore, images that are explicitly associated with metadata describing the image's content allow a search engine to properly associate the image with a relevant search query. One method of associating images with metadata is to request that users tag images with words that the users deem to be relevant to the images. However, if users are careless with the tags they choose for a particular image or if users purposefully associate an image with erroneous tags, search engines will not associate the image with its true content, but with the faulty metadata provided by the users. For example, if the image 100 in FIG. 1 is a photograph taken by a user on her way to her favorite theme park, she may tag the image as a “theme park.” This tag would lead a search engine to erroneously provide this picture 100 of a mountain in response to a user-provided search query for “theme parks.” Another reason images are incorrectly tagged is to boost the apparent relevance of the image in a greater number of search results. If the image 100 of a mountain in FIG. 1 is associated with the tags “mountain,” “snow,” and “grass,” the image will be displayed in the search results for queries associated with “mountain,” “snow,” and “grass.” However, if a user wished more people to see the image, the user might tag the image with popular search query terms such as “Brad Pitt,” “Barack Obama,” or “Britney Spears”—though the image is not truly relevant to any of those terms—which would cause the image to be displayed to users in response to a wider variety of search queries.
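The effect of an erroneous tag can be shown with a small sketch, assuming a search engine that matches a query against user-supplied tags only. The function `images_for_query` and the identifiers `image_100` and `image_200` are hypothetical; `image_100` stands in for the mountain photograph of FIG. 1 tagged as a “theme park.”

```python
def images_for_query(image_tags, query):
    """Return the sorted ids of images that carry a tag equal to the query."""
    wanted = query.strip().lower()
    return sorted(
        image_id
        for image_id, tags in image_tags.items()
        if wanted in {tag.lower() for tag in tags}
    )


# Hypothetical tag store: the engine sees only the tags, never the pixels.
image_tags = {
    "image_100": ["theme park"],        # a mountain photograph, mis-tagged
    "image_200": ["mountain", "snow"],  # a correctly tagged photograph
}

theme_park_results = images_for_query(image_tags, "theme park")
mountain_results = images_for_query(image_tags, "mountain")
```

Because the lookup relies on tags alone, the mountain photograph is returned for “theme park” and omitted for “mountain,” which is the failure described above.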
Another method of relating images with metadata is to glean the information from the context of an image on a web page. Information about an image can be taken from the title or content of the web page or the HyperText Markup Language (HTML) tags around the image in an HTML web page. For example, if image 100 is found by the search engine on a blog about mountains, the search engine might associate image 100 with “mountains.” However, many times the context of an image on a web page can be misleading as to the true content of the image. For example, if image 100 is found on a web page about theme parks, the search engine might erroneously associate the image with “theme parks.”
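As a rough illustration of gleaning metadata from page context, the following sketch uses Python's standard `html.parser` module to collect the page title and each image's `alt` text. The class `ImageContextParser`, the function `context_terms`, and the sample page are assumptions made for this example; real crawlers may draw on many more signals.

```python
from html.parser import HTMLParser


class ImageContextParser(HTMLParser):
    """Collect the page title and the src/alt attributes of each image."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.images = []  # list of (src, alt) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "img":
            attr = dict(attrs)
            # Attribute values may be None when no value is given.
            self.images.append((attr.get("src") or "", attr.get("alt") or ""))

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def context_terms(html):
    """Associate each image with terms from the page title and its alt text."""
    parser = ImageContextParser()
    parser.feed(html)
    parser.close()
    title_terms = parser.title.lower().split()
    return {
        src: sorted(set(title_terms + alt.lower().split()))
        for src, alt in parser.images
    }


# Hypothetical page: a mountain photograph embedded in a theme park page.
page = (
    "<html><head><title>Theme Parks</title></head>"
    "<body><img src='100.jpg' alt='road trip'></body></html>"
)
terms = context_terms(page)
```

Here the image inherits “theme” and “parks” from the page title even though nothing in the page describes the mountain, which mirrors the misleading-context problem described above.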
Furthermore, it is advantageous to order the results of a search query by relevance to the search query because the most relevant information is then presented to the user at the beginning of the search results list. Thus, it is not only important to know that an image is relevant to a search query, but it is also important to know to what degree an image is relevant to the search query. One method of determining whether an image is of high relevance to the metadata associated with the image is to gauge the quality of the image based on its source or the fact that the image was explicitly tagged by a user. However, this method can create inaccurate relevancy determinations because of the lack of strong correlation between the source of the image and the true relevance of the image to its associated metadata. Furthermore, as discussed above, an image that is explicitly tagged still may not be highly relevant to the metadata with which the image was tagged.
Another method of determining the relevancy of an image involves measuring the freshness of the image, or how much time has passed since the image was uploaded. If an image is only an hour old, it may be regarded as more relevant to its tagged metadata than an image that is a week old. Again, the problem with this method is that the metadata associated with the image may not be accurate to begin with. Yet another method of determining the relevancy of an image to its tagged metadata is to measure how often users click on the image when the image is presented to the users in a search results set for a particular search query. If an image is clicked on frequently when presented in the results list for a particular search query, the image is considered to be highly relevant to the topic of the search query. However, this method does not take into account the possibility that the image may be clicked on frequently because it holds an interest for the users outside of the image's relevance to the topic of the search query.
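The freshness and click-based signals can be sketched as simple scoring functions. The exponential decay, the 24-hour half-life, and the function names below are assumptions chosen for illustration; the text does not specify how either signal would be computed.

```python
def freshness_score(age_hours, half_life_hours=24.0):
    """Assumed decay model: 1.0 for a new image, halving every half-life."""
    return 0.5 ** (age_hours / half_life_hours)


def click_through_rate(clicks, impressions):
    """Fraction of result-list impressions for a query that led to a click."""
    return clicks / impressions if impressions else 0.0


# An hour-old image scores higher on freshness than a week-old image.
hour_old = freshness_score(1)
week_old = freshness_score(24 * 7)

# Hypothetical counts for one image shown in the results for one query.
ctr = click_through_rate(clicks=30, impressions=100)
```

Neither score inspects the image itself, so both inherit the limitations noted above: a fresh image may carry inaccurate metadata, and a frequently clicked image may simply be interesting for reasons unrelated to the query.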
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.