This specification relates to digital information processing, and particularly to processing image search data.
The Internet provides access to a wide variety of resources, for example, video files, image files, audio files, or Web pages including content for particular subjects, book articles, or news articles. A search system can select one or more resources in response to receiving a search query. A search query is data that a user submits to a search engine to satisfy the user's informational needs. The search system selects and scores resources based on their relevance to the search query and on their importance relative to other resources to provide search results that link to the selected resources. The search results are typically ordered according to the scores.
A very popular search operation is image searching. A search engine can use search queries to find images. The search queries can be in the form of text, e.g., one or more terms or phrases, or images, e.g., an image file. For a search query that is text, the relevance of an image to the search query can be determined based on text associated with a resource (e.g., a web page) in which the image is embedded. Text associated with the resource is compared to the search query to determine measures of relevance of the image to the search query. For example, an image of a coffee cup, stored in a file named “coffee cup.jpg”, may be associated with a textual caption “coffee mug” that is rendered below the image, and also with the “coffee cup” text of the file name. For a search query that is an image, the relevance of an image to the search query can be determined based on image feature values that are derived from the search query image and from the image being evaluated.
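The two kinds of relevance described above can be illustrated with a minimal sketch. The specification does not name particular measures; the term-overlap (Jaccard) score for text queries and the cosine similarity for image feature values used here are assumptions chosen only for illustration, and the function names `text_relevance` and `feature_relevance` are hypothetical.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def text_relevance(query, associated_texts):
    """Jaccard overlap between the query terms and the terms of all text
    associated with an image (caption, file name, and so on).
    Assumption: Jaccard overlap stands in for an unspecified relevance measure."""
    query_terms = tokenize(query)
    image_terms = set()
    for text in associated_texts:
        image_terms |= tokenize(text)
    if not query_terms or not image_terms:
        return 0.0
    return len(query_terms & image_terms) / len(query_terms | image_terms)


def feature_relevance(query_features, image_features):
    """Cosine similarity between two equal-length feature vectors.
    Assumption: the feature values are plain numeric vectors."""
    dot = sum(a * b for a, b in zip(query_features, image_features))
    norm_q = math.sqrt(sum(a * a for a in query_features))
    norm_i = math.sqrt(sum(b * b for b in image_features))
    if norm_q == 0 or norm_i == 0:
        return 0.0
    return dot / (norm_q * norm_i)


# Text query against the coffee cup example: file name plus caption.
coffee_score = text_relevance("coffee cup", ["coffee cup.jpg", "coffee mug"])

# Image query: made-up feature values for the query image and a candidate image.
same_direction = feature_relevance([1.0, 0.0], [1.0, 0.0])
orthogonal = feature_relevance([1.0, 0.0], [0.0, 1.0])
```

In the coffee cup example, the query terms “coffee” and “cup” both appear in the associated text, while “jpg” and “mug” do not appear in the query, so the overlap score is 2 of 4 terms. A production system would likely weight terms and text sources differently.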
The identification of similar queries can be used to facilitate one or more search operations. For example, the identification of similar queries can be used to provide query suggestions and/or to identify additional resources. Search queries, however, whether in the form of text or images, are often an incomplete expression of the information needed, and thus it is difficult to determine whether two queries are similar based on their semantic content or image content. Additionally, the processing requirements for search engines that store billions of queries in query logs can be very large. Finally, determining the similarity of search queries is further complicated when the search queries are of different types, e.g., text in different languages, or one search query that is text and another search query that is an image.