This specification relates to search query processing.
The Internet provides access to a wide variety of resources, for example, webpages, image files, audio files, and videos. A search system selects one or more resources in response to receiving a search query that a user submits to satisfy the user's informational needs. The search queries are usually in the form of text, e.g., one or more query terms. The search system selects and scores resources based on their relevance to the search query and on their importance relative to other resources, and provides search results that link to the selected resources. The search results are typically ordered according to the scores and presented according to this order.
The different types of resources are often indexed according to a corpus, and search engines are used to search these corpora. As used herein, a corpus is a collection of resources. Each corpus can include resources of one or more types. For example, a general web corpus can include HTML documents, image documents, videos, etc., while an image corpus, on the other hand, can be limited to a collection of images and metadata for the images. Thus there are different types of corpora that a search engine searches. For example, a search engine searches a general resource corpus index to search resources based on the textual content of the resources and relative authority ratings of the resources. Search results resulting from a search of the general resource corpus index typically include a title of the resource, a snippet of text from the resource, and a link to the resource. Likewise, a search engine searches an image corpus index to search for images that are responsive to a search query. The image corpus index may index the images based on labels (or text) associated with the images, similarity of the images to other images, click through rates of images with respect to queries, and authority ratings of the images. Search results resulting from a search of the image corpus index typically include a thumbnail version of the image, an embedded link to the image or to a web page in which the image is referenced, and optionally label text associated with the image.
Most search engines provide users with options to search a particular corpus. Some search engines, however, provide search results for different corpora if the query is indicative of those different corpora. For example, a search engine may provide image search results with general web search results, even though the query was submitted for a search of the general web corpus. Typically the search results for the other corpus (or corpora, if multiple corpora are searched) are shown in a display area of fixed configuration that shows a fixed number of search results identifying resources in the other corpora that are responsive to the search query.
One way that a search system decides to search a second corpus during a search of a first corpus is to evaluate an “intent” of a query. For example, a query submitted for a search of a general web corpus may have a high “image intent” (e.g., the query may read “puppy photos”). One way of determining an intent of a query is to analyze the history of the query, in particular the corpora for which users have previously submitted the query for searches. However, if a query is unique or occurs relatively infrequently in a search volume of queries (commonly referred to as a “long tail” query), then there is little or no such history to analyze, and discerning the intent of the query may be impossible, or subject to a high degree of error.
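The history-based approach described above can be illustrated with a minimal sketch. It is an illustration only, not the method of any particular search system: the log format (pairs of query text and corpus name), the function names `build_corpus_counts` and `corpus_intent`, and the `min_submissions` threshold of 50 are all assumptions made for the example. The sketch estimates intent as the fraction of past submissions of a query that were directed at a given corpus, and declines to produce an estimate when the query has too little history, which is the situation that arises for long tail queries.

```python
from collections import Counter, defaultdict


def build_corpus_counts(query_log):
    """Count, for each query, how often it was submitted to each corpus.

    query_log is an iterable of (query_text, corpus_name) pairs; this
    log format is a hypothetical one chosen for the example.
    """
    counts = defaultdict(Counter)
    for query, corpus in query_log:
        counts[query.strip().lower()][corpus] += 1
    return counts


def corpus_intent(counts, query, corpus, min_submissions=50):
    """Estimate the intent of a query for a corpus from its history.

    Returns the fraction of past submissions of the query that targeted
    the corpus, or None when the query has fewer than min_submissions
    past submissions (an assumed threshold), since an estimate from so
    little history would be unreliable.
    """
    per_corpus = counts.get(query.strip().lower())
    if per_corpus is None:
        return None  # query never seen before: no history to analyze
    total = sum(per_corpus.values())
    if total < min_submissions:
        return None  # long tail query: too little history
    return per_corpus[corpus] / total


# Hypothetical log: a frequent query and a long tail query.
log = (
    [("puppy photos", "image")] * 80
    + [("puppy photos", "web")] * 20
    + [("rare blue merle corgi puppy at dusk", "web")]
)
counts = build_corpus_counts(log)
frequent = corpus_intent(counts, "puppy photos", "image")
long_tail = corpus_intent(counts, "rare blue merle corgi puppy at dusk", "image")
```

In this sketch the frequent query yields a usable image intent estimate, while the long tail query yields no estimate at all, reflecting the limitation noted above.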