This specification relates to search engine query processing systems.
The Internet provides access to a wide variety of resources, for example, video files, image files, audio files, book-related resources, and web pages. The resources are generally hosted on servers or server systems, which are computers that provide access to video and other resources over the Internet. The resources are accessed through uniform resource identifiers (URIs) such as uniform resource locators (URLs).
A search system crawls the Internet and indexes the resources in an index (or a set of indexes) for use in searching. The search system scores resources based on their relevance to a search query and on their importance relative to other resources. The search system provides search results that link to the resources, and the search results are typically ordered according to the scores.
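The scoring and ordering described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Resource` type, the term-overlap relevance measure, the precomputed `importance` value, and the choice to combine the two by multiplication are hypothetical and are not taken from any particular search system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Resource:
    url: str
    text: str
    importance: float  # hypothetical query-independent score in [0, 1]


def relevance(query: str, resource: Resource) -> float:
    """Toy relevance: fraction of query terms found in the resource text."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(resource.text.lower().split())
    return sum(term in words for term in terms) / len(terms)


def rank(query: str, resources: list[Resource]) -> list[tuple[str, float]]:
    """Score each resource and return (url, score) pairs, highest score first."""
    # Assumed combination rule: relevance multiplied by importance.
    scored = [(r.url, relevance(query, r) * r.importance) for r in resources]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In this sketch a resource that matches every query term but has low importance can rank below a comparably relevant resource with higher importance, which reflects the two factors named above.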
Corresponding to the resource types described above, there are different types of corpora, such as video, image, general web pages, books, products, and the like. A search engine can search the various corpora using different search algorithms, each algorithm designed for a specific corpus. Users often provide a clear indication of the corpus or type of information they are seeking. Such an indication may come from a query that clearly expresses the user's informational need, or from the user submitting the query on a search engine property specific to the corpus. An example of the former is a query [images of Empire State Building], which includes the unigram “images” and an entity “Empire State Building.” An example of the latter is a search engine interface for books that searches a book corpus.
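The two kinds of explicit indication can be sketched as follows. This is only an illustration under stated assumptions: the `CORPUS_TERMS` table, the corpus names, and the `explicit_corpus` function are hypothetical, and a real system would likely rely on much richer signals than a single unigram lookup.

```python
from typing import Optional

# Hypothetical table mapping query unigrams to corpus names.
CORPUS_TERMS = {
    "image": "image",
    "images": "image",
    "video": "video",
    "videos": "video",
    "book": "book",
    "books": "book",
}


def explicit_corpus(query: str, property_corpus: Optional[str] = None) -> Optional[str]:
    """Return the corpus the user explicitly indicated, or None if there is none."""
    # Case 1: the query was entered on a corpus-specific property,
    # such as a search interface for books.
    if property_corpus is not None:
        return property_corpus
    # Case 2: the query contains a unigram that names a corpus.
    for term in query.lower().split():
        if term in CORPUS_TERMS:
            return CORPUS_TERMS[term]
    return None
```

Under these assumptions, the query [images of Empire State Building] maps to the image corpus, while [Empire State Building] alone yields no explicit indication.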
However, users often provide queries with terms that are semantically irrelevant or of little relevance to a particular search. For such queries, the corpus or corpora likely to contain information the user would find useful are not readily identifiable to a search engine.