The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
A search engine is a computer program that helps a user to locate information. Using a search engine, a user can enter one or more search query terms and obtain a list of resources that contain or are associated with subject matter that matches those search query terms. While search engines may be applied in a variety of contexts, search engines are especially useful for locating resources that are accessible through the Internet. Resources that may be located through a search engine include, for example, files whose content is composed in a markup language such as Hypertext Markup Language (HTML). Such files are typically called pages. One can use a search engine to generate a list of Uniform Resource Locators (URLs) and/or HTML links to files, or pages, that are likely to be of interest.
Some search engines order a list of files before presenting the list to a user. To order a list of files, a search engine may assign a rank to each file in the list. When the list is sorted by rank, a file with a relatively higher rank may be placed closer to the head of the list than a file with a relatively lower rank. The user, when presented with the sorted list, sees the most highly ranked files first. To aid the user in the search, a search engine may rank the files according to relevance. Relevance is a measure of how closely the subject matter of a file matches the query terms.
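By way of illustration only, ranking by relevance can be sketched as follows. The relevance measure used here (the fraction of distinct query terms that occur in a file) and the names `relevance`, `rank_files`, and `FILES` are assumptions chosen for the example; actual search engines may compute relevance in many other ways.

```python
def relevance(query: str, text: str) -> float:
    """Hypothetical relevance: fraction of distinct query terms found in the text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    if not terms:
        return 0.0
    return len(terms & words) / len(terms)


def rank_files(query: str, files: dict) -> list:
    """Return file names sorted so that the most relevant files come first."""
    return sorted(files, key=lambda name: relevance(query, files[name]), reverse=True)


# Hypothetical corpus, used only to exercise the sketch.
FILES = {
    "a.html": "a short history of the engine",
    "b.html": "how a search engine ranks pages",
    "c.html": "notes on gardening",
}

ranked = rank_files("search engine", FILES)
# "b.html" contains both query terms, so it is placed at the head of the list.
```

In this sketch a file that contains every query term is placed ahead of a file that contains only some of them, which mirrors the sorted list described above.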
To find the most relevant files, search engines typically try to select, from among a plurality of files, files that include many or all of the words that a user entered into a search request. Unfortunately, the files in which a user may be most interested are too often files that do not literally include the words that the user entered into the search request. If the user has misspelled a word in the search request, then the search engine may fail to select files in which the correctly spelled word occurs.
Worse yet, a user may enter, into a search request, a word that is correctly spelled but that does not mean the thing for which the user desires to search. For example, a user who wants to find files that include information about “Silicon Valley” may, through ignorance or by accident, request a search for “Silicone Valley”. Because “Silicone” is a correctly spelled word, a spell-checking program will not detect any error. Under such circumstances, the user is likely to obtain a list of results that have little to do with what the user was actually looking for.
A user may successfully enter a search request that includes correctly spelled words that are used in the correct context. Even in this case, a search engine may fail to return many existing files that include information in which the user would be very interested. Search results may be under-inclusive for a variety of reasons. A verb in the search request may be in a different tense than the verbs contained in the files. A noun in the search request may be expressed in the plural form while the nouns in the files are expressed in the singular form. A word may have more than one correct spelling, and the spelling used in the files might be different from the spelling that the user selected. The words included in the files may be synonyms of the words that the user entered into the search request. For any of these or other reasons, a search engine may return sub-optimal results.
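The under-inclusive behavior described above follows from literal term matching, which may be sketched as below. The function `literal_match` and the sample page text are hypothetical and are provided only to make the failure modes concrete; they do not describe any particular search engine.

```python
def literal_match(query: str, text: str) -> bool:
    """Return True only if every query word occurs verbatim in the text."""
    words = set(text.lower().split())
    return all(term in words for term in query.lower().split())


# Hypothetical page content.
PAGE = "the silicon valley startup raised funding and hired engineers"

exact = literal_match("silicon valley", PAGE)        # same words: match
wrong_word = literal_match("silicone valley", PAGE)  # correctly spelled, wrong word: no match
plural = literal_match("startups", PAGE)             # plural vs. singular: no match
tense = literal_match("hire engineers", PAGE)        # different verb tense: no match
```

Under this assumption, a page that is plainly about the subject of interest is excluded whenever the query differs from the page text by a single letter, a plural ending, or a verb tense.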
Typically, 8-10% of queries to Web search engines have at least one query term that is misspelled. Some deceitful website developers (known as “spammers”) design their websites to target popular misspellings. By targeting popular misspellings, spammers attract unsuspecting Web users to visit their respective websites, which may result in increased advertisement revenue for the spammers. For example, a relatively common misspelling is “Brittney Spears”, the correct spelling being “Britney Spears”. A website that has nothing to do with celebrities or Hollywood may include “Brittney Spears” in one or more webpages of the website in order for those webpages to appear in search results of queries with that misspelling.
To address the problem of correcting query misspellings, techniques have been developed for suggesting alternative spellings. Such techniques are described, for example, in U.S. patent application Ser. Nos. 10/364,078 and 10/788,970, both entitled SUGGESTING AN ALTERNATIVE TO THE SPELLING OF A SEARCH QUERY, respectively filed on Feb. 10, 2003 and Feb. 27, 2004, both of which are incorporated by reference as if fully set forth herein.
In one approach for correcting query spellings, a spelling correction mechanism: (1) generates, based on a submitted query, multiple query suggestions; and (2) provides the top ranked query suggestion to the user. There are at least two problems with this approach.
First, the top ranked query suggestion may not be associated with a relatively high degree of confidence. Moreover, other query suggestions may have rankings that are similar to, albeit lower than, the ranking of the top query suggestion. Therefore, the actual intention of the user may not be reflected solely in the top ranked query suggestion, thus contributing to the relatively high error rate (i.e., 20-30%) of the above approach.
Second, in attempting to correct perceived misspellings, a spelling correction mechanism may modify queries that do not need modifying. For example, a user may submit “walmark” as the query. Even though “walmark” is the name of a real company, the spelling correction mechanism may erroneously suggest “walmart” to the user because “walmart” may be a more popular search and differs from “walmark” by only one letter.
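Both problems can be illustrated with a minimal sketch of a single-suggestion corrector. The sketch assumes that candidate suggestions are known queries within one character edit (Levenshtein distance) of the submitted query and that they are ranked by popularity. The query counts, the term “wallart”, and the names `edit_distance`, `suggestions`, and `top_suggestion` are hypothetical; this is not the mechanism of the applications referenced above.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


# Hypothetical query popularity counts, invented for this example.
QUERY_COUNTS = {"walmart": 1000, "wallart": 950, "walmark": 10}


def suggestions(query: str, counts: dict, max_distance: int = 1) -> list:
    """Known queries within max_distance edits, most popular first."""
    near = [(term, n) for term, n in counts.items()
            if edit_distance(query, term) <= max_distance]
    return sorted(near, key=lambda pair: pair[1], reverse=True)


def top_suggestion(query: str, counts: dict) -> str:
    """Return only the top ranked suggestion, as in the approach described above."""
    ranked = suggestions(query, counts)
    return ranked[0][0] if ranked else query


# First problem: for "walart", the runner-up is nearly as popular as the winner,
# yet only the winner would be shown to the user.
close_call = suggestions("walart", QUERY_COUNTS)

# Second problem: "walmark" is itself a valid query, but it is overridden.
overridden = top_suggestion("walmark", QUERY_COUNTS)  # -> "walmart"
```

In the first case, the two candidates have similar scores, so presenting only the top one may miss the user's intent. In the second case, a query that needed no correction is replaced solely because a more popular query is one letter away.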
What is needed is an improved mechanism for predicting user intent based on a user's query in order to provide useful query suggestions and/or search results.