A search engine is a computer program that helps a user to locate information. Using a search engine, a user can enter one or more search query terms and obtain a list of resources that contain or are associated with subject matter that matches those search query terms. While search engines may be applied in a variety of contexts, search engines are especially useful for locating resources that are accessible through the Internet. Resources that may be located through a search engine include, for example, files whose content is composed in a markup language such as Hypertext Markup Language (HTML). Such files are typically called pages. One can use a search engine to generate a list of Uniform Resource Locators (URLs) and/or HTML links to files, or pages, that are likely to be of interest.
Some search engines order a list of files before presenting the list to a user. To order a list of files, a search engine may assign a rank to each file in the list. When the list is sorted by rank, a file with a relatively higher rank may be placed closer to the head of the list than a file with a relatively lower rank. The user, when presented with the sorted list, sees the most highly ranked files first. To aid the user in the search, a search engine may rank the files according to relevance. Relevance is a measure of how closely the subject matter of a file matches the query terms.
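As a rough illustration of ranking by relevance, the following Python sketch scores each file by the fraction of query terms that it contains and sorts the list so that higher-ranked files appear first. The function names, the scoring formula, and the sample file names are assumptions made for illustration only; an actual search engine may compute relevance quite differently.

```python
def relevance(query_terms, file_text):
    """Return the fraction of query terms that occur as whole words in the file.

    This scoring formula is an illustrative assumption, not a standard measure.
    """
    words = set(file_text.lower().split())
    terms = [t.lower() for t in query_terms]
    if not terms:
        return 0.0
    return sum(1 for t in terms if t in words) / len(terms)


def rank_files(query_terms, files):
    """Return file names ordered from most relevant to least relevant.

    `files` maps a (hypothetical) file name to the text of that file.
    """
    return sorted(
        files,
        key=lambda name: relevance(query_terms, files[name]),
        reverse=True,
    )


# Hypothetical sample data.
files = {
    "a.html": "silicon valley startups",
    "b.html": "valley hiking trails",
    "c.html": "cooking recipes",
}
ranked = rank_files(["silicon", "valley"], files)
```

In this sample, the file that contains both query terms is placed at the head of the list, followed by the file that contains one term, followed by the file that contains neither.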
To find the most relevant files, search engines typically try to select, from among a plurality of files, files that include many or all of the words that a user entered into a search request. Unfortunately, the files in which a user may be most interested are too often files that do not literally include the words that the user entered into the search request. If the user has misspelled a word in the search request, then the search engine may fail to select files in which the correctly spelled word occurs.
Worse yet, a user may enter, into a search request, a word that is correctly spelled but that does not denote the thing for which the user wants to search. For example, a user who wants to find files that include information about “Silicon Valley” may, through ignorance or by accident, request a search for “Silicone Valley”. Because “Silicone” is a correctly spelled word, a spell-checking program will not detect any error. Under such circumstances, the user is likely to obtain a list of results that have little to do with what the user was actually looking for.
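The limitation described above can be sketched with a simple dictionary-based spelling check in Python. The word list and the function name are hypothetical and exist only to illustrate the point; the sketch assumes a checker that flags any word absent from its word list.

```python
# Hypothetical, deliberately tiny word list standing in for a real dictionary.
DICTIONARY = {"silicon", "silicone", "valley"}


def misspelled(query):
    """Return the words of the query that are absent from the word list."""
    return [word for word in query.lower().split() if word not in DICTIONARY]


typo_flags = misspelled("Silcon Valley")        # a genuine misspelling is flagged
wrong_word_flags = misspelled("Silicone Valley")  # a valid but unintended word is not
```

Because the check considers each word in isolation, "Silicone" passes even though the user almost certainly meant "Silicon".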
A user may successfully enter a search request that includes correctly spelled words that are used in the correct context. Even in this case, a search engine may fail to return many existing files that include information in which the user would be very interested. Search results may be under-inclusive for a variety of reasons. A verb in the search request may be in a different tense from the verbs contained in the files. A noun in the search request may be expressed in the plural form while the nouns in the files are expressed in the singular form. A word may have more than one correct spelling, and the spelling used in the files might be different from the spelling that the user selected. The words included in the files may be synonyms of the words that the user entered into the search request. For any of these or other reasons, a search engine may return sub-optimal results.
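The causes of under-inclusive results listed above can be seen in a minimal Python sketch of literal word matching. The function name and the sample texts are assumptions chosen for illustration; the sketch assumes a matcher that compares whole words exactly, ignoring only letter case.

```python
def literal_match(term, file_text):
    """Return True only if the exact term occurs as a whole word in the file."""
    return term.lower() in file_text.lower().split()


page = "a search engine ranks pages"  # hypothetical file content

exact_hit = literal_match("engine", page)         # same word form: found
plural_miss = literal_match("engines", page)      # plural vs. singular: missed
tense_miss = literal_match("ranked", page)        # past vs. present tense: missed
spelling_miss = literal_match("colour", "color palettes")         # variant spelling: missed
synonym_miss = literal_match("car", "used automobile listings")   # synonym: missed
```

Each miss corresponds to a file that a user would likely consider relevant but that a purely literal comparison would exclude.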
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.