The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
A search engine is a computer program that helps a user to locate information. Using a search engine, a user can enter one or more search query terms and obtain a list of resources that contain or are associated with subject matter that matches those search query terms. While search engines may be applied in a variety of contexts, search engines are especially useful for locating resources that are accessible through the Internet. Resources that may be located through a search engine include, for example, files whose content is composed in a markup language such as Hypertext Markup Language (HTML). Such files are typically called pages. One can use a search engine to generate a list of Uniform Resource Locators (URLs) and/or HTML links to files, or pages, that are likely to be of interest.
Search engines order a list of files before presenting the list to a user. To order a list of files, a search engine may assign a rank to each file in the list. When the list is sorted by rank, a file with a relatively higher rank may be placed closer to the head of the list than a file with a relatively lower rank. The user, when presented with the sorted list, sees the most highly ranked files first. To aid the user in the search, a search engine may rank the files according to relevance. Relevance is a measure of how closely the subject matter of a file matches the query terms.
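The ordering described above can be illustrated with a minimal sketch. It assumes a deliberately simple relevance measure (the fraction of query terms that occur as whole words in a file), which is chosen only for illustration and is not the ranking method of any particular search engine. The function names `relevance` and `rank_files` and the page names in the usage example are hypothetical.

```python
def relevance(query: str, text: str) -> float:
    """Illustrative relevance: fraction of query terms found as whole words in text."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return sum(1 for term in terms if term in words) / len(terms)


def rank_files(query: str, files: dict) -> list:
    """Return file names sorted so that the most relevant file is at the head.

    `files` maps a (hypothetical) file name or URL to that file's text.
    sorted() is stable, so files with equal relevance keep their input order.
    """
    return sorted(files, key=lambda name: relevance(query, files[name]), reverse=True)


if __name__ == "__main__":
    pages = {
        "page_a": "red running shoes",
        "page_b": "blue hat",
        "page_c": "running tips",
    }
    print(rank_files("running shoes", pages))
```

In this sketch, a file that contains every query term is placed ahead of a file that contains only some of them, which in turn is placed ahead of a file that contains none.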
To find the most relevant files, search engines typically try to select, from among a plurality of files, files that include many or all of the words that a user entered into a search request. Unfortunately, the files in which a user may be most interested are too often files that do not exactly match the words that the user entered into the search request. If the user enters the singular form of a word in the search request, then the search engine may fail to select files in which the plural form of the word occurs. The reverse can occur as well: a user enters the plural form of a word in a search request, and the search engine fails to select files in which only the singular form occurs. For example, the word “shoe” is different from the word “shoes.” Thus, under exact matching, entering the search term “shoes” would exclude all web documents that contain “shoe” but not “shoes.” As a result, the search engine may return sub-optimal results for the particular query.
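The mismatch between singular and plural forms can be reproduced with a short sketch. It assumes exact, whole-word, case-insensitive matching on whitespace-separated words, which is a simplification made for illustration. The function name `select_exact` and the page names are hypothetical.

```python
def select_exact(query: str, files: dict) -> list:
    """Select files that contain every query term as an exact whole word.

    `files` maps a (hypothetical) file name to that file's text.
    No singular/plural conversion is attempted, so "shoe" and "shoes"
    are treated as unrelated words.
    """
    terms = query.lower().split()
    selected = []
    for name, text in files.items():
        words = set(text.lower().split())
        if all(term in words for term in terms):
            selected.append(name)
    return selected


if __name__ == "__main__":
    pages = {
        "page_singular": "leather shoe repair",
        "page_plural": "discount shoes online",
    }
    # Each query selects only the page that uses the same word form.
    print(select_exact("shoes", pages))
    print(select_exact("shoe", pages))
```

Under these assumptions, the query "shoes" selects only the page containing the plural form, and the query "shoe" selects only the page containing the singular form, even though both pages may interest the user.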
Up to 50% of queries directed to web search engines possess at least one term that may be transformed either from singular to plural form or from plural to singular form. However, among these 50% of queries, only 25% would benefit from pluralization or de-pluralization. Thus, a substantial number of pluralizations or de-pluralizations are not useful and should be avoided. In addition, for a good user experience, users require that a search engine perform searches of their queries quickly and return the most relevant results. Thus, there is a clear need for techniques to determine when and how to convert words in a query to their plural or non-plural forms in order to provide the most relevant search results while minimizing the computational overhead associated with the search.