Search engines are software systems that let users search for and find documents in large collections. Examples include Web search engines (which search documents on the World Wide Web), corporate search engines (which search documents within corporate collections), and email search clients (where an email, or the contact information for an individual or organization, may be treated as a document).
To find relevant documents, a search engine represents documents and search queries by a number of relevance features, which are combined into a document relevance score. When a user types in a query string, the search engine pre-selects candidate documents from a document index and then compares each candidate to the query to determine its relevance. To carry out this comparison, relevance features are extracted for each candidate document and combined by a ranking function to produce a single relevance score. The search engine may then order the document references by decreasing relevance score, generate a search results page of the ordered references, and present that page to the user.
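The flow just described (candidate pre-selection from an index, feature extraction, combination by a ranking function, and ordering by decreasing score) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration only: the toy collection `DOCS`, the two features `term_matches` and `coverage`, the weights in `WEIGHTS`, and the linear form of `rank_score` are hypothetical and do not describe any particular search engine.

```python
from collections import defaultdict

# Hypothetical toy collection: document id -> text.
DOCS = {
    "d1": "search engines rank documents by relevance",
    "d2": "email clients search messages and contacts",
    "d3": "cooking recipes for the weekend",
}

# Hypothetical feature weights used by the linear ranking function below.
WEIGHTS = {"term_matches": 1.0, "coverage": 2.0}


def build_index(docs):
    """Build an inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def candidates(index, query_terms):
    """Pre-select every document that contains at least one query term."""
    found = set()
    for term in query_terms:
        found |= index.get(term, set())
    return found


def extract_features(query_terms, text):
    """Compute two illustrative relevance features for one document."""
    tokens = text.lower().split()
    return {
        # Total number of occurrences of query terms in the document.
        "term_matches": float(sum(tokens.count(t) for t in query_terms)),
        # Fraction of distinct query terms that appear in the document.
        "coverage": sum(1 for t in query_terms if t in tokens) / len(query_terms),
    }


def rank_score(features):
    """Combine the features into one relevance score (assumed weighted sum)."""
    return sum(WEIGHTS[name] * value for name, value in features.items())


def search(query, docs, index):
    """Return (doc_id, score) pairs ordered by decreasing relevance score."""
    query_terms = query.lower().split()
    if not query_terms:
        return []
    scored = [
        (doc_id, rank_score(extract_features(query_terms, docs[doc_id])))
        for doc_id in candidates(index, query_terms)
    ]
    # Sort by decreasing score; break ties by document id for stable output.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for doc_id, score in search("search documents", DOCS, build_index(DOCS)):
        print(doc_id, score)
```

In this sketch the weighted sum stands in for the ranking function; a different combination of the same features could be substituted without changing the surrounding steps.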
The value of a search engine is determined partly by the quality of its relevance features and partly by the quality of the function that combines them, although many other factors may also contribute. Several families of relevance features are in use today, such as document frequency features, hyperlink graph features, and positional match features. What is needed are new relevance features to improve the ranking of document references in a search results page.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.