The Internet has provided users with a mechanism for publishing and/or obtaining information regarding virtually any subject. For example, various Web sites are dedicated to posting text, images, and video relating to world, national, and/or local news. Typically, the information available on Web sites and servers can be accessed with a Web browser that executes on a computer. A user can launch a Web browser and access a Web site by entering the site's Uniform Resource Locator (URL) into the browser's address bar and then pressing the Enter key on a keyboard or clicking a button with a mouse. This scenario presumes that the user knows the URL of the Web site that contains the information relevant to the user's needs. In many instances, however, the user does not know the URL of such a Web site and may not even know whether such a site exists. In such cases, the user typically employs a search engine to locate a site that provides the desired information, by submitting to the search engine a query that the user believes is relevant to finding that information.
Using a stored index of the World Wide Web, the search engine finds Web documents that match the user's query, ranks those documents by measures of how well each one matches the query (such as term frequency, inverse document frequency, and term proximity) in conjunction with measures of Web site or document popularity, and returns the ordered list to the user. The user can then select one of the returned Web documents to review its content.
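As an illustration only, the following Python sketch shows one way the match measures named above (term frequency, inverse document frequency, and term proximity) might be combined with a popularity measure to order matching documents. The linear weighting, the smoothed IDF formula, the shortest-window proximity measure, and the names `rank_documents`, `w_match`, `w_prox`, and `w_pop` are all hypothetical assumptions for this sketch; real search engines use far more elaborate scoring.

```python
import math
from collections import Counter


def tf_idf_score(query_terms, doc_terms, doc_freq, num_docs):
    """Sum of TF-IDF weights of the query terms within one document."""
    counts = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        tf = counts[term] / len(doc_terms) if doc_terms else 0.0
        # Smoothed IDF (an assumption): rarer terms receive higher weight.
        idf = math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1)) + 1.0
        score += tf * idf
    return score


def proximity_score(query_terms, doc_terms):
    """1 / width of the shortest window containing every query term, else 0."""
    needed = set(query_terms)
    best = None
    for start in range(len(doc_terms)):
        if doc_terms[start] not in needed:
            continue
        seen = set()
        for end in range(start, len(doc_terms)):
            if doc_terms[end] in needed:
                seen.add(doc_terms[end])
                if seen == needed:
                    width = end - start + 1
                    if best is None or width < best:
                        best = width
                    break
    return 0.0 if best is None else 1.0 / best


def rank_documents(query, docs, popularity, w_match=1.0, w_prox=0.5, w_pop=0.5):
    """Return IDs of matching documents, best first (hypothetical weighting)."""
    query_terms = query.lower().split()
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    num_docs = len(tokenized)
    doc_freq = Counter()
    for terms in tokenized.values():
        doc_freq.update(set(terms))  # count each term once per document

    scored = []
    for doc_id, terms in tokenized.items():
        if not any(t in terms for t in query_terms):
            continue  # only documents matching at least one query term are ranked
        score = (w_match * tf_idf_score(query_terms, terms, doc_freq, num_docs)
                 + w_prox * proximity_score(query_terms, terms)
                 + w_pop * popularity.get(doc_id, 0.0))
        scored.append((score, doc_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]
```

In this sketch, two documents that match “virus” equally well (same term statistics, same proximity) are ordered purely by their popularity values, regardless of which sense of “virus” the user intended.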
Often, however, users have difficulty formulating a query that steers the search engine toward documents relevant to their needs. In such cases, the search engine may return a substantial number of sites that are unrelated to the user's particular interests. For example, if a user searching for information about biological viruses submits the keyword “virus” as a query to a Web search engine, the user may receive information relating to computer viruses as well as biological viruses. The user must then scroll through the returned sites in an attempt to determine which of them relate to the user's interests. Because general search engines can return a very large number of sites for a single search, scrolling through the results can be extremely time-consuming and frustrating. The user can attempt to narrow the search by adding words such as “biological” or “health” to the query, but doing so may exclude highly relevant sites that simply do not contain the added words. Alternatively, the user may attempt to structure the query, for example with a combination of Boolean operators, but it can be difficult to construct a Boolean search that returns sites containing the relevant information.
Some search engines attempt to infer what a user is searching for based upon the set of possible semantic senses of the query keywords. For example, if a user enters the term “virus” into a general search engine, the search engine can return a plurality of sites together with suggestions for narrowing the search, such as “do you want to search for a computer virus?” or “do you want to search for a biological virus?” For many searches, especially more detailed and specific ones, this method requires the user to select from a successive hierarchy of suggested searches, and the returned sites may still lack relevant information. Furthermore, the user may wish to locate a site that is not encompassed by any of the suggested searches.
Other search engines may attempt to match the user's search intent through query expansion, that is, by determining terms to be added to the query (such as synonyms of the query terms) in order to construct new queries that the search engine processes instead of, or in addition to, the original query. However, query expansion techniques have several disadvantages. For example, if the search engine discards the original query, or incorrectly determines it to be irrelevant, the search results might omit information related to the user's search intent. In addition, the number of queries that the search engine must handle increases greatly, and mediation between the various expanded queries becomes necessary.
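A minimal sketch of synonym-based query expansion follows, assuming a small hand-built synonym table; the table contents and the names `SYNONYMS`, `expand_query`, and `keep_original` are hypothetical, and practical systems typically derive expansion terms from thesauri or query logs. The sketch also makes the cost noted above concrete: when every term may be replaced by any of its synonyms, the number of queries grows multiplicatively with the number of expandable terms.

```python
from itertools import product

# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "virus": ["pathogen", "microbe"],
    "infection": ["disease"],
}


def expand_query(query, synonyms, keep_original=True):
    """Return every variant of the query formed by synonym substitution."""
    terms = query.lower().split()
    # Each position may keep its original term or use any one of its synonyms.
    choices = [[term] + synonyms.get(term, []) for term in terms]
    variants = [" ".join(combo) for combo in product(*choices)]
    # product() yields the all-original combination first, so dropping it
    # models an engine that discards the user's original query.
    return variants if keep_original else variants[1:]
```

For the two-term query “virus infection”, this small table already yields six queries (3 × 2), and setting `keep_original=False` shows how the user's own wording can disappear from the set of queries actually processed.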
Furthermore, users want to search for information based on what they personally find relevant. Some technologies permit users to enter data to build a user profile, which is then employed to provide more relevant search results. However, users are often too busy to take the time to provide lengthy profile information to facilitate the search process. They want a quick and efficient means of obtaining the search results that best suit their own needs, thereby increasing their satisfaction with their searches.
Thus, one difficulty of Web searching is increasing the relevance of the results returned by a search engine. User queries can have different degrees of ambiguity, ranging from queries that are apparently unambiguous (e.g., “Safeco Field zip code”) to queries that are extremely ambiguous in the absence of other information about the user's intent (e.g., “cat”, which can refer to the domestic animal, a wild animal, a singer, or a construction company, or can be an acronym for dozens of other concepts). Especially for highly ambiguous queries, for which a very large number of Web pages contain the queried terms, the absolute (query-independent) relevance of Web pages or Web sites, such as PageRank, may not indicate how relevant those pages or sites are to the particular query.
To overcome the foregoing and other difficulties, what is needed is a technique for re-ranking the top search results returned by a search engine based on users' real needs rather than on their queries alone, since queries can be regarded as superficial expressions of those needs. What is also needed is a technique that allows a user to bias the search engine to increase or decrease the diversity of the top results and/or to increase or decrease the number of results that address the informational needs of the majority of users who submit a given query.