The World Wide Web (Web), as its name suggests, is a decentralized global collection of interlinked information, generally in the form of “pages” that may contain text, images, and/or media content related to virtually every topic imaginable. A user who knows or finds a uniform resource locator (URL) for a page can provide that URL to a Web client (generally referred to as a browser) and view the page almost instantly. Since Web pages typically include links (also referred to as “hyperlinks”) to other pages, finding URLs is generally not difficult.
What is difficult for most users is finding URLs for pages and other resources that are of interest to them. The sheer volume of content available on the Web has turned the task of finding a page relevant to a particular interest into what may be the ultimate needle-in-a-haystack problem. To address this problem, an industry of search providers (e.g., Yahoo!, MSN, and Google) has evolved.
A search provider typically maintains a database of Web pages in which the URL of each page is associated with information (e.g., keywords, category data, etc.) reflecting its content. The search provider also maintains a search server that hosts a search page (or site) on the Web. The search page provides a form into which a user can enter a query that usually includes one or more terms indicative of the user's interest. Once a query is entered, the search server accesses the database and generates a list of “hits,” typically URLs for pages whose content matches keywords derived from the user's query. This list is provided to the user.
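The lookup step described above can be sketched as an inverted index that maps each keyword to the URLs of pages associated with it. This is a minimal illustration under stated assumptions: the sample pages, the example.com URLs, and the functions `build_index` and `search` are hypothetical, and a page is treated as a hit only if it matches every query term (AND semantics), which is just one of several possible matching rules.

```python
from collections import defaultdict

# Hypothetical page database: each URL is associated with keywords reflecting its content.
PAGES: dict[str, list[str]] = {
    "http://example.com/cars/xk": ["jaguar", "automobile", "sedan"],
    "http://example.com/zoo/big-cats": ["jaguar", "animal", "cat"],
    "http://example.com/zoo/lions": ["lion", "animal", "cat"],
}

def build_index(pages: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert the page database: map each lowercased keyword to the set of URLs that carry it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, keywords in pages.items():
        for kw in keywords:
            index[kw.lower()].add(url)
    return dict(index)

def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return, in sorted order, the URLs whose keywords include every term of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    # Intersect the URL sets of all terms; an unknown term yields an empty set (no hits).
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)

index = build_index(PAGES)
# search(index, "Jaguar")        -> ['http://example.com/cars/xk', 'http://example.com/zoo/big-cats']
# search(index, "jaguar animal") -> ['http://example.com/zoo/big-cats']
```

Note that the single-term query “jaguar” returns both the automobile page and the animal page; keyword matching alone says nothing about which one the user meant.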
Since queries can often return hundreds, thousands, or in some cases millions of hits, search providers have developed sophisticated algorithms for ranking the hits (i.e., determining an order for displaying hits to the user) such that the pages most relevant to a given query are likely to appear near the top of the list. Typical ranking algorithms take into account not only the keywords and their frequency of occurrence but also other information such as the number of other pages that link to the hit page, popularity of the hit page among users, and so on. These ranking algorithms are an important part of algorithmic search.
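A ranking step that combines the kinds of signals mentioned above (keyword frequency, inbound links, and popularity) might look like the following minimal sketch. The `Hit` fields, the use of a weighted sum, the logarithmic damping, and the weight values are all hypothetical choices made for illustration; actual search providers use many more signals and tune or learn their weights.

```python
import math
from dataclasses import dataclass

@dataclass
class Hit:
    url: str
    term_frequency: int  # occurrences of the query keywords on the page
    inbound_links: int   # number of other pages that link to this page
    clicks: int          # popularity proxy: how often users selected this page

# Hypothetical weights, chosen for illustration only.
W_TERMS, W_LINKS, W_CLICKS = 1.0, 2.0, 1.0

def score(hit: Hit) -> float:
    """Weighted sum of relevance signals; log1p damps very large link and click counts."""
    return (W_TERMS * hit.term_frequency
            + W_LINKS * math.log1p(hit.inbound_links)
            + W_CLICKS * math.log1p(hit.clicks))

def rank(hits: list[Hit]) -> list[str]:
    """Order hit URLs from highest to lowest score."""
    return [h.url for h in sorted(hits, key=score, reverse=True)]

hits = [
    Hit("http://example.com/a", term_frequency=6, inbound_links=0, clicks=0),
    Hit("http://example.com/b", term_frequency=2, inbound_links=100, clicks=50),
    Hit("http://example.com/c", term_frequency=1, inbound_links=1, clicks=0),
]
# rank(hits) -> ['http://example.com/b', 'http://example.com/a', 'http://example.com/c']
```

The damping is a design choice, not something the text above specifies: without it, a page with thousands of inbound links could outrank every other page regardless of how well its content matches the query.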
To further facilitate use of their services, some search providers now offer “search toolbar” add-ons for Web browser programs. A search toolbar typically provides a text box into which the user can type a query and a “Submit” button for submitting the query to the search provider's server. Once installed by the user, the search toolbar is generally visible no matter what page the user is viewing, enabling the user to enter a query at any time without first navigating to the search provider's Web site. Searches initiated via the toolbar are otherwise processed in the same way as searches initiated at the provider's site.
While automated search technologies can be very helpful, they have a number of technological limitations, a primary one being that a user often has difficulty formulating a query that directs the search to relevant content. A query that is too general might return a large number of hits, few of which are relevant. A query that is too specific might fail to return many relevant hits. A user often has a fairly specific intent in mind when making a query, but the query might not express this intent unambiguously. For example, a user who enters the query “jaguar” might be thinking of the automobile rather than the animal, the professional football team, or something else, but the query “jaguar” by itself does not express this specific intent.
Until recently, search technologies did not provide reliable ways of disambiguating the intent of a query. U.S. Pat. No. 7,051,023 discloses a search system providing an interface to generate concept units from query logs and to use the concept units to disambiguate the intent of a query. This search technology, however, has limitations of its own: breaking a query into concept units does not always disambiguate its intent. For example, for the query “pictures of jaguar”, the two concept units “pictures” and “jaguar” do not indicate whether the user wants pictures of the Jaguar-brand automobile or pictures of the animal called a jaguar.
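The limitation can be illustrated with a toy segmentation of queries into concept units. The unit vocabulary, the sense labels, the stopword list, and the greedy longest-match segmenter below are all hypothetical and are not the method disclosed in U.S. Pat. No. 7,051,023; the sketch only shows that a unit such as “jaguar” can still carry several senses after a query has been segmented.

```python
# Hypothetical inventory of concept units, each mapped to the senses it can carry.
SENSES: dict[str, set[str]] = {
    "pictures": {"image"},
    "jaguar": {"automobile", "animal", "football team"},
    "jaguar xk": {"automobile"},
}
STOPWORDS = {"of", "the"}  # hypothetical; dropped before segmentation

def concept_units(query: str) -> list[str]:
    """Greedy longest-match segmentation of a query into known units."""
    words = [w for w in query.lower().split() if w not in STOPWORDS]
    units: list[str] = []
    i = 0
    while i < len(words):
        # Try the longest span starting at position i first.
        for j in range(len(words), i, -1):
            candidate = " ".join(words[i:j])
            if candidate in SENSES:
                units.append(candidate)
                i = j
                break
        else:
            units.append(words[i])  # an unknown word becomes its own unit
            i += 1
    return units

def ambiguous_units(query: str) -> list[str]:
    """Return the units of the query that still map to more than one sense."""
    return [u for u in concept_units(query) if len(SENSES.get(u, set())) > 1]

# concept_units("pictures of jaguar")      -> ['pictures', 'jaguar']
# ambiguous_units("pictures of jaguar")    -> ['jaguar']
# ambiguous_units("pictures of jaguar xk") -> []
```

Segmentation resolves the query only when the user happens to type a more specific unit such as “jaguar xk”; for “pictures of jaguar” the ambiguous unit survives intact.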