Internet search engines have become fundamental tools for nearly all users seeking information and sites on the World Wide Web (WWW). Users can find vast amounts of data and select the data that appears to best match specific search criteria. Free-text searches are generally performed by providing a search phrase including one or more keywords, and optionally Boolean operators. The most widely used free-text search engines currently are provided by Google, Inc. and Yahoo, Inc.
Based on the search phrase provided by a user, a search engine generally returns a list of documents from which the user selects those that appear most relevant. For each document, the list typically includes a snippet containing one or more of the keywords, together with the URL of the document. The search engine usually presents the list in descending order of rank according to general, static criteria established by the search engine provider. Numerous techniques have been developed for ranking the list in order to provide the results most likely to be relevant to a typical user. Some of these techniques take into account the order of the keywords provided by the user.
Such static ranking systems often present high-ranking results that do not match the interests or skills of the searcher, or that do not reflect the intended meaning of keywords having more than one meaning. For example, a software engineer looking for Java (i.e., software) and a traveler looking for Java (i.e., the island) receive the same results for queries that include the same keywords, even though the intended meanings of their searches differ.
In an attempt to increase the relevancy of search results, some search engines suggest search refinement options based on the search keywords entered by the searcher. These search engines typically analyze previous searches conducted by other users in order to identify refinement options that are related to the keywords entered by the searcher. By selecting one or more of the refinement options, the searcher is able to narrow his search to better express his search intent. For example, Google Suggest, provided by Google, Inc., displays a drop-down list of additional related search phrases as the searcher enters a search query in a search text box. The Clusty search engine, provided by Vivisimo, Inc., groups similar results together into clusters. Some search engines, such as Google, upon detecting a potential misspelling of search keywords, present a replacement search query that includes correctly spelled replacement keywords.
U.S. Pat. No. 5,987,457 to Ballard, which is incorporated herein by reference, describes a method in which a user views search results and subjectively determines whether each document is desirable or undesirable. Only documents categorized by the user are analyzed to derive a list of prospective keywords. The frequency of occurrence of each word in each document is derived. Keywords that occur only in desirable documents are good keywords. Keywords that occur only in undesirable documents are bad keywords. Keywords that occur in both types of documents are dirty keywords. The best keywords are the good keywords with the highest frequency of occurrence, and the worst keywords are the bad keywords with the highest frequency of occurrence. A new query phrase is formed that includes the highest-ranked good keywords and filters results using the highest-ranked bad keywords. Key phrases are derived in order to convert dirty keywords into good key phrases. A key phrase is also derived from a good keyword and replaces that good keyword to narrow a search.
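The good/bad/dirty keyword classification described for Ballard can be illustrated with a minimal Python sketch. It assumes whitespace tokenization with lowercasing, counts a dirty keyword's frequency as its combined count across both document sets, and renders the filtering step with a hypothetical `-word` exclusion syntax. The function names `classify_keywords` and `build_query` are illustrative only and are not taken from the Ballard patent.

```python
from collections import Counter


def classify_keywords(desirable, undesirable):
    """Split words from user-categorized documents into good, bad and dirty keywords.

    Assumption: documents are plain strings tokenized on whitespace and lowercased.
    Returns three dicts mapping keyword -> frequency of occurrence.
    """
    good_counts = Counter()
    bad_counts = Counter()
    for doc in desirable:
        good_counts.update(doc.lower().split())
    for doc in undesirable:
        bad_counts.update(doc.lower().split())

    # Good: only in desirable documents; bad: only in undesirable documents.
    good = {w: n for w, n in good_counts.items() if w not in bad_counts}
    bad = {w: n for w, n in bad_counts.items() if w not in good_counts}
    # Dirty: in both; frequency here is the combined count (an assumption).
    dirty = {w: good_counts[w] + bad_counts[w]
             for w in good_counts.keys() & bad_counts.keys()}
    return good, bad, dirty


def build_query(good, bad, top_n=2):
    """Form a query from the top good keywords, excluding the top bad keywords.

    The "-word" exclusion syntax is a hypothetical rendering of the filtering step.
    Ties in frequency are broken alphabetically so the output is deterministic.
    """
    best = sorted(good, key=lambda w: (-good[w], w))[:top_n]
    worst = sorted(bad, key=lambda w: (-bad[w], w))[:top_n]
    return " ".join(best + [f"-{w}" for w in worst])


if __name__ == "__main__":
    good, bad, dirty = classify_keywords(
        ["java jvm bytecode java", "java class jvm"],  # marked desirable
        ["java island coffee", "island travel"],       # marked undesirable
    )
    print(dirty)                               # {'java': 4}
    print(build_query(good, bad, top_n=2))     # jvm bytecode -island -coffee
```

In this toy example "java" appears in both sets and is therefore dirty, which is the kind of ambiguous keyword that the key-phrase step is intended to resolve.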
US Patent Application Publication 2005/0076003 to DuBose et al., which is incorporated herein by reference, describes a process for sorting results returned in response to a search query according to learned associations between the search terms of one or more prior search queries and the results selected for those prior search queries.
U.S. Pat. No. 6,732,088 to Glance, which is incorporated herein by reference, describes techniques for facilitating searching of a data collection, such as the WWW, that take advantage of the collective ability of all users to create queries to the data collection. First, a node-link graph of all queries submitted to the data collection within a given period of time is constructed. In the case of the WWW, the queries would be submitted to a particular search engine. In the graph, each node is a query, and two nodes are linked whenever the two queries are judged to be related. A first key idea is that the determination of relatedness depends on the documents returned by the queries, not on the actual terms in the queries themselves. For example, a criterion for relatedness could be that the lists of the top ten documents returned for the two queries have at least one document in common. A second key idea is that the construction of the query graph transforms single-user usage of the data collection (e.g., search) into collaborative usage. As a result, all users can tap into the knowledge base of queries submitted by others, because each of the related queries represents the knowledge of the user who submitted it.
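The query graph attributed to Glance can be sketched in a few lines of Python using the example relatedness criterion (at least one shared document among each query's top ten results). The input format, a mapping from each query string to its ranked list of document identifiers, and the function name `build_query_graph` are assumptions made for illustration.

```python
from itertools import combinations


def build_query_graph(results, top_k=10):
    """Build an undirected query graph as an adjacency dict.

    results: dict mapping query string -> ranked list of document IDs (assumed format).
    Two queries are linked when their top_k result lists share at least one document;
    the query terms themselves are never compared.
    """
    top = {q: set(docs[:top_k]) for q, docs in results.items()}
    graph = {q: set() for q in results}
    for a, b in combinations(results, 2):
        if top[a] & top[b]:  # non-empty intersection of top documents
            graph[a].add(b)
            graph[b].add(a)
    return graph


if __name__ == "__main__":
    results = {
        "java tutorial": ["d1", "d2", "d3"],
        "learn jvm": ["d3", "d4"],
        "java island": ["d5", "d6"],
        "bali travel": ["d6", "d7"],
    }
    for query, neighbours in build_query_graph(results).items():
        print(query, "->", sorted(neighbours))
```

In this toy data, "java tutorial" and "java island" share a term but remain unlinked, whereas "java island" and "bali travel" share no term yet are linked through a common document, which mirrors the idea that relatedness is judged from returned documents rather than query terms.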
U.S. Pat. No. 6,772,150 to Whitman et al., which is incorporated herein by reference, describes a search engine system that uses information about historical query submissions to a search engine to suggest previously-submitted, related search phrases to users. The related search phrases are preferably suggested based on a most recent set of query submission data (e.g., the last two weeks of submissions), and thus strongly reflect the current searching patterns or interests of users.
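The recency-window idea described for Whitman et al. can be approximated with a short Python sketch. The description above does not state how relatedness between phrases is computed, so this sketch substitutes a simple stand-in (a previously submitted phrase is treated as related if it shares at least one term with the current query) and ranks candidates by how often they were submitted within a two-week window. The log format of `(timestamp, phrase)` pairs and the function name `suggest_related` are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def suggest_related(log, query, now, window=timedelta(days=14), limit=5):
    """Suggest previously submitted phrases related to `query`.

    log: iterable of (datetime, phrase) submissions (assumed format).
    Only submissions within `window` of `now` are counted, so suggestions
    reflect recent searching patterns. "Related" here means sharing a term
    with the query, which is a simplifying assumption.
    """
    terms = set(query.lower().split())
    recent = Counter(phrase.lower() for ts, phrase in log if now - ts <= window)
    candidates = [
        (phrase, count) for phrase, count in recent.items()
        if phrase != query.lower() and terms & set(phrase.split())
    ]
    # Most frequently submitted first; ties broken alphabetically.
    candidates.sort(key=lambda pc: (-pc[1], pc[0]))
    return [phrase for phrase, _ in candidates[:limit]]


if __name__ == "__main__":
    now = datetime(2024, 1, 15)
    log = [
        (datetime(2024, 1, 14), "java tutorial"),
        (datetime(2024, 1, 13), "java tutorial"),
        (datetime(2024, 1, 10), "java island"),
        (datetime(2023, 12, 1), "java beans"),      # outside the 14-day window
        (datetime(2024, 1, 12), "python tutorial"),  # shares no term with "java"
    ]
    print(suggest_related(log, "java", now))  # ['java tutorial', 'java island']
```

Shortening or lengthening `window` trades responsiveness to current interests against the amount of submission data available for each suggestion.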
US Patent Application Publication 2003/0123443 to Anwar, which is incorporated herein by reference, describes a search engine that utilizes both record based data and user activity data to develop, update, and refine ranking protocols, and to identify words and phrases that give rise to search ambiguity so that the engine can interact with the user to better respond to user queries and enhance data acquisition from databases, intranets, and internets.
The following patents, patent application publications, and other publications, all of which are incorporated herein by reference, may be of interest:
U.S. Pat. No. 6,636,848 to Aridor et al.
U.S. Pat. No. 4,823,306 to Barbic et al.
U.S. Pat. No. 6,513,036 to Fruensgaard et al.
US Patent Application Publication 2002/0133483 to Klenk et al.
U.S. Pat. No. 5,926,812 to Hilsenrath et al.
U.S. Pat. No. 6,289,353 to Hazlehurst et al.
US Patent Application Publication 2005/0055341 to Haahr et al.
U.S. Pat. No. 6,363,379 to Jacobson et al.
U.S. Pat. No. 6,347,313 to Ma et al.
U.S. Pat. No. 6,321,226 to Garber et al.
U.S. Pat. No. 6,189,002 to Roitblat
U.S. Pat. No. 6,167,397 to Jacobson et al.
U.S. Pat. No. 5,864,845 to Voorhees et al.
U.S. Pat. No. 5,825,943 to DeVito et al.
US Patent Application Publication 2005/0144158 to Capper et al.
US Patent Application Publication 2005/0114324 to Mayer
U.S. Pat. No. 5,857,179 to Vaithyanathan et al.
U.S. Pat. No. 7,139,755 to Hammond
U.S. Pat. No. 7,152,061 to Curtis et al.
U.S. Pat. No. 6,904,588 to Reddy et al.
U.S. Pat. No. 6,842,906 to Bowman-Amuha
U.S. Pat. No. 6,539,396 to Bowman-Amuha
US Patent Application Publication 2004/0249809 to Ramani et al.
US Patent Application Publication 2003/0058277 to Bowman-Amuha
U.S. Pat. No. 6,925,460 to Kummamuru et al.
U.S. Pat. No. 6,920,448 to Kincaid et al.
US Patent Application Publication 2006/0074883 to Teevan et al.
US Patent Application Publication 2006/0059134 to Palmon et al.
US Patent Application Publication 2006/0047643 to Chaman
US Patent Application Publication 2005/0216434 to Haveliwala et al.
US Patent Application Publication 2003/0061206 to Qian
US Patent Application Publication 2002/0073088 to Beckmann et al.