Internet search engines have become fundamental tools for nearly all users seeking information and sites on the World Wide Web (WWW). Users can find vast amounts of data and select the data that appears to best match specific search criteria. Free-text searches are generally performed by providing a search phrase including one or more keywords, and optionally Boolean operators. The most widely used free-text search engines currently are provided by Google, Inc. and Yahoo, Inc.
Based on the search phrase provided by a user, a search engine generally returns a list of documents from which the user selects those that appear most relevant. For each document, the list typically includes a snippet containing one or more of the keywords, together with the URL of the document. Typically, the search engine presents the list of documents in descending order according to general, static criteria established by the search engine provider. Numerous techniques have been developed for ranking the list in order to provide the results most likely to be relevant to a typical user. Some of these techniques take into account the order of the keywords provided by the user.
Such static ranking systems often present high-ranking results that do not match the interests or skills of the searcher, or that do not correctly reflect the intended meaning of keywords having more than one meaning. For example, a software engineer looking for Java (i.e., the programming language) and a traveler looking for Java (i.e., the island) receive the same results for queries containing the same keywords, even though their searches have different intended meanings.
Some search engines, such as the one provided by AOL, Inc., attempt to overcome this drawback by using user profiles that specify certain static characteristics of each user. Such characteristics may include information such as the searcher's age, location, job, and education. Each user must provide this information and keep it updated as the user's interests change over time. Such information often does not accurately reflect the user's skill levels in various interest areas. Such profiles also generally fail to adequately reflect the full diversity of the user's interests.
Some search engines are configured to rank results of multi-keyword searches using merge algorithms. For example, the search engine may search for each of the keywords separately, rank each set of results according to its criteria, and merge the separate rankings to produce a list of search results containing all of the keywords. Some search engines use collaborative filtering based on social networks, forums, communities, or other types of groups, in an attempt to supply more relevant search results.
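The merge approach can be illustrated with a minimal Python sketch. It assumes that each keyword's results arrive as a mapping from a document identifier to a relevance score, and it combines the separate rankings by summing scores. The function name merge_keyword_rankings and the summation rule are illustrative assumptions, not a description of any particular search engine.

```python
from typing import Dict, List


def merge_keyword_rankings(per_keyword: Dict[str, Dict[str, float]]) -> List[str]:
    """Merge separately ranked per-keyword results into a single ranked list.

    Only documents present in every keyword's result set are kept, so the
    merged list contains all of the keywords. Summing the per-keyword scores
    is an illustrative assumption about how the rankings are combined.
    """
    if not per_keyword:
        return []
    score_maps = list(per_keyword.values())
    # Documents that appear in the results for every keyword.
    common = set(score_maps[0])
    for scores in score_maps[1:]:
        common &= set(scores)
    # Combined score: sum of the per-keyword scores (assumed combination rule).
    combined = {doc: sum(scores[doc] for scores in score_maps) for doc in common}
    # Highest combined score first; document id breaks ties deterministically.
    return sorted(combined, key=lambda doc: (-combined[doc], doc))
```

For example, if "java" returns {"d1": 0.9, "d2": 0.4, "d3": 0.7} and "island" returns {"d2": 0.8, "d3": 0.6, "d4": 0.9}, only d2 and d3 contain both keywords, and d3 ranks first because its summed score (1.3) exceeds that of d2 (1.2).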
US Patent Application Publication 2005/0076003 to DuBose et al., which is incorporated herein by reference, describes a process for sorting results returned in response to a search query according to learned associations between one or more search terms of prior search queries and the results selected in response to said prior search queries.
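Sorting by learned associations can be pictured with the following Python sketch. It counts how often each result was selected after a prior query containing a given term, then orders new results by the summed counts for the current query's terms. The class AssociationSorter, the whitespace tokenization, and the additive scoring are hypothetical choices made for illustration; they do not reproduce the cited publication.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List


class AssociationSorter:
    """Hypothetical sorter that learns term-to-result associations from past selections."""

    def __init__(self) -> None:
        # counts[term][result]: times `result` was selected after a query containing `term`.
        self.counts: DefaultDict[str, DefaultDict[str, int]] = defaultdict(
            lambda: defaultdict(int)
        )

    def record_selection(self, query: str, selected: str) -> None:
        """Learn from a prior query by associating each of its terms with the chosen result."""
        for term in query.lower().split():
            self.counts[term][selected] += 1

    def sort(self, query: str, results: Iterable[str]) -> List[str]:
        """Order results by total association strength with the query's terms."""
        terms = query.lower().split()

        def strength(result: str) -> int:
            # .get avoids inserting empty entries for unseen terms or results.
            return sum(self.counts.get(term, {}).get(result, 0) for term in terms)

        # sorted() is stable, so equally strong results keep their original order.
        return sorted(results, key=strength, reverse=True)
```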
U.S. Pat. No. 6,732,088 to Glance, which is incorporated herein by reference, describes techniques for facilitating searching a data collection, such as the WWW, that take advantage of the collective ability of all users to create queries to the data collection. First, a node-link graph of all queries submitted to a data collection within a given period of time is constructed. In the case of the WWW, the queries would be to a particular search engine. In the graph, each node is a query. A link is made between two nodes whenever the two queries are judged to be related. A first key idea is that the determination of relatedness depends on the documents returned by the queries, not on the actual terms in the queries themselves. For example, two queries could be judged related if their respective lists of the top ten returned documents have at least one document in common. A second key idea is that the construction of the query graph transforms single-user usage of the data collection (e.g., search) into collaborative usage. As a result, all users can tap into the knowledge base of queries submitted by others, because each of the related queries represents the knowledge of the user who submitted it.
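The top-ten relatedness criterion can be expressed as a short Python sketch that links two queries whenever their top ten result lists share a document. The input format (a mapping from each query to its ranked list of document identifiers) and the function name build_query_graph are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def build_query_graph(
    top_results: Dict[str, List[str]], k: int = 10
) -> Set[Tuple[str, str]]:
    """Return undirected edges (as alphabetically ordered query pairs) between related queries.

    Two queries are related when their top-k result lists have at least one
    document in common; the wording of the queries is never compared.
    """
    top_k = {query: set(docs[:k]) for query, docs in top_results.items()}
    edges: Set[Tuple[str, str]] = set()
    for a, b in combinations(sorted(top_k), 2):
        if top_k[a] & top_k[b]:  # a shared document makes the queries related
            edges.add((a, b))
    return edges
```

Because only returned documents are compared, a query such as "jvm" can be linked to "java beans" even though the two share no terms.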
U.S. Pat. No. 6,513,036 to Fruensgaard et al., which is incorporated herein by reference, describes techniques for searching and presenting electronic information from one or more information sources, in which the retrieval and presentation of information depend on context representations defined for the user performing the search, for other users who are similar to the user performing the search, and for references to information. The context representation of each object influences all the other objects with which it is in contact during the search process. This is described as ensuring a dynamic update of the relations between the objects and their properties.
US Patent Application Publication 2002/0133483 to Klenk et al., which is incorporated herein by reference, describes a system for automatically determining a characterizing strength which indicates how well a text in a database describes a search query. The system comprises a database storing a plurality of m texts, and a search engine for processing the search query in order to identify those k texts, from among the m texts, that match the search query. The system further comprises a calculation engine for calculating the characterizing strength of each of the k texts that match the search query. The characterizing strength is calculated by creating a graph with nodes and links, whereby words of the text are represented by nodes and the relationships between words are represented by the links; evolving the graph according to a pre-defined set of rules; determining the neighborhood of the word, whereby the neighborhood comprises those nodes that are connected to the word through one or a few links; and calculating the characterizing strength based on the topological structure of the neighborhood.
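The neighborhood step can be sketched in Python as a breadth-first search over a word graph. In this sketch, linking adjacent words in the text, omitting the graph-evolution rules, and scoring a text by the fraction of its linked words that fall within the query word's neighborhood are all assumptions made for illustration; the cited publication's actual link rules and strength formula are not reproduced.

```python
from collections import defaultdict, deque
from typing import Dict, Set


def word_graph(text: str) -> Dict[str, Set[str]]:
    """Undirected graph linking each word to its immediate neighbors in the text (assumed rule)."""
    words = text.lower().split()
    graph: Dict[str, Set[str]] = defaultdict(set)
    for left, right in zip(words, words[1:]):
        if left != right:  # skip self-links for repeated words
            graph[left].add(right)
            graph[right].add(left)
    return dict(graph)


def neighborhood(graph: Dict[str, Set[str]], word: str, max_links: int = 2) -> Set[str]:
    """Words reachable from `word` through at most `max_links` links (breadth-first search)."""
    if word not in graph:
        return set()
    depth = {word: 0}
    queue = deque([word])
    while queue:
        node = queue.popleft()
        if depth[node] == max_links:
            continue  # do not expand beyond the allowed number of links
        for nxt in graph[node]:
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    depth.pop(word)
    return set(depth)


def characterizing_strength(text: str, word: str, max_links: int = 2) -> float:
    """Placeholder score: share of the text's linked words lying in `word`'s neighborhood."""
    graph = word_graph(text)
    if not graph:
        return 0.0
    return len(neighborhood(graph, word.lower(), max_links)) / len(graph)
```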
U.S. Pat. No. 5,926,812 to Hilsenrath et al., which is incorporated herein by reference, describes a method for comparing the contents of two sets of documents, which includes extracting from each set of documents a corresponding set of document extract entries. The method further includes generating from the sets of document extract entries corresponding sets of word clusters. Each word cluster comprises a cluster word list having N words, an N×N total distance matrix, and an N×N number of connections matrix. The preferred embodiment includes grouping similar word clusters and combining the similar word clusters to form a single word cluster for each group. The grouping comprises evaluating a measure of cluster similarity between two word clusters, and placing them in a common group of similar word clusters if the measure of similarity exceeds a predetermined value. Evaluating the cluster similarity comprises intersecting clusters to form subclusters and calculating a function of the subclusters. In the preferred embodiment, the method is implemented in a system to automatically identify database documents which are of interest to a given user or users. In this implementation, the method comprises automatically deriving the first set of documents from a local data storage device, such as a user's hard disk. The method also comprises deriving the second set of documents from a second data storage device, such as a network machine. These techniques are described as providing fast and accurate searching to identify documents of interest to a particular user or users without any need for the user or users to specify what search criteria to use.
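The grouping and combining steps can be sketched in Python as follows. The N×N distance and connection matrices are omitted, each word cluster is reduced to its word list, and cluster similarity is taken to be the size of the intersection divided by the size of the union (the Jaccard index). That measure stands in for the unspecified "function of the subclusters" and is an assumption, as are the function names.

```python
from itertools import combinations
from typing import Dict, List, Set


def cluster_similarity(a: Set[str], b: Set[str]) -> float:
    """Assumed similarity measure: |intersection| / |union| (Jaccard index)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_and_combine(clusters: List[Set[str]], threshold: float) -> List[Set[str]]:
    """Group clusters whose pairwise similarity exceeds `threshold`, then merge each group."""
    parent = list(range(len(clusters)))  # union-find forest over cluster indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(clusters)), 2):
        if cluster_similarity(clusters[i], clusters[j]) > threshold:
            parent[find(i)] = find(j)  # place both clusters in a common group

    # Combine each group into a single word cluster (union of its word lists).
    groups: Dict[int, Set[str]] = {}
    for i, cluster in enumerate(clusters):
        groups.setdefault(find(i), set()).update(cluster)
    return list(groups.values())
```

Because the sketch uses union-find, grouping is transitive: clusters A and C can land in the same group through a similar cluster B even if A and C alone fall below the threshold. A stricter variant would require every pair within a group to exceed the threshold.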
U.S. Pat. No. 6,772,150 to Whitman et al., which is incorporated herein by reference, describes a search engine system that uses information about historical query submissions to a search engine to suggest previously-submitted, related search phrases to users. The related search phrases are preferably suggested based on a most recent set of query submission data (e.g., the last two weeks of submissions), and thus strongly reflect the current searching patterns or interests of users.
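Suggesting phrases from a recent window of submissions can be illustrated with the Python sketch below. It keeps only submissions from the last two weeks and ranks previously submitted phrases that share at least one term with the current query by how often they were submitted. The shared-term matching rule, the frequency ranking, and the function name related_phrases are assumptions; the cited patent's correlation method is not reproduced.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import List, Tuple


def related_phrases(
    log: List[Tuple[datetime, str]],
    query: str,
    now: datetime,
    window: timedelta = timedelta(weeks=2),
    limit: int = 5,
) -> List[str]:
    """Suggest recently submitted phrases that share at least one term with `query`."""
    query = query.lower()
    terms = set(query.split())
    # Count only submissions that fall inside the recent window.
    recent = Counter(phrase.lower() for ts, phrase in log if now - window <= ts <= now)
    candidates = [
        (count, phrase)
        for phrase, count in recent.items()
        if phrase != query and terms & set(phrase.split())
    ]
    # Most frequently submitted first; alphabetical order breaks ties.
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return [phrase for _, phrase in candidates[:limit]]
```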
U.S. Pat. No. 6,289,353 to Hazlehurst et al., which is incorporated herein by reference, describes an intelligent Query Engine system that automatically develops multiple information spaces in which different types of real-world objects (e.g., documents, users, products) can be represented. Machine learning techniques are used to facilitate automated emergence of information spaces in which objects are represented as vectors of real numbers. The system then delivers information to users based upon similarity measures applied to the representations of the objects in these information spaces. The system simultaneously classifies documents, users, products, and other objects. Documents are managed by collators that act as classifiers of overlapping portions of the database of documents. Collators evolve to meet the demands for information delivery expressed by user feedback. Liaisons act on behalf of users to elicit information from the population of collators. This information is then presented to users upon logging into the system via the Internet or another communication channel. Mites handle incoming documents from multiple information sources (e.g., in-house editorial staff, third-party news feeds, large databases, and WWW spiders) and feed documents to those collators which provide a good fit for the new documents.
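Delivering information by applying a similarity measure to vector representations can be sketched in Python as below. Cosine similarity is used here as one common choice of measure; it is an assumption, not necessarily the measure used in the cited patent, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def deliver(
    user_vector: Sequence[float],
    document_vectors: Dict[str, Sequence[float]],
    top_n: int = 3,
) -> List[str]:
    """Return the ids of the `top_n` documents whose vectors are most similar to the user's."""
    ranked = sorted(
        document_vectors,
        key=lambda doc_id: cosine_similarity(user_vector, document_vectors[doc_id]),
        reverse=True,
    )
    return ranked[:top_n]
```

Because users and documents live in the same vector space in this sketch, the same function could rank users for a document or products for a user simply by swapping which vectors are passed in.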
US Patent Application Publication 2003/0123443 to Anwar, which is incorporated herein by reference, describes a search engine that utilizes both record based data and user activity data to develop, update, and refine ranking protocols, and to identify words and phrases that give rise to search ambiguity so that the engine can interact with the user to better respond to user queries and enhance data acquisition from databases, intranets, and internets.
The following patents, patent application publications, and other publications, all of which are incorporated herein by reference, may be of interest:
US Patent Application Publication 2005/0055341 to Haahr et al.
U.S. Pat. No. 5,987,457 to Ballard
U.S. Pat. No. 6,363,379 to Jacobson et al.
U.S. Pat. No. 6,347,313 to Ma et al.
U.S. Pat. No. 6,321,226 to Garber et al.
U.S. Pat. No. 6,189,002 to Roitblat
U.S. Pat. No. 6,167,397 to Jacobson et al.
U.S. Pat. No. 5,864,845 to Voorhees et al.
U.S. Pat. No. 5,825,943 to DeVito et al.
US Patent Application Publication 2005/0144158 to Capper et al.
US Patent Application Publication 2005/0114324 to Mayer
U.S. Pat. No. 5,857,179 to Vaithyanathan et al.
U.S. Pat. No. 7,139,755 to Hammond
U.S. Pat. No. 7,152,061 to Curtis et al.
U.S. Pat. No. 6,904,588 to Reddy et al.
U.S. Pat. No. 6,842,906 to Bowman-Amuha
U.S. Pat. No. 6,539,396 to Bowman-Amuha
US Patent Application Publication 2004/0249809 to Ramani et al.
US Patent Application Publication 2003/0058277 to Bowman-Amuha
U.S. Pat. No. 6,925,460 to Kummamuru et al.
U.S. Pat. No. 6,920,448 to Kincaid et al.
US Patent Application Publication 2006/0074883 to Teevan et al.
US Patent Application Publication 2006/0059134 to Palmon et al.
US Patent Application Publication 2006/0047643 to Chaman
US Patent Application Publication 2005/0216434 to Haveliwala et al.
US Patent Application Publication 2003/0061206 to Qian
US Patent Application Publication 2002/0073088 to Beckmann et al.