This invention relates generally to computerized techniques for identifying relevant documents. More particularly, this invention describes computerized techniques for adaptively ranking documents identified in response to a search query.
A text search engine receives from a user one or more words of text that form a query. The query may include other search operators, such as Boolean operators, proximity operators, and the like. The search engine returns documents that it deems relevant to the query. For instance, on the query "football", a search engine may return all documents that contain the term "football".
For many queries, a large number of matching documents is found. The search engine then orders the matching documents using one or more heuristics. These heuristics include methods that consider the statistics of occurrences of the query terms in each matching document, the hyperlink structure, if any, between the documents, and other criteria.
Because the list of matching documents can number in the thousands, the "truly" relevant documents may not be ranked at the top of the list. Therefore, methods are being developed wherein a search engine "learns" the relevant documents for a query over time, based on the actions of its users. One elementary form of this technique maintains, for each pair consisting of a query q and a document d, the total number of times N(d,q) that document d is selected for viewing by users issuing query q to the search engine. On receiving query q, the search engine first retrieves all documents that match the query q; it then ranks them in decreasing order of the values N(d,q). This technique is described in U.S. Pat. Nos. 6,006,222 and 6,014,665. Thus, the order in which the search engine presents the results for a query q may change with time, depending on the behavior of users. Since this technique is time-variant, it is referred to as an adaptive method. In contrast, scoring methods that are time-invariant are referred to as static methods.
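By way of illustration only, the elementary adaptive method described above can be sketched in Python as follows. The class name `ClickCountRanker` and its methods are hypothetical names introduced for this sketch; the only elements taken from the description are the per-pair count N(d,q) and the ranking in decreasing order of that count.

```python
from collections import Counter


class ClickCountRanker:
    """Hypothetical sketch of ranking by the view counts N(d, q)."""

    def __init__(self):
        # Maps (query, document id) -> N(d, q), the number of times the
        # document was selected for viewing by users issuing the query.
        self._counts = Counter()

    def record_view(self, query, doc_id):
        """Record that a user issuing `query` selected `doc_id` for viewing."""
        self._counts[(query, doc_id)] += 1

    def rank(self, query, matching_docs):
        """Order the matching documents by decreasing N(d, q).

        Documents with equal counts keep their incoming order, because
        Python's sort is stable even with reverse=True.
        """
        return sorted(
            matching_docs,
            key=lambda doc_id: self._counts[(query, doc_id)],
            reverse=True,
        )


ranker = ClickCountRanker()
for viewed in ["d2", "d2", "d3"]:
    ranker.record_view("football", viewed)

print(ranker.rank("football", ["d1", "d2", "d3"]))  # ['d2', 'd3', 'd1']
```

Because the counts change as users view documents, the order returned for the same query may differ over time, which is what makes the method adaptive.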
It is possible to combine the scores from traditional static methods with those from adaptive methods, and to use the composite score for ranking. This is often useful because for some queries the static methods perform well, while for others the adaptive method corrects deficiencies of the static score over time. Unfortunately, it is impossible to predict a priori, for any corpus of documents and any associated search engine, for which queries the static method is satisfactory and for which the adaptive method is satisfactory.
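One simple way such a composite score might be formed is a fixed weighted sum, sketched below in Python. The linear form, the function name `composite_score`, and the `weight` parameter are assumptions made for illustration; the description does not specify how the two scores are combined. The sketch also makes the difficulty noted above concrete: a single fixed weight applies equally to every query, whether or not the static score is already satisfactory for that query.

```python
def composite_score(static_score, adaptive_score, weight=0.5):
    """Blend a static score with an adaptive score (assumed linear form).

    `weight` is a hypothetical parameter in [0, 1]: 0 uses only the static
    score and 1 uses only the adaptive score.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie between 0 and 1")
    return (1.0 - weight) * static_score + weight * adaptive_score
```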
In view of the foregoing, it would be highly desirable to provide a technique that selectively emphasizes a static method or an adaptive method to achieve optimal search results for a given query.
The invention includes a method of ranking search results. The method produces a relevance score for a document in view of a query. A similarity score is calculated for the query utilizing a feature vector that characterizes attributes and query words associated with the document. A rank value is assigned to the document based upon the relevance score and the similarity score.
The invention also includes a computer readable memory to rank search results. The computer readable memory includes a search engine to produce relevance search results based upon a query, the relevance search results including a list of documents, wherein each document includes an associated relevance score. A viewed document database stores viewed document indicia corresponding to documents viewed in response to the relevance search results. A viewed document processor associates the viewed document indicia with different queries. A vector constructor forms a feature vector for each viewed document, each feature vector characterizing attributes associated with a selected viewed document and query words associated with the selected viewed document. A similarity processor calculates a similarity score for the query utilizing the feature vector of the selected viewed document. A ranking processor assigns a rank value for the selected viewed document based upon a function that incorporates the relevance score and the similarity score for the selected viewed document.
The invention also includes a computer readable memory with a search engine to produce a relevance score for a document in view of a query. A similarity processor calculates a similarity score for the query utilizing a feature vector that characterizes attributes and query words associated with the document. A rank processor assigns a rank value to the document based upon the relevance score and the similarity score.
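The flow summarized above, from feature vector to similarity score to rank value, may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the sparse term-count representation of the feature vector, the use of cosine similarity, the additive ranking function, and all names (`build_feature_vector`, `similarity`, `rank_value`, `alpha`) are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def build_feature_vector(attributes, viewed_queries):
    """Form a sparse feature vector for a viewed document.

    Assumed representation: one count per document attribute and one count
    per word of each query for which the document was viewed.
    """
    vector = Counter()
    for attribute in attributes:
        vector["attr:" + attribute] += 1
    for query in viewed_queries:
        for word in query.lower().split():
            vector["word:" + word] += 1
    return vector


def similarity(query, feature_vector):
    """Cosine similarity between the query words and the feature vector.

    Cosine similarity is an assumption; any other vector similarity measure
    could be substituted here.
    """
    query_vector = Counter("word:" + w for w in query.lower().split())
    dot = sum(n * feature_vector.get(key, 0) for key, n in query_vector.items())
    norm_q = math.sqrt(sum(n * n for n in query_vector.values()))
    norm_d = math.sqrt(sum(n * n for n in feature_vector.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)


def rank_value(relevance_score, similarity_score, alpha=1.0):
    """Assumed ranking function: relevance plus a weighted similarity term."""
    return relevance_score + alpha * similarity_score
```

Under these assumptions, a document previously viewed for queries that share words with the current query receives a higher similarity score, and therefore a higher rank value, than a document with the same relevance score that was never viewed for a related query.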
The invention provides improved search results by adaptively ranking, based upon the prior behavior of users, documents returned from a text search engine. More particularly, the prior behavior of users is utilized to determine the rate at which to apply adaptive correction for a given query.