The number of documents available on the Internet is astronomical and increasing at a tremendous rate. Internet users search for documents of interest from this enormous corpus of documents by using dedicated search engines such as Google™ or web portals such as Yahoo™ that operate as a gateway to dedicated search engines. Such search engines are instances of information retrieval systems.
Conventional search engines perform a keyword search on their corpus of documents in response to a query input by a user and return search results typically corresponding to the keyword search. Some search engines also rank the returned search results, when presenting them to a user, according to a relevance determined from the degree of closeness of each document in the search results to the query. For example, conventional vector space search engines generate a query vector Q={q1, q2, . . . , qR} based upon the input query and document vectors Di={di1, di2, . . . , diR} based upon the terms in the documents themselves, and rank the search results according to some measure of the distance between the query vector Q and the document vector Di, for example, the cosine of the angle between the vectors Q and Di. The query vector Q and the document vector Di are derived from terms in the corpus of documents. Here, the elements {q1, q2, . . . , qR} in the query vector Q are the weights associated with the terms in the input query. In a typical keyword search, the elements {q1, q2, . . . , qR} of the query vector Q are ones (1) or zeros (0) depending upon whether the corresponding term is included in the input query. Also, the elements {di1, di2, . . . , diR} in the document vectors Di are the weights associated with the terms in the document, and each weight is typically based on the frequency at which the term associated with the weight appears in the document and the frequency of the term in the corpus of documents. The basic operations of a vector space search system are described in Gerard Salton and Michael J. McGill, “Introduction to Modern Information Retrieval,” McGraw-Hill, Inc., 1983.
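By way of illustration only, the vector space ranking described above may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular search engine: the function names are hypothetical, documents are assumed to be whitespace-separated strings, and the smoothed inverse document frequency weighting is one common choice among many for combining the term frequency in the document with the frequency of the term in the corpus.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Return the R distinct corpus terms, in a fixed (sorted) order."""
    return sorted({term for doc in docs for term in doc.split()})


def query_vector(query, vocab):
    """Binary query vector Q: 1 if the term appears in the query, else 0."""
    terms = set(query.split())
    return [1.0 if term in terms else 0.0 for term in vocab]


def document_vector(doc, vocab, docs):
    """Document vector Di: term frequency times a smoothed inverse
    document frequency (an assumed weighting, chosen for illustration)."""
    tf = Counter(doc.split())
    n = len(docs)
    vector = []
    for term in vocab:
        df = sum(1 for d in docs if term in d.split())
        idf = math.log((1 + n) / (1 + df)) + 1.0
        vector.append(tf[term] * idf)
    return vector


def cosine(q, d):
    """Cosine of the angle between Q and Di; 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(q, d))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_d = math.sqrt(sum(b * b for b in d))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0
    return dot / (norm_q * norm_d)


def rank(query, docs):
    """Return (score, document index) pairs, highest cosine first."""
    vocab = build_vocabulary(docs)
    q = query_vector(query, vocab)
    scored = [
        (cosine(q, document_vector(doc, vocab, docs)), index)
        for index, doc in enumerate(docs)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

For instance, calling rank("cat", ["cat sat mat", "dog barked loudly", "cat cat dog"]) places the third document first, because the query term accounts for a larger share of that document's weighted terms, and gives the second document a score of zero, because it does not contain the query term.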
Users of conventional search engines receive the search results, typically ranked according to the documents' relevance scores with respect to the user's search query. If the query was input to multiple search engines, the user will receive the individual search results returned by each search engine. Because each search engine has its own corpus of documents and its own scoring algorithm, the relevance scores for documents retrieved by each search engine are typically not comparable. For example, two search engines processing the same query may assign the same document two entirely different relevance scores. Those relevance scores cannot be directly compared (i.e., ranked against one another) because each represents relevance only in the context of the other documents in the respective corpus. For example, a document may receive a relevance score of 80% from one search engine and be the best scoring document in that search engine's corpus, and yet receive a 95% relevance score from the other search engine and be only the 20th best scoring document there. Conversely, a document may receive the same relevance score from different search engines and still have entirely different rankings within each set of search results. For this reason, there is a need for a method of ranking the documents retrieved from multiple search engines.
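The example above may be made concrete with a short sketch. The document identifiers and the scores of the other documents are hypothetical values invented solely to reproduce the 80% and 95% figures given above; they are not data from any real search engine.

```python
def rank_of(doc_id, results):
    """Return the 1-based position of doc_id when results are ordered
    by relevance score, highest first."""
    ordered = sorted(results, key=results.get, reverse=True)
    return ordered.index(doc_id) + 1


# Hypothetical results from a first search engine: the document of
# interest scores 80% and is the best scoring document.
engine_a = {"doc_x": 0.80, "doc_a1": 0.75, "doc_a2": 0.60}

# Hypothetical results from a second search engine: the same document
# scores 95%, yet nineteen other documents score higher.
engine_b = {f"doc_b{i}": 0.96 + 0.001 * i for i in range(19)}
engine_b["doc_x"] = 0.95

rank_in_a = rank_of("doc_x", engine_a)
rank_in_b = rank_of("doc_x", engine_b)
```

Here the higher raw score from the second search engine corresponds to the lower rank, so merging the two result sets by raw score alone would misrepresent the document's standing within each set of search results.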
User feedback information may be useful in ranking the relevance of retrieved documents. A user may review some of the retrieved documents in the search results in detail because they appear relevant, and may ignore other documents in the search results because they appear irrelevant, for example from their title or summary description. Such feedback from the user regarding previous search results may be very useful in determining the relevance of future search results, as it indicates the types of documents that the user is interested in for certain queries. The user's feedback on previous search results may therefore be useful in ranking future search results in a manner consistent with the user's interest in certain types of documents. Such user feedback is particularly useful in ranking future search results when the search results are retrieved from a plurality of search engines, each having its own corpus of documents and returning search results based on a different scale of relevance scores.
However, conventional search engines are not capable of taking into consideration the user's feedback given in response to previous searches when ranking future search results retrieved from a plurality of search engines.
Therefore, there is a need for an information retrieval method in which the user's feedback given in response to previous queries is taken into consideration when ranking future search results retrieved from a plurality of search engines. There is also a need for an information retrieval method in which documents retrieved in response to a query are ranked according to the user's feedback on previous search results.