The present invention relates to a system and method for intelligent retrieval of information, and more particularly to a system and method for selecting the most relevant source or sources of information on a given topic.
With the advent of computers, and in particular of computer networks such as the Internet and the World Wide Web, vast quantities of information have become instantly accessible to users. This information comes in many forms, including textual data, image data, video streams, audio or sound streams, and combinations thereof. Furthermore, information is now available on substantially any topic. However, the drawback of such accessibility is the increasing difficulty of actually finding information on the desired topic and of separating such desired information from the large amounts of information which are not of interest. Thus, information retrieval has actually become more complicated, particularly with regard to locating sources of information relevant to the topic of interest.
The prior art discloses various attempts to increase the efficiency of information retrieval. Generally, these attempts have centered on methods for ranking the relevancy of retrieved information according to the frequency of keywords. For example, U.S. Pat. No. 5,321,833 to Chang et al. discloses a method for quantifying the relevance of retrieved information according to the weighted frequency of appearance of query terms in the retrieved information. The weighting function incorporates such factors as the distance between query terms in the retrieved information. However, all of these prior art methods, including the relatively complex method of U.S. Pat. No. 5,321,833, share the limitation of ranking information only after it has been retrieved. Such an approach is suitable only if the source of information is itself highly relevant, for example a database of information on the topic of interest. If, however, information is being retrieved from multiple sources, such as Web sites on the World Wide Web (WWW), such a method is far less useful, since ranking the retrieved information does not indicate which sources merit searching in the first place. Thus, methods that only rank retrieved information are inadequate for information retrieval from multiple sources of unknown quality or relevance.
A more useful method would sample portions of information from multiple sources of information, and would then use the sampled information to determine the relevancy of each information source. The sources could then be ranked, so that the sources of most interest would be ranked most highly. Such a ranking could then be used, for example, to determine patterns for searching for information of interest, to govern access to the sources of information, or to rank the retrieved information. Such a method would not be restricted to ranking information after it had been retrieved, and would therefore enable the information sources themselves to be evaluated. Unfortunately, such a method is neither taught nor suggested by the prior art.
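To make the distinction between ranking retrieved items and ranking sources concrete, the following is a minimal sketch. It is neither the method of the present invention nor that of U.S. Pat. No. 5,321,833. It assumes that each source can be sampled as a few short text portions, and that a plain query-term frequency (hits divided by token count, with no weighting or term-distance factor) is an adequate relevance score. The function names `term_score` and `rank_sources` are hypothetical and chosen for illustration only.

```python
import re
from collections import Counter


def term_score(text: str, query_terms: list[str]) -> float:
    """Fraction of the tokens in `text` that match a query term (unweighted)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # Counter returns 0 for terms that never occur in the sample.
    hits = sum(counts[term.lower()] for term in query_terms)
    return hits / len(tokens)


def rank_sources(
    samples: dict[str, list[str]], query_terms: list[str]
) -> list[tuple[str, float]]:
    """Rank sources by the mean score of their sampled portions, highest first.

    `samples` maps a (hypothetical) source identifier to the text portions
    sampled from that source; a source with no samples scores 0.0.
    """
    scores: dict[str, float] = {}
    for source, portions in samples.items():
        if portions:
            scores[source] = sum(term_score(p, query_terms) for p in portions) / len(portions)
        else:
            scores[source] = 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For example, with `samples = {"site-a": ["solar panel efficiency data", "panel wiring guide"], "site-b": ["recipes for bread", "solar eclipse photos"]}` and the query terms `["solar", "panel"]`, `rank_sources` places `site-a` (mean score 5/12) ahead of `site-b` (mean score 1/6). The ranking is computed from samples alone, before any full retrieval from either source, which is the property the prior-art approach lacks.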
There is therefore a need for, and it would be useful to have, a method and a system that both rank retrieved information on a topic of interest and determine the relevancy of sources of information for that topic, ranking both the retrieved information and the sources according to the topic of interest, and that incorporate user feedback into the ranking functions, thereby increasing the efficiency of information search and retrieval.