The Internet has become a valuable source of information on a variety of topics. Many websites are directed to teaching users general information about certain topics. Other websites are directed to providing users with cutting-edge information, so that experts can stay abreast of trends, new developments, research possibilities, etc.
When a user wishes to find information on the Internet about a particular topic, the user often directs his or her browser to a search engine and enters a query related to the topic. In response, the search engine applies a relevance function to identify the most relevant websites, and presents the results in an order corresponding to their relevance scores. While the website results may relate to the topic, the user must still browse through them to find those websites that present the information at the desired level of detail, especially since different users have varying levels of sophistication. In other words, depending on a user's familiarity with a topic, the user may be more interested in receiving either introductory or advanced documents. For example, a student searching for help with a linear algebra problem requires a different set of documents than a professor of mathematics interested in studying the latest developments in the field.
Empirically, it has been noted that a typical web search engine, e.g., the Yahoo! search engine, returns a mix of introductory and advanced documents in response to a query. Further, in response to the same query, the search engine will return the same website results to a novice and to an expert, failing to address the differing backgrounds and requirements of the two users. Currently, there is no means for a user to inform the search engine of the amount of background knowledge the user has on a topic, so that the search engine can return only those documents appropriate to the user's level of expertise or so that the search engine can group documents according to introductory/advanced levels. Adding trigger words (e.g., “primer,” “introduction,” “information,” “definition,” “characteristic,” “summary,” etc.) to a query to suggest that a user has only an introductory level of familiarity with a topic has been found to be statistically insignificant. Adding trigger words to a query to suggest that a user has an advanced level of familiarity with a topic is more difficult still.
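By way of illustration only, the behavior described above can be sketched with a toy ranker. The sketch assumes a simple term-overlap relevance function; the names `relevance`, `search`, and `add_trigger` and the two-document corpus are hypothetical and do not correspond to any actual search engine. It shows that a relevance-only ranking takes no account of the user's familiarity with the topic, and how a trigger word would be appended to a query.

```python
from typing import Dict, List, Tuple


def relevance(query: str, document: str) -> float:
    """Toy relevance score (an assumption): fraction of query terms found in the document."""
    terms = query.lower().split()
    words = set(document.lower().split())
    if not terms:
        return 0.0
    return sum(1 for term in terms if term in words) / len(terms)


def search(query: str, corpus: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank all documents by relevance score; user expertise is never an input."""
    scored = [(name, relevance(query, text)) for name, text in corpus.items()]
    # Highest score first; ties are broken by document name for determinism.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def add_trigger(query: str, trigger: str) -> str:
    """Append one trigger word to the query, as a novice user might."""
    return f"{query} {trigger}"


# Hypothetical corpus: one introductory and one advanced document.
corpus = {
    "intro_notes": "linear algebra basics vectors and matrices for beginners",
    "research_survey": "recent advances in numerical linear algebra and randomized methods",
}

# A novice and an expert issuing the same query receive the same ranking.
novice_results = search("linear algebra", corpus)
expert_results = search("linear algebra", corpus)

# A trigger word changes the query text; in this toy corpus neither document
# contains the word, so the order of the results is unchanged.
triggered_results = search(add_trigger("linear algebra", "primer"), corpus)
```

The sketch illustrates the mechanism only and is not evidence for the statistical finding noted above; a production relevance function is considerably more elaborate.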
A system and method are needed that enable automatic classification of documents based on user familiarity with a topic.