1. Field of the Invention
The present invention generally relates to searching and organizing a collection of hyperlinked hypertext documents, such as those making up the World Wide Web. The present invention exploits both the words (or terms) present in each document and the link structure (or link topology) among the documents to organize the collection into groups or clusters. The present invention also identifies typical documents in each group or cluster and provides a way to rank the documents within each group. As an example, suppose a web search engine has returned a set of hyperlinked documents in response to a user query. The present invention provides a way to organize the set of returned documents into groups or clusters.
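By way of illustration only, the following minimal sketch shows one way that word content and link structure might be combined to group documents and rank them within each group. It is not the claimed method. The feature representation (word counts concatenated with out-link indicators), the cosine-similarity grouping around seed documents, and all names (`featurize`, `cluster`, `link_weight`, the sample documents) are assumptions introduced for this example.

```python
import math
from collections import defaultdict


def unit(vec):
    """Scale a sparse vector (a dict) to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}


def dot(a, b):
    """Inner product of two sparse vectors."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def featurize(doc, link_weight=1.0):
    """Assumed representation: word counts plus out-links in one unit vector."""
    counts = defaultdict(float)
    for word in doc["text"].lower().split():
        counts[("word", word)] += 1.0
    combined = dict(unit(counts))
    links = unit({("link", target): 1.0 for target in doc["links"]})
    for key, value in links.items():
        combined[key] = link_weight * value
    return unit(combined)


def cluster(docs, seeds, iterations=10):
    """Group documents around seed documents; return one ranked list per group."""
    vectors = {name: featurize(doc) for name, doc in docs.items()}
    centroids = [vectors[s] for s in seeds]
    for _ in range(iterations):
        # Assign each document to the most similar centroid.
        members = [[] for _ in centroids]
        for name, vec in vectors.items():
            best = max(range(len(centroids)), key=lambda i: dot(vec, centroids[i]))
            members[best].append(name)
        # Recompute each centroid as the normalized sum of its members.
        for i, names in enumerate(members):
            if names:
                total = defaultdict(float)
                for name in names:
                    for key, value in vectors[name].items():
                        total[key] += value
                centroids[i] = unit(total)
    # Rank members by similarity to their centroid; the first is the most typical.
    return [
        sorted(names, key=lambda n: (-dot(vectors[n], centroids[i]), n))
        for i, names in enumerate(members)
    ]


# Hypothetical search results for the ambiguous query "jaguar".
docs = {
    "a": {"text": "jaguar car engine speed", "links": ["hub1"]},
    "b": {"text": "car engine dealer price", "links": ["hub1"]},
    "c": {"text": "jaguar cat jungle prey", "links": ["hub2"]},
    "d": {"text": "cat jungle habitat prey", "links": ["hub2"]},
}
groups = cluster(docs, seeds=["a", "c"])
print(groups)
```

In this sketch the two automobile pages and the two wildlife pages fall into separate groups even though both groups contain the query term, because shared vocabulary and shared link targets both contribute to the similarity. The `link_weight` parameter is a hypothetical knob for adjusting the relative influence of link structure.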
2. Description of the Related Art
The explosive growth of the World Wide Web has created an abundance of hyperlinked document corpora. Prominent examples of such corpora include the IBM patent server, the Internet Archive, and the scientific literature. Mining the information present in such corpora represents a major contemporary scientific and technological challenge. Given a user query, today's Internet search engines may return a large number of relevant documents. Without effective summarization, sorting through all the returned documents in search of high-quality, representative information resources is a tedious and often futile task. In particular, what is needed is a technique that can aid in organizing, ranking, and effectively summarizing the gist or essence of the results returned by an Internet search engine.