The present invention relates to hyperlinked document systems. More specifically, the invention relates to techniques for finding related hyperlinked documents using link-based analysis.
The Internet, and more specifically the World Wide Web, provides users all over the world with virtually unlimited amounts of information in the form of hyperlinked documents. As new information is added to the Web, more hyperlinked documents are added that include links to the existing web of information.
One of the reasons for the almost explosive growth of information on the Web is that virtually anyone can add hyperlinked documents, which will be immediately available to users around the world. For better or worse, the Web is virtually unstructured, meaning that users are free to add information to the Web in almost any way they desire. Although this provides great flexibility in adding information to the Web, it can significantly increase the difficulty in finding information that is desired.
Probably the most popular mechanism for finding information on the Web is to use word-based search engines. Word-based search engines allow a user to enter words, phrases, and other search criteria so that the search engine can retrieve the hyperlinked documents that best match the user's search criteria.
Word-based search engines have been tremendously successful in allowing users to find the information they desire on the Web. There are times, however, when a user wants to find hyperlinked documents that are related to, and at the same level of generality as, a selected hyperlinked document. For example, a user may be viewing a company's web site and wish to see the web sites of competing companies. As another example, a user may have found a university's computer science department web site and may desire to see the computer science department web sites of other universities. Traditional word-based search engines may not provide satisfactory results for these types of desired information.
Some web sites have recognized this deficiency and have taken on the painstaking process of categorizing the information on the Web. Although it is possible that the related hyperlinked documents that are desired are in a single category, it often happens that the related hyperlinked documents are spread throughout multiple categories. For example, if information regarding each university is placed in a separate category, one will not find a single category that includes information regarding the computer science departments of multiple universities. Additionally, categorizing the information on the Web takes a considerable amount of time and typically requires human decision making.
Therefore, what is needed are innovative techniques for finding related hyperlinked documents without requiring human categorization of the information.
The present invention provides innovative techniques for finding related hyperlinked documents using link-based analysis. The link structure of the hyperlinked documents is analyzed in order to find hyperlinked documents that are related to, and at the same level of generality as, a given hyperlinked document. The invention can be utilized in any number of ways, including as an additional feature for a word-based search engine or as an addition to a web browser. Some specific embodiments of the invention are described below.
In one embodiment, the invention provides a computer-implemented method of generating lists of hyperlinked documents that are related to a given or selected hyperlinked document. A first set of hyperlinked documents that have a forward link to the selected hyperlinked document is provided. Additionally, a second set of hyperlinked documents that are pointed to by the forward links in the hyperlinked documents in the first set is provided. A value is assigned to each forward link in each of the hyperlinked documents in the first set, with the value being reduced for a forward link if there are multiple hyperlinked documents from the same host as the hyperlinked document that includes the forward link. A score is generated for each hyperlinked document in the second set according to the values of the forward links pointing to the hyperlinked document. A list of related hyperlinked documents is then generated from the second set according to the scores of the hyperlinked documents. In a preferred embodiment, the related hyperlinked documents are displayed in an order based on their scores.
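The embodiment above can be illustrated with a minimal sketch. The function name, the dictionary-based link graph, and the particular reduction rule (each forward link's value is 1 divided by the number of first-set documents sharing the linking document's host) are assumptions made for illustration; the text only states that the value is reduced when multiple documents come from the same host. Excluding the selected document from the results is likewise an assumption.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def related_by_host_weighting(selected, forward_links):
    """Rank documents related to `selected`.

    `forward_links` (hypothetical structure) maps each document URL to the
    list of URLs it links to. URLs are assumed to be absolute.
    """
    # First set: documents that have a forward link to the selected document.
    first_set = [doc for doc, targets in forward_links.items()
                 if selected in targets and doc != selected]

    # Count how many first-set documents come from each host.
    docs_per_host = defaultdict(int)
    for doc in first_set:
        docs_per_host[urlsplit(doc).hostname] += 1

    # Second set and scores: every document pointed to by the first set
    # accumulates the values of the forward links pointing to it.
    scores = defaultdict(float)
    for doc in first_set:
        # Assumed reduction rule: same-host documents share one unit of value.
        value = 1.0 / docs_per_host[urlsplit(doc).hostname]
        for target in set(forward_links[doc]):
            if target != selected:  # assumption: omit the selected document
                scores[target] += value

    # Highest score first; ties broken by URL for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Under this assumed rule, two pages on the same host that both link to the selected document and to some other document contribute the same total score to that document as a single page on a distinct host would.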
In another embodiment, the invention provides a computer-implemented method of generating lists of related hyperlinked documents. A first set of hyperlinked documents that have a forward link to a selected hyperlinked document is provided. A second set of hyperlinked documents that are pointed to by the forward links in the hyperlinked documents of the first set is also provided. A value is assigned to each forward link in each of the hyperlinked documents in the first set, with the value being reduced according to the number of forward links in the hyperlinked document that includes the forward link. A score is generated for each hyperlinked document in the second set according to the values of the forward links pointing to the hyperlinked document. Lastly, a list of related hyperlinked documents is generated from the second set according to the scores of the hyperlinked documents.
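This second embodiment can be sketched in the same way. Here the reduction is assumed to be 1 divided by the number of distinct forward links in the linking document, which is only one possible reading of "reduced according to the number of forward links". The function name and the input structure are hypothetical.

```python
from collections import defaultdict


def related_by_outdegree_weighting(selected, forward_links):
    """Rank documents related to `selected`.

    `forward_links` (hypothetical structure) maps each document identifier
    to the list of documents it links to.
    """
    # First set: documents that have a forward link to the selected document.
    first_set = [doc for doc, targets in forward_links.items()
                 if selected in targets and doc != selected]

    scores = defaultdict(float)
    for doc in first_set:
        targets = set(forward_links[doc])
        # Assumed reduction rule: a document with many forward links
        # gives each individual link a smaller value.
        value = 1.0 / len(targets)
        for target in targets:
            if target != selected:  # assumption: omit the selected document
                scores[target] += value

    # Highest score first; ties broken by identifier for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

With this weighting, a link from a short, focused list of links counts for more than a link from a long, indiscriminate one.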
Other features and advantages of the invention will become readily apparent upon review of the following description in association with the accompanying drawings, where the same or similar structures are designated with the same reference numerals.