This invention relates generally to hyperlinks between interrelated documents. More particularly, this invention relates to automatically creating hyperlinks in documents for a plurality of interconnected web pages on the World Wide Web.
The Internet, and particularly the World Wide Web, is becoming increasingly popular. A user typically navigates the World Wide Web by use of a network browser such as Netscape Navigator. The user types in or otherwise provides a Uniform Resource Locator (URL) to the browser to link to a particular web server which serves a particular web page. The user may continue to navigate in this manner by providing further URLs to the browser.
One of the more important ways to navigate the World Wide Web is by use of hyperlinks in web pages. A hyperlink is usually indicated by text of a different color, or by a graphic, signaling that a link is available at that location in the page. When the user clicks on such a hyperlink, the browser presents an associated web page or web site with additional or related information on the subject. The new page may reside on the same web server or on a geographically remote web server; the link is accomplished because the URL of the link target is embedded in the hyperlink and is provided to the browser upon actuation of the hyperlink. Certain assumptions apply when the URL is not fully qualified. For instance, if the hyperlink URL is abc.html, the assumption is that it references another page in the same directory on the same server as the page containing the link. Thus, if the page currently being viewed is http://www.mywebsite.com/foopages/xyz.html and it contains the abc.html link, the browser issues an HTTP request to http://www.mywebsite.com/foopages/abc.html. This form is only a shorthand specification; it allows the site to be relocated without changing the links. Otherwise, hyperlinks are fully-qualified URLs. For example, one can add a hyperlink to http://www.yahoo.com/news/sports to a personal home page. Clicking on that link has the same effect in the browser as typing that string on the URL line to go to Yahoo sports.
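The resolution of a shorthand link against the page that contains it can be illustrated with a minimal sketch. It assumes Python's standard urllib.parse.urljoin as a stand-in for the browser's behavior; the helper name resolve_link and the mirror host in the last example are hypothetical and do not come from the text above.

```python
from urllib.parse import urljoin


def resolve_link(page_url, href):
    """Return the URL a browser would request for `href` found on `page_url`.

    A shorthand (relative) href is resolved against the directory of the
    containing page; a fully-qualified href is returned unchanged.
    """
    return urljoin(page_url, href)


page = "http://www.mywebsite.com/foopages/xyz.html"

# Shorthand link: assumed to be in the same directory on the same server.
print(resolve_link(page, "abc.html"))

# Fully-qualified link: used exactly as written.
print(resolve_link(page, "http://www.yahoo.com/news/sports"))

# Relocated site (hypothetical host): the same shorthand link still resolves.
print(resolve_link("http://mirror.example.org/archive/foopages/xyz.html", "abc.html"))
```

Because the shorthand link carries no server or directory name, the pages can be moved together to another location and the link continues to resolve relative to wherever the containing page now resides.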
While the World Wide Web presents an ever-growing amount of information on an increasing number of web pages, much of the information which could be published in a web page format today predates web technology. These pages of information typically do not have hyperlinks placed in appropriate locations within the page. This preexisting information could be manually edited, with hyperlinks manually inserted in appropriate places. For large documents with many related references, however, the effort required would be very great. Thus, despite the existence of other related information, the manual effort required discourages the addition of hyperlinks to these documents. Nonetheless, if hyperlinks were inserted in these pages, the pages would be more useful to the user. Therefore, it would be desirable to automatically generate hyperlinks in existing files to convert the files to a set of interrelated web pages.
In the prior art, it has been suggested that a hyperlinked document could be created by parsing an existing document using keywords. The parser is presented with a list of keywords and, at the position of each keyword, generates a hyperlink to another part of the hyperlinked document. There are several problems with this approach. In most cases, the user has no prior knowledge of the words that a document might contain. Therefore, the prior art method forces the user to read the document beforehand, either to choose new keywords, to assign an existing list of keywords, or to choose another document from which a list of keywords can be generated. This effort can be so great that it is little better than generating the hyperlinks manually. Further, in many cases, common keywords are of no use whatsoever; hyperlinks should be generated at places in the document where very unusual words occur. Also, where keywords occur in adjacent positions, two hyperlinks can be created where one or possibly none would be more appropriate.
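The keyword-driven approach described above can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the code of any particular prior art system: the function name link_keywords, the keyword list, and the "#..." link targets are all hypothetical, matching is case-sensitive on whole words, and the input is treated as plain text.

```python
import re


def link_keywords(text, targets):
    """Wrap every occurrence of a listed keyword in a hyperlink.

    `targets` maps each keyword to the (hypothetical) location it links to.
    """
    if not targets:
        return text
    # Try longer keywords first so a long keyword wins over a shorter one
    # that starts at the same position.
    ordered = sorted(targets, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in ordered) + r")\b")

    def make_anchor(match):
        word = match.group(1)
        return f'<a href="{targets[word]}">{word}</a>'

    return pattern.sub(make_anchor, text)


# Hypothetical keyword list that the user must prepare in advance.
keywords = {"browser": "#browser", "server": "#server"}

# Adjacent keywords each receive their own hyperlink.
print(link_keywords("The browser server exchange", keywords))
```

In this sketch the phrase "browser server" yields two back-to-back hyperlinks, which illustrates the adjacency problem noted above, and nothing is linked unless the user has already supplied the keyword.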
The present invention provides another solution to the problem.