2.1 Field of the Invention
The present invention relates to facilitating access to information over a computer network such as the Internet. More particularly, the present invention relates to technology for partially automating the linking of documents on the World Wide Web by authors of Web content. Such techniques are particularly useful for more easily creating richly interconnected information on the Web.
2.2 Description of Related Art
The World Wide Web provides an enormous distributed database of information interconnected physically by the Internet. One of the main difficulties for users of the Web is finding needed information out of the tremendous quantity of information that is available. Various mechanisms have been developed to address this problem.
One mechanism for facilitating access to information on the Web is the index website, commonly called a search engine. An index website is typically a server computer connected to the World Wide Web which maintains an index of Web content that can be searched in various ways by users (clients) connected to the server over the Internet. Indexes are often updated automatically by means of “spiders” which systematically explore the Web looking for new or updated content. Most search engines also provide means for users to submit information to be indexed, so that such information may be indexed immediately without waiting for a spider to find it. An example of a premier search engine is the “Alta Vista” website, accessible on the Web at the Uniform Resource Locator (URL) address http://www.altavista.com.
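The indexing and searching performed by an index website can be illustrated with a minimal sketch. The sketch assumes the simplest possible arrangement, in which each word is mapped to the set of pages containing it and a query matches only pages containing every query word. The function names build_index and search, and the example.com addresses, are hypothetical and are not taken from any actual search engine.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it.

    `pages` is a dict of URL -> page text, standing in for content
    that a spider would have gathered (an assumption of this sketch).
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the set of URLs containing every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    # Copy the first word's set so the index itself is never modified.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages used only for illustration.
pages = {
    "http://example.com/a": "Hypertext links on the Web",
    "http://example.com/b": "Searching the Web with an index",
}
index = build_index(pages)
```

With these hypothetical pages, search(index, "web index") matches only the second page, while search(index, "web") matches both, which hints at why broad queries return more results than the user wants.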
A difficulty with search engines is that search results typically contain too much undesired information as well as the desired information. This occurs because the information content of the Web is vast, and because it is difficult for users to construct search parameters in such a way as to pass most desired content while rejecting most undesired content. As a result, users typically must spend a lot of time sifting through search-engine results and/or refining their searches with additional restrictions in the search parameters. Additionally, the information stored in the index is not organized in a form suitable for browsing in a logical order.
Another mechanism developed to facilitate access to information on the World Wide Web is the directory website which presents a hierarchical directory of information that can be browsed by the user. Premier sites of this nature include Yahoo (<<http://www.yahoo.com>>), Netscape (<<http://www.netscape.com>>), and Excite (<<http://www.excite.com>>). A visitor to such a site is first presented with a top-level list of topics. Choosing a topic by clicking on a topic's hypertext link with the mouse produces a list of subtopics, and so on, until a final level is reached at which useful information is displayed about the topic, or else a remote website pertaining to that topic is visited. Directory companies such as Yahoo typically have teams of editors who explore the Web looking for content suitable for reference at their site, and these workers perform a function analogous to the automatic “spiders” used by automated index websites. Like the search engines, directory websites normally support searching within the directory site, thus producing search results of generally higher quality and less “clutter” than typically encountered on an index site. Also like index websites, directory websites typically allow submission of content for reference, subject to editorial consideration. Thus, directory websites improve over index websites by providing editorial selection, logical organization, and browsing capability, all of which are absent in typical index websites.
A first difficulty, however, with directory websites is that they cannot reasonably keep up with the vastness of the information on the World Wide Web by means of manual editorial selection. As a result, directory websites tend to offer far less information than index websites. A second difficulty with directory websites is that their content is proprietary and controlled by a team of editors at one company. This editorial control, while ensuring consistently high quality on the site, makes it difficult and sometimes even infeasible for an information provider to obtain a desired listing in the hierarchical directory. One directory site that addresses this difficulty is the Open Directory project (<<http://dmoz.org/>>). The Open Directory allows any user on the Internet to become an “editor” for a particular topic at the site. A third difficulty, related to the first, is that typical directory sites are extremely broad in scope, contributing to the absence of specialized information that is not of interest to a wide general audience.
A difficulty with both index and directory websites is that information is presented without regard to the user's level of education. It is therefore often possible for a high-school senior working on a book report, for example, to encounter information understandable only by a graduate student in a specialized field. Similarly, there is normally no means for selecting information according to its type, its source, or other potentially desirable criteria.
To assist users in selecting sources of information, some websites provide a user rating system (or “scoring system”) to which any user may contribute. An example of this mechanism is seen in the online book-store website <<http://www.amazon.com/>>. Amazon allows any user to contribute a “book review” and an overall rating on a five-star scale. The average rating is displayed for each book, and books which match the user's search criteria are displayed sorted according to decreasing score (and possibly other criteria such as the number sold). An interesting feature of the Amazon rating system is that it is democratic, allowing the vast number of World Wide Web users to jointly develop a ranking of the information sources (in this case books). Such a scheme addresses the difficulty of sorting through enormous quantities of information by harnessing a potentially enormous base of users who act, in effect, as contributing editors. A difficulty with rating systems is that they are generally used only at the site where the ratings are collected, and no mechanism is provided for making use of the ratings elsewhere, such as in other documents on the Web linking to the same information.
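The averaging and sorting described above can be sketched as follows. This is an illustrative sketch only, assuming that each item simply carries a list of star ratings. The function name rank_by_average_rating and the sample titles are hypothetical, and the sketch does not represent how any particular website computes its rankings.

```python
from statistics import mean


def rank_by_average_rating(ratings):
    """Return (title, average rating) pairs sorted by decreasing average.

    `ratings` maps each title to a list of user ratings on a
    five-star scale. Titles with no ratings are left out, since an
    average is undefined for them (a choice made for this sketch).
    """
    averages = {
        title: mean(stars) for title, stars in ratings.items() if stars
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical ratings used only for illustration.
ratings = {
    "Book A": [5, 4, 5],
    "Book B": [3, 3],
    "Book C": [],
}
ranked = rank_by_average_rating(ratings)
```

In this example “Book A” is listed ahead of “Book B”, and “Book C” is omitted because no user has yet rated it.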
An important mechanism integral to the function of the World Wide Web is the HyperText Markup Language (HTML) which is a text format supported by Web browser programs (such as Netscape Navigator or Microsoft Internet Explorer). A more recent variant called XML is now gaining support, and its function is similar to that of HTML for present purposes. HTML provides for the specification of hypertext links in Web-page text displayed by the browser. At a minimum, a hypertext link consists of text to be displayed by the browser and a link target which is usually not displayed. For example, the HTML code <a href="http://www.w3k.org">W3K website</a> contains the text (also known as the anchor) “W3K website”, while the link target is http://www.w3k.org which is a URL pointing to the W3K website. Thus, the link target is normally addressed by a URL pointing to information on the Web about the displayed word or phrase. (The complete HTML format specification may be found online at the URL http://www.w3.org/.) To the browser user, the anchor text of a hypertext link as above appears in a Web-page display as an underlined word or phrase, usually in a different color than normal, unlinked text. For example, in the displayed sentence “Visit the W3K website for more information regarding automatic link installation,” the words “W3K website” would appear underlined. By clicking on the hypertext link with the mouse, the user directs the browser program to “follow the link” by “navigating” to the URL associated with the link. The link-target URL may point to another Web page anywhere on the World Wide Web, or it may simply point to another location within the same electronic document. Hypertext links in HTML documents make it much easier for the user to explore the World Wide Web by visiting Web pages and clicking on the links found therein. Web browsers further make it easy to return to the page containing the link by using the “back” button, or the “history” list of visited pages maintained by the browser.
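The two parts of a hypertext link described above, the anchor text and the link target, can be separated programmatically. The following is a minimal sketch using the HTMLParser class from the Python standard library. The names LinkExtractor and extract_links are hypothetical and are introduced only for illustration, and the sketch handles only simple, non-nested links.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (anchor text, link target) pairs from HTML text."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (anchor_text, href) pairs
        self._href = None   # target of the <a> element currently open
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        # Only <a> elements that carry an href attribute are links.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Text is recorded only while a link element is open.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            anchor = "".join(self._text).strip()
            self.links.append((anchor, self._href))
            self._href = None


def extract_links(html):
    """Return a list of (anchor text, link target) pairs found in `html`."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


sample = (
    'Visit the <a href="http://www.w3k.org">W3K website</a> '
    "for more information regarding automatic link installation."
)
links = extract_links(sample)
```

Applied to the sample sentence, the sketch yields the single pair consisting of the anchor “W3K website” and the link target http://www.w3k.org. The surrounding words are ignored because they lie outside the link element.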
A difficulty with hypertext links is that they must be laboriously added by Web content providers. Typical HTML editors merely provide a data-entry form in which the URL for the link target can be typed. After the links are typed in, they must be maintained as their URLs change, and as new and better link targets become available. A second shortcoming of HTML and Web browsers is that there is no standard mechanism for specifying link properties such as educational level, type of resource, information source, or the like, which could be supported by Web browsers to give the user finer control of link display based on link properties. There is therefore a need for automated assistance with entering, maintaining, and improving hypertext links in documents intended for a hypertext document environment such as the Web.