1. Field of the Invention
The present invention relates to facilitating access to information over a computer network such as the Internet. More particularly, the present invention relates to technology for partially automating the linking of documents on the World Wide Web by authors of Web content. Such techniques are particularly useful for more easily creating richly interconnected information on the Web.
2. Description of Related Art
The World Wide Web provides an enormous distributed database of information interconnected physically by the Internet. One of the main difficulties for users of the Web is finding needed information out of the tremendous quantity of information that is available. Various mechanisms have been developed to address this problem.
One mechanism for facilitating access to information on the Web is the index website, or search engine. An index website is typically a server computer connected to the World Wide Web which maintains an index of Web content that can be searched in various ways by users (clients) connected to the server over the Internet. Indexes are often updated automatically by means of "spiders" which systematically explore the Web looking for new or updated content. Most search engines also provide means for users to submit information to be indexed, so that such information may be indexed immediately without waiting for a spider to find it. An example of a premier search engine is the "Alta Vista" website, accessible on the Web at the Uniform Resource Locator (URL) address http://www.altavista.com.
A difficulty with search engines is that search results typically contain too much undesired information as well as the desired information. This occurs because the information content of the Web is vast, and because it is difficult for users to construct search parameters in such a way as to pass most desired content while rejecting most undesired content. As a result, users typically must spend a lot of time sifting through search-engine results and/or refining their searches with additional restrictions in the search parameters. Additionally, the information stored in the index is not organized in a form suitable for browsing in a logical order.
Another mechanism developed to facilitate access to information on the World Wide Web is the directory website, which presents a hierarchical directory of information that can be browsed by the user. Premier sites of this nature include Yahoo (http://www.yahoo.com), Netscape (http://www.netscape.com), and Excite (http://www.excite.com). A visitor to such a site is first presented with a top-level list of topics. Choosing a topic by clicking on the topic's hypertext link with the mouse produces a list of subtopics, and so on, until a final level is reached at which useful information is displayed about the topic, or else a remote website pertaining to that topic is visited. Directory companies such as Yahoo typically have teams of editors who explore the Web looking for content suitable for reference at their site, and these workers perform a function analogous to the automatic "spiders" used by automated index websites. Like the search engines, directory websites normally support searching within the directory site, thus producing search results of generally higher quality and less "clutter" than typically encountered on an index site. Also like index websites, directory websites typically allow submission of content for reference, subject to editorial consideration. Thus, directory websites improve over index websites by providing editorial selection, logical organization, and browsing capability, all of which are absent in typical index websites.
A first difficulty, however, with directory websites is that they cannot reasonably keep up with the vastness of the information on the World Wide Web by means of manual editorial selection. As a result, directory websites tend to offer far less information than index websites. A second difficulty with directory websites is that their content is proprietary and controlled by a team of editors at one company. This editorial control, while ensuring consistently high quality on the site, makes it difficult and sometimes even infeasible for an information provider to obtain a desired listing in the hierarchical directory. One directory site that addresses this difficulty is the Open Directory project (http://dmoz.org/). The Open Directory allows any user on the Internet to become an "editor" for a particular topic at the site. A third difficulty, related to the first, is that typical directory sites are extremely broad in scope, contributing to the absence of specialized information that is not of interest to a wide general audience.
A difficulty with both index and directory websites is that information is presented without regard to the user's level of education. It is therefore often possible for a high-school senior working on a book report, for example, to encounter information understandable only by a graduate student in a specialized field. Similarly, there is normally no means for selecting information according to its type, source, or other potentially desirable criteria.
To assist users in selecting sources of information, some websites provide a user rating system (or "scoring system") to which any user may contribute. An example of this mechanism is seen in the online book-store website http://www.amazon.com/. Amazon allows any user to contribute a "book review" and an overall rating on a five-star scale. The average rating is displayed for each book, and books which match the user's search criteria are displayed sorted according to decreasing score (and possibly other criteria such as the number sold). An interesting feature of the Amazon rating system is that it is democratic, allowing the vast quantity of World Wide Web users to jointly develop a ranking of the information sources (in this case books). Such a scheme addresses the difficulty of sorting through enormous quantities of information by harnessing a potentially enormous base of users as contributing editors, in effect. A difficulty with rating systems is that they are generally used only at the site where the ratings are collected, and no mechanism is provided for making use of the ratings elsewhere, such as in other documents on the Web linking to the same information.
An important mechanism integral to the function of the World Wide Web is the HyperText Markup Language (HTML) which is a text format supported by Web browser programs (such as Netscape Navigator or Microsoft Internet Explorer). A more recent variant called XML is now gaining support, and its function is similar to that of HTML for present purposes. HTML provides for the specification of hypertext links in Web-page text displayed by the browser. At a minimum, a hypertext link consists of text to be displayed by the browser and a link target which is usually not displayed. For example, the HTML code
<a href="http://www.w3k.org">W3K website</a>
contains the text (also known as the anchor) "W3K website", while the link target is http://www.w3k.org, which is a URL pointing to the W3K website. Thus, the link target is normally addressed by a URL pointing to information on the Web about the displayed word or phrase. (The complete HTML format specification may be found online at the URL http://www.w3.org/.) To the browser user, the anchor text of a hypertext link as above appears in a Web-page display as an underlined word or phrase, e.g.,
Visit the W3K website for more information regarding automatic link installation,
and usually in a different color than normal, unlinked text. By clicking on the hypertext link with the mouse, the user directs the browser program to "follow the link" by "navigating" to the URL associated with the link. The link-target URL may point to another Web page anywhere on the World Wide Web, or it may simply point to another location within the same electronic document. Hypertext links in HTML documents make it much easier for the user to explore the World Wide Web by visiting Web pages and clicking on the links found therein. Web browsers further make it easy to return to the page containing the link by using the "back" button, or the "history" list of visited pages maintained by the browser.
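By way of illustration only, the two parts of a hypertext link described above (the displayed anchor text and the link target) can be recovered programmatically. The following Python sketch uses the standard-library html.parser module. The names LinkExtractor and extract_links are hypothetical and chosen for this example; they are not taken from any existing system.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (anchor text, link target) pairs from HTML text."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (anchor text, href) pairs
        self._href = None    # target of the <a> element currently open, if any
        self._text = []      # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only text between <a href=...> and </a> belongs to the anchor.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text), self._href))
            self._href = None


def extract_links(html_text):
    """Return a list of (anchor text, link target) pairs."""
    parser = LinkExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.links


print(extract_links('<a href="http://www.w3k.org">W3K website</a>'))
# prints [('W3K website', 'http://www.w3k.org')]
```

Anchors without an href attribute are skipped in this sketch, since they carry no link target.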
A difficulty with hypertext links is that they must be laboriously added by Web content providers. Typical HTML editors merely provide a data-entry form in which the URL for the link target can be typed. A second shortcoming of HTML and Web browsers is that there is no standard mechanism for specifying link properties such as educational level, type of resource, information source, or the like, which could be supported by Web browsers to give the user finer control of link display based on link properties. After the links are typed in, they must be maintained as their URLs change, and as new and better link-targets become available. There is therefore a need for automated assistance with entering, maintaining, and improving hypertext links in documents intended for a hypertext document environment such as the Web.
It is a primary object of the present invention to facilitate the addition of hypertext links (also called "hyperlinks," "links," or "definitions") to documents intended for access on the Internet via the World Wide Web. Accordingly, the present invention is designed to provide a link installation service which automatically installs hyperlinks within information submitted to the service by hypertext authors. Submissions may be in HTML format, plain ASCII format, LaTeX source format, or a variety of additional formats to be added in the future. The output returned to the user may be in either HTML or LaTeX source format (which may be compiled into HTML format). Criteria can optionally be specified which govern the installation of hyperlinks.
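As a minimal sketch of what automatic link installation might look like for a plain ASCII submission with HTML output, the following Python function wraps every recognized phrase in an anchor element. The function name install_links, the phrase-to-URL dictionary, the case-insensitive whole-word matching, and the longest-phrase-first rule are all assumptions made for this illustration; the text above does not prescribe a matching strategy.

```python
import html
import re


def install_links(text, link_db):
    """Return HTML for plain `text` with known phrases wrapped in <a> tags.

    link_db is a hypothetical mapping from a phrase to its link-target URL.
    """
    if not link_db:
        return html.escape(text)

    # Try longer phrases first so "digital filter" wins over "filter".
    phrases = sorted(link_db, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    lookup = {phrase.lower(): url for phrase, url in link_db.items()}

    pieces = []
    pos = 0
    for match in pattern.finditer(text):
        # Escape the unlinked text between matches so it is valid HTML.
        pieces.append(html.escape(text[pos:match.start()]))
        url = lookup[match.group(0).lower()]
        pieces.append('<a href="%s">%s</a>'
                      % (html.escape(url, quote=True),
                         html.escape(match.group(0))))
        pos = match.end()
    pieces.append(html.escape(text[pos:]))
    return "".join(pieces)


example_db = {
    "filter": "http://example.org/filter",
    "digital filter": "http://example.org/digital-filter",
}
print(install_links("A digital filter is a filter.", example_db))
```

The example.org URLs are placeholders. A fuller service would also need to avoid re-linking text that is already inside a hyperlink when the submission is itself HTML, which this sketch does not attempt.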
The invention further provides selectable databases of hyperlinks, organized by category (or "context"), which can be optionally selected for automatic link installation. It is further provided that content developers may add their own links to the existing link databases, and they may additionally create new link databases and specify their relation to the existing link databases. Contributing users are preferably required to have a known, verified email address. A user with a verified email address is called a "known user". The invention further provides means for browsing the link databases in a logically organized, hierarchical tree structure, wherein higher-level nodes correspond to more general contexts, and lower-level nodes correspond to more specialized contexts. The link databases can additionally be searched for keyword matches within component fields. Users may provide ratings and/or reviews for individual links in the link databases.
The hyperlink databases of the present invention support various optional "properties" associated with each hyperlink. One such property, useful in the development of educational content, is a level designation which indicates the educational level required for best understanding of the link-target information. Additional optional properties include the language of the content (such as English), a viewer suitability rating such as exists for movies (PG-13, R, etc.), and properties defined by the user. Link properties can be specified by users to control the automatic installation of links, and/or to control what is displayed while browsing the link databases.
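One possible way to represent link properties and use them to control which links are installed or displayed is sketched below in Python. The Link record, the select_links function, the numeric encoding of educational level, and the choice to keep links whose level is unspecified are all assumptions made for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """A hypothetical link-database entry with optional properties."""
    phrase: str
    url: str
    properties: dict = field(default_factory=dict)


def select_links(links, max_level=None, **required):
    """Keep links at or below max_level whose properties equal `required`.

    A link with no "level" property is kept (an assumption of this sketch).
    """
    selected = []
    for link in links:
        level = link.properties.get("level")
        if max_level is not None and level is not None and level > max_level:
            continue
        if any(link.properties.get(k) != v for k, v in required.items()):
            continue
        selected.append(link)
    return selected


links = [
    Link("filter", "http://example.org/intro", {"level": 12, "language": "English"}),
    Link("filter", "http://example.org/grad", {"level": 17, "language": "English"}),
    Link("filter", "http://example.org/de", {"level": 12, "language": "German"}),
]
chosen = select_links(links, max_level=12, language="English")
print([link.url for link in chosen])
# prints ['http://example.org/intro']
```

The same filter could be applied either before link installation or when rendering a browsing view of a link database.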
Educational levels not specified on submission are estimated based on the levels of the links found within the link-target document. As a result, every link in the link database is assigned an educational level, either manually or automatically. Determining levels automatically also detects any "cycles" in the link database. (A "cycle" occurs when document A links either directly or indirectly to document B, and document B links either directly or indirectly to document A.) Cycle detection can help content providers eliminate inadvertent "forward references." Means are provided for marking forward-reference links in submitted documents so that educational level will not be affected. Cycle-free systems of links can be more effectively used as a basis for online course materials.
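The relationship between level estimation and cycle detection can be illustrated with a depth-first traversal of the link graph. The Python sketch below is not the method of the invention; in particular, the estimation rule (a document's level is one more than the highest level among the documents it links to, with a level of 1 for documents having no outgoing links) is an assumption introduced here, as are the function name estimate_levels and its parameters.

```python
def estimate_levels(outlinks, known_levels=None, forward_refs=frozenset()):
    """Estimate missing levels and report cycles in a link graph.

    outlinks:     dict mapping a document to the documents it links to
    known_levels: dict of manually assigned levels (never overwritten)
    forward_refs: set of (source, target) links marked as forward
                  references; these are ignored entirely
    Returns (levels, cycles), where each cycle is a list of documents
    that starts and ends with the same document.
    """
    levels = dict(known_levels or {})
    cycles = []
    state = {}   # document -> "active" (on the current path) or "done"
    path = []    # current depth-first path, used to report cycles

    def visit(doc):
        if state.get(doc) == "done":
            return levels[doc]
        state[doc] = "active"
        path.append(doc)
        highest = 0
        for target in outlinks.get(doc, ()):
            if (doc, target) in forward_refs:
                continue
            if state.get(target) == "active":
                # The target is already on the current path: a cycle.
                cycles.append(path[path.index(target):] + [target])
                continue
            highest = max(highest, visit(target))
        path.pop()
        state[doc] = "done"
        if doc not in levels:
            levels[doc] = highest + 1   # assumed rule; leaves get level 1
        return levels[doc]

    for doc in list(outlinks):
        if doc not in state:
            visit(doc)
    return levels, cycles


levels, cycles = estimate_levels({"A": ["B"], "B": ["A"]})
print(cycles)
# prints [['A', 'B', 'A']]
```

Passing forward_refs={("B", "A")} in the example above removes the reported cycle, which corresponds to marking the link from B to A as a forward reference. The recursive traversal is adequate for a sketch; a very deep link graph would call for an iterative version.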
Another feature of the present invention is the ability for users to rate (or score) the quality of any link in the database and/or to submit a written review of any link. The quality ratings may be averaged together and used to determine the relative ordering of the links when there are multiple link targets for the same word or phrase ("competing definitions"). In the typical case of HTML format, features of the JavaScript scripting language may be used to provide convenient access to multiple link targets, ranked according to score. Alternatively, the latest ranked list of competing definitions may be maintained on a central server on the Web, with the installed link pointing there, instead of containing only a snapshot at the time of link installation, which may rapidly go out of date. Alternatively, the currently highest rated link may be installed in the user's Web document for each recognized topic.
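The ordering of competing definitions by averaged rating can be sketched as follows in Python. The function name rank_definitions, the dictionary of ratings keyed by link-target URL, and the decision to place unrated links last are assumptions of this example.

```python
from statistics import mean


def rank_definitions(ratings_by_url):
    """Return link-target URLs sorted by decreasing average rating.

    ratings_by_url maps each competing link target to its list of user
    ratings. Unrated targets are placed last (an assumption of this sketch).
    """
    def score(url):
        ratings = ratings_by_url[url]
        return mean(ratings) if ratings else float("-inf")

    # sorted() is stable, so targets with equal averages keep their
    # original relative order.
    return sorted(ratings_by_url, key=score, reverse=True)


competing = {
    "http://example.org/a": [3, 4],
    "http://example.org/b": [5],
    "http://example.org/c": [],
}
ranked = rank_definitions(competing)
print(ranked)
# prints ['http://example.org/b', 'http://example.org/a', 'http://example.org/c']
```

The first element of the ranked list corresponds to the currently highest rated link, which is the one that would be installed when only a single target per topic is wanted.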