1. Field of the Invention
The present invention relates to navigation and searching in documents which are connected by links and are most commonly referred to as hypertext documents.
2. Description of the Prior Art
Documents connected by links are usually referred to as hypertext documents. One example of a hypertext document format that is commonplace and widely used on the Internet is the hypertext markup language (HTML) format. Another example of a hypertext document format is that used by the help files contained in the graphical user interface “WINDOWS” sold by the Microsoft Corporation. The term “HTML pages” used below should be construed as encompassing all forms of hypertext documents.
Although searching for information by navigating through the hyperlinks of hypertext documents represents a vast improvement over searching through traditional documents with their hierarchical chapter structure, additional aids to searching and navigating are necessary. One such aid is an index that links search words to the pages on which they occur.
Other known aids include “search engines”. These are queried using one or more relevant words which are applied to a precompiled and continuously updated index which is usually very comprehensive but not directly visible. The search engines then display links to a number of documents in which these relevant words are mentioned.
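By way of illustration only, the lookup performed by such a search engine can be sketched as follows. The page names, the sample text, and the functions `build_index` and `search` are hypothetical, and the requirement that a page mention every query word is an assumption of this sketch rather than a property of any particular search engine.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page URLs that mention it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the sorted URLs of pages that mention every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    # Start from the pages for the first word, then intersect with the rest.
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

# Hypothetical pages; a real index would be precompiled and continuously updated.
PAGES = {
    "home.html": "welcome to the printer help pages",
    "install.html": "how to install the printer driver",
    "faq.html": "driver questions and answers",
}
INDEX = build_index(PAGES)

if __name__ == "__main__":
    print(search(INDEX, "printer driver"))
```

The result is a flat list of links; nothing in it indicates how the listed pages are linked to one another.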
A number of options are available for compiling this index from HTML documents. The index may be compiled from: 1) relevant words supplied in the META tag, 2) the text contents of other tags, in particular the “TITLE” tag, or 3) the contents of the entire text. The option chosen depends primarily upon the amount of data to be indexed in relation to the operating resources available.
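The three options may be sketched, purely as an example, with the HTML parser of the Python standard library. The class `TermExtractor`, the function `index_terms`, the mode names, and the sample page are hypothetical; the sketch assumes that the relevant words are given as a comma-separated `keywords` META entry and that the entire text includes the title.

```python
import re
from html.parser import HTMLParser

class TermExtractor(HTMLParser):
    """Collects candidate index terms from the three sources in one pass."""

    def __init__(self):
        super().__init__()
        self.keywords = []     # option 1: words from <meta name="keywords">
        self.title_words = []  # option 2: words inside <title>
        self.text_words = []   # option 3: words of the entire text
        self._in_title = False
        self._skip = 0         # non-zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords += [k.strip().lower() for k in content.split(",") if k.strip()]
        elif tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._skip:
            return
        words = re.findall(r"[a-z0-9]+", data.lower())
        if self._in_title:
            self.title_words += words
        self.text_words += words

def index_terms(html, mode):
    """Return the terms for mode 'meta', 'title', or 'full' (hypothetical names)."""
    parser = TermExtractor()
    parser.feed(html)
    parser.close()
    return {"meta": parser.keywords,
            "title": parser.title_words,
            "full": parser.text_words}[mode]

SAMPLE = """<html><head>
<meta name="keywords" content="printer, driver">
<title>Installing the Driver</title>
</head><body><p>Connect the printer first.</p></body></html>"""

if __name__ == "__main__":
    for mode in ("meta", "title", "full"):
        print(mode, index_terms(SAMPLE, mode))
```

The number of terms grows from the first option to the third, which reflects the trade-off between the amount of data to be indexed and the available resources.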
For search engines, the correct choice of search words is of crucial importance for a good search result, but search engines neither take into account nor represent the relationships among relevant documents. Thus, during a search, an individual may find a hypertext page that comes reasonably close to containing the desired information, yet that individual will then have to search systematically back and forth through the links and manually inspect the hypertext pages in order to find the desired information.
The basic structure of hypertext pages is commonly represented as a tree, because each page appears as a node with links to subordinate nodes, although back links and cross links critically interfere with this structure. One navigation aid, known as a site map, displays a structure tree of the hypertext documents. The site map starts with a reference page, which is usually referred to as the home page, and constructs the tree outward from that root, suppressing (or displaying as unhighlighted) all links that conflict with the tree structure. A number of other two-dimensional graphical representation forms are known, and, more recently, three-dimensional images have been utilized, which the user can interactively rotate and project onto a two-dimensional display surface. A disadvantage of these representations, however, is that each node is labeled only with a short text string, usually the defined title. Although this form of navigation is clearer than if the user constructs the tree in memory or writes it down on paper, the user still has no help in determining which of the nodes might have the greatest relevance. Furthermore, the usefulness of a search engine and its associated index is contingent on the appropriate choice of relevant words to be searched.
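The construction of such a site map can be sketched, as one possible approach, by a breadth-first traversal from the home page in which the first link that reaches a page becomes a tree edge and every later link to an already placed page is treated as a back link or cross link and suppressed. The breadth-first order, the link table `LINKS`, and the functions `build_site_map` and `render` are assumptions of this sketch and are not taken from any particular site map tool.

```python
from collections import deque

def build_site_map(links, home):
    """Return (tree, suppressed).

    tree maps each reachable page to its child pages; suppressed lists the
    (source, target) links that conflict with the tree structure.
    """
    tree = {home: []}
    suppressed = []
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target in tree:
                # Target already has a place in the tree: back link or cross link.
                suppressed.append((page, target))
            else:
                tree[target] = []
                tree[page].append(target)
                queue.append(target)
    return tree, suppressed

def render(tree, page, depth=0):
    """Return the tree as indented lines, one page label per line."""
    lines = ["  " * depth + page]
    for child in tree[page]:
        lines += render(tree, child, depth + 1)
    return lines

# Hypothetical link structure containing one back link and two cross links.
LINKS = {
    "home": ["products", "support"],
    "products": ["printer", "home"],
    "support": ["printer", "faq"],
    "printer": ["support"],
    "faq": [],
}

if __name__ == "__main__":
    tree, suppressed = build_site_map(LINKS, "home")
    print("\n".join(render(tree, "home")))
    print("suppressed:", suppressed)
```

Each node in the rendered tree carries only its short label, so the output gives no indication of which node is most relevant to a given search.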