Many current data storage and retrieval systems are organized using a principle called "hypertext". As computer workstations and digital storage have grown cheaper, more powerful and more available, it has become increasingly attractive to extend the traditional notion of "flat" text files that are organized hierarchically, by allowing more complex nonlinear organizations of material. In a hypertext system, each data entity, i.e. document or node, is connected to other documents in the system by pointers, or links. The human user of a hypertext system moves between documents by following the links. In this context, the process of moving between documents is called "browsing."
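The node-and-link organization described above can be illustrated with a minimal sketch. The class name `Node`, the function `browse`, and the sample documents are hypothetical names chosen for illustration only; they are not drawn from any particular hypertext system.

```python
class Node:
    """A document (node) that holds labeled links to other nodes."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.links: dict[str, "Node"] = {}  # link label -> target node

    def link_to(self, label: str, target: "Node") -> None:
        """Add a pointer (link) from this node to another node."""
        self.links[label] = target


def browse(start: Node, labels: list[str]) -> Node:
    """Follow a sequence of link labels, one link at a time."""
    current = start
    for label in labels:
        current = current.links[label]
    return current


# Hypothetical three-document hypertext.
home = Node("Home")
intro = Node("Introduction")
survey = Node("Survey")
home.link_to("intro", intro)
home.link_to("survey", survey)
intro.link_to("survey", survey)

destination = browse(home, ["intro", "survey"])
```

In this sketch, browsing is simply repeated lookup of a link label on the current node, so following a single link costs one operation, in keeping with the idea that a link should take the user directly to a new place.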
Generally, hypertext database systems provide a mechanism to traverse from node to node using the links. As noted in J. Conklin, "Hypertext: An Introduction and Survey", COMPUTER, September 1987, pages 17-41, to qualify as hypertext, ideally a system should require a user to use no more than a few keystrokes or mouse operations to follow a single link. The links provided by the interface transport the user quickly and easily to a new place in the hypertext system.
Although hypertext systems presently enable a user to traverse efficiently between nodes using links once he or she has determined which links to follow, the number of documents in a hypertext system may be very large. Consequently, the number of links connected to any document may also become very large. This leads to difficulties in "navigating" through the database. The large number of links from each document often confuses the user when the user is attempting to select which link to follow.
One approach to this problem is providing an overview display or "map" of the hypertext documents and links. This approach has the disadvantage of creating a large and complex map display when the number of documents and links is large. As a result, further control and display options are needed, which the user must learn. Another disadvantage is that the user spends time manipulating the map rather than using that time more effectively, for example by reading documents.
Another approach to this navigational dilemma is to apply standard database search and query techniques for locating documents that the user is seeking. This involves addressing entities by content. For example, entities are addressed using text or numbers that are stored in association with the entity, in addition to or instead of a user-assigned name or symbol. This is usually done by applying some combination, using Boolean operations, of keyword search, full string search, and predicates on other attributes (such as author, time of creation, or type) of nodes or links.
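A content-based query of the kind just described might look like the following sketch. The `Entity` fields, the helper functions `has_keyword` and `has_string`, and the sample records (including the author names) are all made up for illustration; real query systems expose their own syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    text: str
    author: str
    year: int


def has_keyword(entity: Entity, keyword: str) -> bool:
    """Keyword search: the keyword must appear as a whole word."""
    return keyword.lower() in entity.text.lower().split()


def has_string(entity: Entity, phrase: str) -> bool:
    """Full string search: the phrase must appear as a substring."""
    return phrase.lower() in entity.text.lower()


# Hypothetical stored entities.
entities = [
    Entity("a", "Hypertext systems link documents", "Smith", 1987),
    Entity("b", "Full text retrieval in legal databases", "Jones", 1985),
    Entity("c", "Links and webs in hypertext environments", "Lee", 1988),
]

# Boolean combination:
# (keyword "hypertext" AND NOT author "Smith") OR string "full text"
matches = [
    e.name
    for e in entities
    if (has_keyword(e, "hypertext") and e.author != "Smith")
    or has_string(e, "full text")
]
```

Even this small query requires the user to understand the difference between keyword and string matching and to nest Boolean operators correctly.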
Various languages exist for querying structured databases or text retrieval systems (for example, DIALOG, SQL). All of these languages share the drawbacks of being arbitrary and complex. These drawbacks cause problems in applications where untrained users must query a data storage system, or in educational and training uses, in which it is inappropriate to presume that users have prior training in the query method.
Further, textual query methods are subject to tradeoffs between precision (the fraction of retrieved entities which are actually interesting) and recall (the fraction of total interesting entities which are actually found). Studies have found that, for instance, a typical query to a legal information system produces only 20% of those database entities that are actually relevant. See D.C. Blair et al., "An Evaluation of Retrieval Effectiveness for a Full Text Document Retrieval System", Communications Of The ACM, March 1985, Vol. 28, No. 3, pp. 289-299.
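The two measures can be computed as in the sketch below, which uses the usual definitions: precision is the fraction of retrieved entities that are relevant, and recall is the fraction of relevant entities that are retrieved. The function name `precision_recall` and the sample sets are hypothetical; the numbers are chosen only so that recall comes out at 20%, and they are not data from the cited study.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query result."""
    hits = retrieved & relevant  # entities that are both retrieved and relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 10 relevant entities exist in the database,
# and the query retrieves 5 entities, only 2 of which are relevant.
relevant = {f"doc{i}" for i in range(10)}
retrieved = {"doc0", "doc1", "other1", "other2", "other3"}

precision, recall = precision_recall(retrieved, relevant)
```

Here precision is 2/5 and recall is 2/10. Broadening the query would typically retrieve more of the relevant entities but also more irrelevant ones, which is the tradeoff described above.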
Other attempts to control the complexity of linking have concentrated on database-wide elision of sets of links. For instance, the Intermedia system allows the separation of links into sets called webs. Only one of these sets is visible to the user at a time. This achieves simplification but at the expense of possibly removing valuable links from consideration if those links are stored in the webs which are not loaded. See N. Yankelovich, et al., "Intermedia: The Concept and the Construction of a Seamless Information Environment", COMPUTER, January 1988, pp. 81-96.
Another approach to elision is filtering. In this context, filtering refers to database-wide selection of documents and links based on a query, in a fashion similar to that described above. For example, see J. Remde et al., "SuperBook: An automatic tool for information exploration-hypertext?", Bell Communications Research, Hypertext '87 Papers, November 1987, pp. 175-188; and "Searching for Information in a Hypertext Medical Handbook", Communications Of The ACM, July 1988, pp. 880-886. In such systems, the pattern of links is also considered in the decision to remove entities from the user's view. However, because such filtering methods treat the entire database at once, they share the limitation of the precision-recall tradeoff described above, meaning that they reduce complexity at the expense of losing information.
For example, suppose a user is a native speaker of German but also knows some English and French. In a filtering approach, the user might specify "German" as a filter. The database would filter out all documents not in German. The user would be unable to consider English or French documents even if such documents were highly relevant for other reasons.
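The loss of information in this example can be made concrete with a short sketch. The document records, the `relevance` scores, and the function `filter_by_language` are hypothetical and serve only to show what a database-wide filter hides.

```python
# Hypothetical documents; the relevance scores are invented for illustration.
documents = [
    {"title": "Einfuehrung", "language": "German", "relevance": 0.4},
    {"title": "Survey", "language": "English", "relevance": 0.9},
    {"title": "Apercu", "language": "French", "relevance": 0.7},
]


def filter_by_language(docs: list[dict], language: str) -> list[dict]:
    """Database-wide filter: keep only documents in the given language."""
    return [d for d in docs if d["language"] == language]


visible = filter_by_language(documents, "German")

# Documents removed from the user's view, regardless of their relevance.
hidden = [d["title"] for d in documents if d not in visible]
most_relevant = max(documents, key=lambda d: d["relevance"])["title"]
```

In this sketch the most relevant document is among the hidden ones, because the filter considers only the language attribute and removes everything else outright.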
U.S. Pat. No. 5,408,655 (Oren) provides a method for a user to rank the relevancy of each document and thereby reduce and order the choice of links which may be traversed from a particular node while browsing a hypertext. In Oren, a database of documents is indexed according to the content of the documents in the database. The index terms of Oren are content-based. Unlike the strategy of total elision of some classes of links, Oren leaves all links intact for potential use by either the user or the criterion evaluation process. However, this method depends on the user to reduce and order the links. The approach of Oren breaks down if the user's ranking results in a large number of similarly relevant documents.
With the growing use of multimedia databases containing not only textual documents, but also data entities containing sound and graphics, and the growing utilization of hypertext-type nodal networks within these multimedia databases, the requirement for effective and meaningful navigation has become even more imperative.
A hypertext-type nodal network used in conjunction with a multimedia database may be described as a "hypermedia database". Thus, in this context, the term "hypermedia system" refers broadly to a database which may be constructed to include documents or nodes and machine-supported selected linkages or pointers that provide the user with the ability to efficiently travel from one node to another. These nodes may include text, sound, or graphic material. An example of a system that supports hypermedia is the World Wide Web (called, in shorthand, WWW, W3, or the Web). The Web is a system available using a global packet-switched network (the Internet) that allows traversal through a hypertext-type nodal network containing text, sound and graphics. The Web provides a machine-supported ability to selectively traverse in an automatic fashion using linkages. Items are selectively linked to each other in the nodal network. The set of all documents available using the World Wide Web is an example of a hypertext database.
The foregoing problems are acute in the context of the World Wide Web. Locating Web documents is a well-known problem. The Web is presently known to comprise millions of documents. In past approaches, Web documents have been located in two ways: by explicitly requesting a particular Web document using its uniform resource locator (URL) identifier; or by submitting a query to a search engine. Several search engines are presently available, including Yahoo!, Excite, Lycos, InfoSeek, and AltaVista. In the search engines, the set of searchable Web documents is an example of a hypertext database.
To locate a Web document or site using a search engine, a user formulates a query using one or more keywords. The search engine has an internal index that indexes every significant word within all documents available to the search engine. Thus, the index is said to be a content-based index, because it is derived from the contents of the Web pages that are available to the search engine. The user provides a keyword query or a set of keywords to the search engine. When the search engine receives the user keyword query, the search engine looks up each keyword in the index, and assembles a list of documents that contain the keywords of the query.
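The lookup process described above can be sketched as a simple inverted index. This is an illustrative assumption about how such an index might be built, not the implementation of any named search engine. The functions `build_index` and `search`, the example URLs, and the choice to require every keyword to be present (AND semantics) are all hypothetical.

```python
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page URLs whose content contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return pages containing every keyword of the query (assumed AND semantics)."""
    keywords = query.lower().split()
    if not keywords:
        return set()
    results = set(index.get(keywords[0], set()))
    for keyword in keywords[1:]:
        results &= index.get(keyword, set())
    return results


# Hypothetical pages available to the search engine.
pages = {
    "http://example.com/a": "hypertext links connect documents",
    "http://example.com/b": "search engines index documents",
    "http://example.com/c": "hypertext search engines",
}

index = build_index(pages)
found = search(index, "hypertext documents")
```

Because the index is derived entirely from page contents, the result says only which pages contain the keywords, not which of them the user will actually find relevant.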
In some search engines, the resulting list is presented to the user in seemingly random order. To locate a relevant document, the user must tediously traverse to or read each document in the list and determine whether it is relevant based upon its actual contents. In other search engines, such as AltaVista, the resulting list is presented to the user purportedly in order of the relevance of each document to the search query. In such search engines, the relevance of a document is determined using heuristic information, for example, by the number of times that a keyword appears in the content of the document, or by the number of all the keywords that appear in the content of the document. However, such heuristic information does not always accurately reflect the true relevance of a document to the user's query. The user is still required to determine the relevance of a document.
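A keyword-count heuristic of the kind mentioned above can be sketched as follows. This is a simplified illustration and is not the ranking method of AltaVista or any other named search engine; the functions `keyword_count_score` and `rank` and the sample pages are hypothetical.

```python
def keyword_count_score(text: str, keywords: list[str]) -> int:
    """Score a document by the total number of keyword occurrences."""
    words = text.lower().split()
    return sum(words.count(keyword.lower()) for keyword in keywords)


def rank(pages: dict[str, str], keywords: list[str]) -> list[str]:
    """Order page names from highest to lowest keyword-count score."""
    return sorted(
        pages,
        key=lambda name: keyword_count_score(pages[name], keywords),
        reverse=True,
    )


# Hypothetical pages: one merely repeats a keyword, the other discusses the topic.
pages = {
    "repetitive": "hypertext hypertext hypertext hypertext",
    "useful": "a survey of hypertext navigation",
}

ordering = rank(pages, ["hypertext", "navigation"])
```

In this sketch the page that merely repeats a keyword scores 4 and is listed ahead of the more informative page, which scores 2, illustrating why such a heuristic does not always reflect true relevance.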
Another search method is filtering. The user specifies filtering parameters and the database is filtered based on the parameters to arrive at a set of relevant items. The user manually determines which item in the set of relevant items is the most relevant.
Based on the foregoing, there is a clear need in this field for a system and method for a hypertext or hypermedia system that can reduce and order the set of relevant links. There is also a need for a system that can incorporate expert historical knowledge of past relevance to determine present relevance of documents.
Another need is to reduce the elapsed user time for traversing the database. Still another need is to allow the user to control the tradeoff between complexity and the number of intermediate links to the relevant documents.
Other needs and objects will become apparent from a consideration of the ensuing description and drawings.