Users of large linked collections of documents, such as those found on the World Wide Web, are motivated to improve the rate at which they gain the information needed to accomplish their goals. Hypertext structures primarily afford information seeking by the sluggish process of browsing from one document to another along hypertext links. This sluggishness can be at least partly attributed to three sources of inefficiency in the basic process. First, basic hypertext browsing entails slow sequential search by a user through a document collection. Second, important information about the kinds of documents and content contained in the total collection cannot be obtained by the user immediately and simultaneously, whether to assess the global nature of the collection or to aid in decisions about which documents to pursue. Third, the order in which documents are encountered in basic browsing is not optimized to satisfy users' information needs. In addition to exacerbating difficulties in simple information seeking, these problems may also arise in the production and maintenance of large hypertext collections.
There are two widely visible technologies that may be considered broadly as seeking to address the above inefficiencies:
Text-based information retrieval techniques that rapidly evaluate the predicted relevance of documents to a user's topical query (e.g., services such as Alta Vista™, Lycos™, and Infoseek®, which operate on the World Wide Web). This effectively changes slow sequential search to nearly parallel search, and provides an improved ordering of the users' search through documents.
Community/service categorization of documents. For instance, this service is provided by Yahoo™, which has a hierarchy of Web pages that define a topic taxonomy.
Known previous work has focused on attempts to extract higher-level abstractions that can be used to improve navigation and assimilation of hypertext. Such work has typically used topological or textual relationships to drive the analysis.