Users of hypertext linked document collections, such as the World Wide Web, typically forage for information by navigating from document to document by selecting hypertext links. A piece of information such as a snippet of text is typically associated with each hypertext link. The snippet of text provides the user with information about the content of the document at the other end of the link. When the link leads the user to a document that is relevant to his information need, the user comes closer to satisfying that need, thus reducing the amount of time that he will continue to forage for information. However, if the link leads the user to a document that is not relevant, then the user will continue foraging for information.
The structural linkage topology of collections of hypermedia linked documents is similar to a highway system. In a highway system, a traveler begins at some origin point and travels along the roads of the highway system in order to reach a desired destination. Along the way, the traveler may see signs that indicate which roads he should take to reach his desired destination. For example, a traveler who wishes to go from his home to the local airport might travel along the roadways until seeing a sign with the words “international airport” or a sign with a picture of an airplane. Either sign could give the traveler information about which highway ramp to take in order to reach the airport. If the signs did not exist or if they were confusing, the traveler would probably not be able to find his destination.
Similarly, a user on the Web might start from one web page and select links based on whether they look like they might lead the user to another web page that might satisfy his information need. The links are analogous to roadways that can take the user to his destination, the information need. How well these links will lead users to their desired destinations depends on a complex interaction of user goals, user behaviors, and Web site designs.
Designers and researchers who want to know how users will interact with the Web develop hypotheses about these complex interactions. In order to evaluate these hypotheses rapidly and efficiently, tools need to be created to deal with the complexity of these interactions. Existing approaches to evaluate these hypotheses include extracting information from usage data such as Web log files, and applying metrics such as the number of unique users, the number of page visits, reading times, session links, and user paths. The degree of reliability of these approaches varies widely based upon the different heuristics used. For example, most existing Web log file analysis programs provide little insight into user Web interactions because they merely provide simple descriptive statistics on where users have been.
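By way of illustration only, the kind of simple descriptive statistics produced by such log file analysis might be sketched as follows. The record layout (user identifier, timestamp in seconds, page), the sample data, and the function name `summarize` are assumptions made for this sketch and do not correspond to any particular log format or analysis program. Reading time is approximated here as the interval between a user's consecutive requests, which is one heuristic among many.

```python
from collections import Counter, defaultdict

# Hypothetical log records: (user_id, timestamp_in_seconds, page).
LOG = [
    ("u1", 0, "/home"),
    ("u1", 30, "/products"),
    ("u1", 90, "/products/widget"),
    ("u2", 10, "/home"),
    ("u2", 25, "/support"),
]


def summarize(log):
    """Compute simple descriptive statistics from log records."""
    # Group each user's requests in time order to recover the user path.
    paths = defaultdict(list)
    for user, ts, page in sorted(log, key=lambda r: (r[0], r[1])):
        paths[user].append((ts, page))

    # Count how many times each page was requested.
    page_visits = Counter(page for _, _, page in log)

    # Approximate reading time as the gap until the same user's next request.
    # The last page in each path has no measurable reading time.
    reading_times = defaultdict(list)
    for visits in paths.values():
        for (ts, page), (next_ts, _) in zip(visits, visits[1:]):
            reading_times[page].append(next_ts - ts)

    return {
        "unique_users": len(paths),
        "page_visits": dict(page_visits),
        "reading_times": dict(reading_times),
        "user_paths": {u: [p for _, p in v] for u, v in paths.items()},
    }


stats = summarize(LOG)
```

Every value in `stats` describes where users have been and for how long. None of the values indicates what information a user was seeking when he followed a given path.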
One shortcoming of existing approaches is that they focus on the destination of the user's visit, and not on the user's true information goal. Thus, there is a need for a system and method for inferring user information need in a hypermedia linked document collection.