The present invention relates to the field of analysis and design of hypermedia linked collections of documents, and in particular to the prediction of user traffic flow in such a collection without relying on observed usage information.
Users of hypertext linked documents, such as those on the World Wide Web, typically forage for information by navigating from document to document by selecting hypertext links. A piece of information such as a snippet of text is typically associated with each hypertext link. The snippet of text provides the user with information about the content of the document at the other end of the link. When the link leads the user to a document that is relevant to his information need, the user comes closer to satisfying his information need, thus reducing the amount of time that he will continue to forage for information. However, if the link leads the user to a document that is not relevant, then the user will continue foraging for information.
The structural linkage topology of collections of hypermedia linked documents is similar to a highway system. In a highway system, a traveler begins at some origin point and travels along the roads of the highway system in order to reach a desired destination. Along the way, the traveler may see signs that indicate which roads he should take to reach his desired destination. For example, a traveler who wishes to go from his home to the local airport might travel along the roadways until seeing a sign with the words “international airport” or a sign with a picture of an airplane. Either sign could give the traveler information about which highway ramp to take in order to reach the airport. If the signs do not exist or if they are confusing, the traveler would probably not be able to find his destination.
Similarly, a user on the Web might start from one web page and select links based on whether they look like they might lead the user to another web page that might satisfy his information need. The links are analogous to roadways that can take the user to his destination, the information need. How well these links will lead users to their desired destinations depends on a complex interaction of user goals, user behaviors, and Web site designs.
Designers and researchers who want to know how users will interact with the Web develop hypotheses about these complex interactions. In order to evaluate these hypotheses rapidly and efficiently, tools need to be created to deal with the complexity of these interactions. Existing approaches to evaluating these hypotheses include extracting information from usage data such as Web log files and applying metrics such as the number of unique users, the number of page visits, reading times, session links, and user paths. The reliability of these approaches varies widely depending on the heuristics used. For example, most existing Web log file analysis programs provide little insight into user Web interactions because they merely provide simple descriptive statistics on where users have been.
One shortcoming of existing approaches is that they require collecting past user behavior in order to perform the prediction. Another shortcoming of existing approaches is that they do not analyze the content contained in the hyperlinked documents. Thus, there is a need for a system and method for predicting user traffic flow in a collection of hypermedia linked documents that does not require collecting user interaction information in order to perform the prediction, and which also takes into account the content of the documents.
An embodiment of the present invention provides a system and method for predicting user traffic flow in a collection of hypermedia documents by determining the association strength of hypermedia links. Conceptually, the association strength is a measure of the probability that a user will flow down a particular hypermedia link. The system and method of the present invention do not require collecting user interaction information in order to perform the prediction, because they take into account the content of the documents. An embodiment of the present invention includes a system and method for determining the association strength of hypermedia links in a document collection based on the user information need and content items that are contained in the documents. The system identifies the hypermedia linkage structure among the plurality of documents in the collection, where the documents include content items that may be relevant to a user information need. The system determines the distribution of the content items in the document collection. The system receives an information item as input and compares the information item to the content items. In response to the comparison, the system assigns an association strength to the hypermedia links. The system also uses a network flow model that predicts user traffic flow using the association strengths of the hypermedia links and applying them to an initial condition.
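The steps summarized above can be illustrated with a minimal sketch. It is not the claimed implementation, and every name in it (`idf_weights`, `association_strengths`, `predict_flow`, and the four-page example collection) is hypothetical. The sketch assumes that content items are whitespace-separated words, that their distribution in the collection is summarized by an inverse document frequency weight, that a link's raw strength is the summed weight of the words shared by the information need and the target document, and that strengths are normalized per source page so they can be read as probabilities. Users on a page with no matching outgoing link are assumed to stop foraging.

```python
import math
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (assumed form of a content item)."""
    return text.lower().split()


def idf_weights(docs):
    """Summarize the distribution of content items across the collection.

    Rare words get a higher weight than words found in many documents.
    """
    n = len(docs)
    doc_freq = Counter()
    for text in docs.values():
        doc_freq.update(set(tokenize(text)))
    return {term: math.log(n / count) + 1.0 for term, count in doc_freq.items()}


def association_strengths(links, docs, need, idf):
    """Assign each link (src, dst) a strength in [0, 1].

    The raw score compares the information need with the content of the
    target document; scores are then normalized over each source page.
    """
    need_terms = set(tokenize(need))
    strengths = {}
    for src, targets in links.items():
        raw = {}
        for dst in targets:
            shared = need_terms & set(tokenize(docs[dst]))
            raw[dst] = sum(idf.get(term, 0.0) for term in shared)
        total = sum(raw.values())
        for dst, score in raw.items():
            strengths[(src, dst)] = score / total if total > 0 else 0.0
    return strengths


def predict_flow(strengths, initial, steps):
    """Propagate an initial distribution of users through the link network.

    Returns the cumulative predicted visits per page after `steps` clicks.
    """
    flow = dict(initial)
    visits = Counter(initial)
    for _ in range(steps):
        next_flow = Counter()
        for (src, dst), prob in strengths.items():
            next_flow[dst] += flow.get(src, 0.0) * prob
        flow = next_flow
        visits.update(flow)  # Counter.update adds the new visits to the totals
    return visits


# Hypothetical four-page collection and linkage structure.
docs = {
    "home": "welcome products support",
    "products": "laptop prices catalog",
    "support": "laptop repair warranty",
    "contact": "phone address",
}
links = {
    "home": ["products", "support", "contact"],
    "products": ["support"],
    "support": [],
    "contact": [],
}

idf = idf_weights(docs)
strengths = association_strengths(links, docs, "laptop repair", idf)
visits = predict_flow(strengths, {"home": 100.0}, steps=2)

for page in docs:
    print(f"{page}: {visits[page]:.1f} predicted visits")
```

In this example the initial condition places 100 users on the home page. Because the support page matches both words of the information need and the products page matches only one, the link to support receives the larger share of the predicted traffic, and the link to the contact page receives none. No usage log is consulted at any point.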