1. Field of Invention
This invention relates to predicting the usage patterns of document collections given information about a user's information needs.
2. Description of Related Art
Increasingly, the World Wide Web has become the information delivery mechanism of choice for both corporations and individual users. The ubiquity of World Wide Web browsers and the push by many corporations to adopt commercial off-the-shelf (COTS) technology have helped the World Wide Web become a required delivery option for most information systems.
However, although information sources are now more likely to be available to their intended audience through the World Wide Web, access to relevant information is still limited by a user's ability to navigate the World Wide Web and the destination web site and to actively accumulate the required information. Many sites use different methods or models of site design to present the information. For example, the web site designer for a county government tax assessor's office may assume that any query will be related to county tax assessment. In contrast, the web site designer for an online department store needs to provide a user with access to product information ranging from toasters to jewelry. The web site designer of an internal corporate information site may need to provide access to corporate tax information, real estate holdings, business permits and/or health and safety records. Providing an intuitive interface to facilitate user access to information repositories, including web sites, is therefore increasingly important for businesses and consumers.
Accordingly, web site designers, information system managers, and researchers are constantly developing new tools to gain insight into the paths that users follow to obtain the information they need. For example, web site designers, researchers and web site banner advertisers seeking to place information on the most relevant web site have used a variety of techniques to analyze web log files. Web log files contain information concerning which web page referred the user to the site as well as which web pages were visited within the site. Information concerning the user's IP address and browser type is also frequently saved for review in the web log file.
Tools such as Insight from Accrue Corporation, Astra Site Manager from Mercury Interactive and WebCriteria's Site Analysis, Task Analysis and MAX products allow web site developers to analyze general statistics about a web site. For example, WebCriteria's Site Analysis product provides descriptive statistics accumulated through the use of the MAX software agent product. The MAX software agent traverses the web site to derive usability metrics from simulated browsing. However, the simulated browsing merely provides a random walk of the web site. Simulated browsing based on a random walk assumes that the user's navigational choices at any juncture are random and ignores the informational cues present on each page and surrounding each link. In actual use of the site, by contrast, informational cues influence a user's decision as to whether one path through the web site is chosen over another path.
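The difference between a random walk and cue-influenced navigation can be illustrated with a minimal sketch. The code below is not the MAX software agent or any product's actual method; the site graph, the function names random_walk and cue_weighted_walk, and the cue_scores values are all hypothetical and chosen only for illustration. The first function picks every outgoing link with equal probability, while the second weights each link by an assumed score standing in for the informational cues surrounding that link.

```python
import random

def random_walk(links, start, steps, rng):
    """Uniform random walk: every outgoing link is equally likely."""
    path = [start]
    page = start
    for _ in range(steps):
        choices = links.get(page, [])
        if not choices:  # dead end: no outgoing links
            break
        page = rng.choice(choices)
        path.append(page)
    return path

def cue_weighted_walk(links, cue_scores, start, steps, rng):
    """Walk in which each link is chosen in proportion to a cue score.

    cue_scores maps (source_page, target_page) to a non-negative number;
    these scores are illustrative assumptions, not measured values.
    """
    path = [start]
    page = start
    for _ in range(steps):
        choices = links.get(page, [])
        if not choices:
            break
        weights = [cue_scores.get((page, target), 0.0) for target in choices]
        if sum(weights) <= 0:
            # No usable cues on this page: fall back to a uniform choice.
            page = rng.choice(choices)
        else:
            page = rng.choices(choices, weights=weights, k=1)[0]
        path.append(page)
    return path

# Hypothetical department-store site with two product pages.
links = {"home": ["toasters", "jewelry"], "toasters": [], "jewelry": []}
# Assumed cue scores for a user looking for a toaster.
cue_scores = {("home", "toasters"): 1.0, ("home", "jewelry"): 0.0}

if __name__ == "__main__":
    rng = random.Random(0)
    print(random_walk(links, "home", 1, rng))
    print(cue_weighted_walk(links, cue_scores, "home", 1, rng))
```

Under these assumed scores, the cue-weighted walk always follows the link to the toaster page, whereas the uniform random walk visits either product page with equal probability regardless of what the user is seeking.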
In Chakrabarti et al. and Silva et al. (Chakrabarti, S., B. Dom, P. Raghavan, S. Rajagopalan, D. Gibson and J. Kleinberg, Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text. In Proc. of the 7th International World Wide Web Conference (WWW7), pp. 65-74, Brisbane, Australia, 1998, and Silva, I., B. Ribeiro-Neto, P. Calado, E. Moura, N. Ziviani, Link-based and Content-Based Evidential Information in a Belief Network Model. In Proc. of the 21st ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 96-103, Athens, Greece, 2000), a combination of keywords and links is used to determine a ranking weight for retrieval results. However, Chakrabarti and Silva make no attempt to predict the usage of a web site based on a virtual user's information needs.
Instead, these systems merely describe how users have traversed the web site in the past. These systems fail to provide web site designers with an objective prediction of how changes to a document or web page affect the way a user with a specific information need will traverse the site. Co-pending Application, entitled “SYSTEM AND METHOD FOR PREDICTING WEB USER FLOW BY DETERMINING ASSOCIATION STRENGTH OF HYPERMEDIA LINKS”, by P. Pirolli et al., filed Mar. 31, 2000, and filed as U.S. application Ser. No. 09/540,976, incorporated in its entirety, predicts a user's traversal of the links in a document collection or web site using a computation based on the presence of information in the linked-to, or distal, page. However, distal information is, by definition, information that has not yet been seen by the user. Accordingly, analysis of distal information cannot provide an objective indication of the decisions made by an actual first-time user of the document collection or web site as the user encounters the navigational choices in the current, or proximal, document or web page. Instead, usability analysis of a document collection or web site requires some measure of how the user's experience is affected by information cues in the proximal, or current, document or web page.