The World Wide Web (WWW) comprises an expansive network of interconnected computers upon which businesses, governments, groups, and individuals throughout the world maintain inter-linked computer files known as Web pages. Users navigate these pages by means of computer software programs commonly known as Internet browsers. Due to the vast number of WWW sites, many Web pages contain redundant information or closely resemble one another in function or title. The vastness of the unstructured WWW causes users to rely primarily on Internet search engines to retrieve information or to locate businesses. These search engines use various means to determine the relevance of the retrieved information to a user-defined search.
The present invention addresses the pervasive need to improve the hierarchical organization of Web sites. A Web site's organization may be quite different from the organization expected by visitors to the Web site. Hence, it is often unclear under which branch of the hierarchical organization a specific document or page is located.
Several attempts have been made to address this need, exemplary of which are the following references:
    M. Perkowitz et al., “Adaptive Web sites: Automatically synthesizing Web pages,” In Proc. of the Fifteenth National Conference on Artificial Intelligence (AAAI), 1998; and M. Perkowitz et al., “Towards adaptive sites: Conceptual framework and case study,” In Proc. of the Eighth Int'l World Wide Web Conference, Toronto, Canada, May 1999, investigate the problem of index page synthesis, that is, the automatic creation of pages that facilitate a visitor's navigation of a Web site. These publications describe a cluster mining algorithm that analyzes the Web log, finds collections of pages that tend to co-occur in visits, places them under one topic, and generates index pages that include links to the pages at the site relating to that topic.
    T. Nakayama et al., “Discovering the gap between Web site designers' expectations and users' behavior,” In Proc. of the Ninth Int'l World Wide Web Conference, Amsterdam, May 2000, also try to discover the gap between Web site designers' expectations and users' behavior. However, the approach described in this publication uses inter-page conceptual relevance to estimate the former (the designers' expectations), and inter-page access co-occurrence to estimate the latter (the users' behavior). This publication also focuses on Web site design improvement by using multiple regression to predict hyperlink traversal frequency from page layout features.
    Spiliopoulou et al., “WUM: A web utilization miner,” In Proc. of EDBT Workshop WebDB98, Valencia, Spain, March 1998; and Spiliopoulou et al., “A data miner analyzing the navigational behaviour of web users,” In Proc. of the Workshop on Machine Learning in User Modeling of the ACAI'99, Greece, July 1999, propose a “web utilization miner (WUM)” to find interesting navigation patterns. The interestingness criteria for navigation patterns are dynamically specified by a human expert using WUM's mining language, which supports the specification of criteria of a statistical, structural, and textual nature.
    Chen et al., “Data mining for path traversal patterns in a web environment,” In Proceedings of the 16th International Conference on Distributed Computing Systems, pages 385–392, May 1996, present an algorithm for converting the original sequence of log data into a set of maximal forward references, filtering out the effect of backward references that are made mainly for ease of traveling.
    Pei et al., “Mining access patterns efficiently from web logs,” In Proc. of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 396–407, April 2000, propose a data structure, called a Web access pattern tree, for efficient mining of access patterns from pieces of logs.
    Shahabi et al., “Knowledge discovery from users web-page navigation,” In Proc. of the 7th IEEE Intl. Workshop on Research Issues in Data Engineering (RIDE), pages 20–29, 1997, propose a method for capturing the client's selected links and page order, page viewing time, and cache references. The information is then utilized by a knowledge discovery technique to cluster users with similar interests.
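By way of illustration only, the notion of maximal forward references mentioned above in connection with Chen et al. may be sketched as follows. This is a minimal sketch under stated assumptions, not the algorithm as published: it assumes that a single visitor's session is available as an ordered list of page identifiers, and that any return to a page already on the current path counts as a backward reference. The function name and the single-letter page names are hypothetical.

```python
def maximal_forward_references(session):
    """Split one visitor's ordered page list into maximal forward references.

    Assumption: revisiting a page already on the current path is treated as
    a backward reference made for ease of traveling.
    """
    path = []           # current forward path from the session's first page
    extending = False   # True while the most recent move was a forward move
    result = []
    for page in session:
        if page in path:
            # Backward reference: emit the path only if it was still growing,
            # so a run of consecutive backward moves emits nothing further.
            if extending:
                result.append(list(path))
            # Discard everything after the revisited page.
            path = path[: path.index(page) + 1]
            extending = False
        else:
            # Forward reference: extend the current path.
            path.append(page)
            extending = True
    # A session that ends on a forward move leaves one last path to emit.
    if extending and path:
        result.append(list(path))
    return result


# Hypothetical session: the visitor goes A, B, C, D, backs up to B, then visits E.
print(maximal_forward_references(["A", "B", "C", "D", "C", "B", "E"]))
# prints [['A', 'B', 'C', 'D'], ['A', 'B', 'E']]
```

In this sketch, the two backward moves (D to C, then C to B) produce no output of their own; only the two forward paths that the visitor actually explored are retained.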
However, none of these publications addresses the issue of where users expect to find pages, or the issue of discovering any mismatch between the site organization and users' expectations. It would therefore be desirable to provide a system and associated method for mining and using user access patterns, such as backtracks, to determine the most likely locations of Web pages, in order to improve a Web site's hierarchy and organization.