With the acceptance of the World Wide Web (“the Web”) as a core business platform, many enterprises have moved beyond Web sites that offer little more than static brochureware to develop sophisticated Web-based applications and dynamically generated content. These businesses have invested heavily to create robust and dynamic e-commerce sites that link intranets, extranets, and the Internet, and they use the Web as an important mechanism for customer relationship management. They have moved into the world of e-business, a world that encompasses not only e-commerce but also internal applications that improve an enterprise's overall sales, marketing, and support process.
With substantial sums being invested in on-line businesses, enterprises demand thorough cost justification and careful allocation of resources. Many marketing managers, however, are unfamiliar with the Web as a marketing medium and are unprepared to face the complexity of the e-business environment. These managers need information that allows them to gauge Web marketing performance accurately, to make informed e-business decisions, to integrate new marketing initiatives strategically, and to calculate a return on their Web investments.
One approach to Web marketing analysis is disclosed in PCT publication WO 98/38614 entitled “System and Method for Analyzing Remote Traffic Data in a Distributed Computing Environment” by Boyd et al. This system takes in traffic data hits (requests for resources, or page hits) as input, and builds results tables that include characteristic data of the traffic data hits. This data can then be made available for analysis.
Such site statistics can be helpful for some uses, but they provide little information to the marketer about who is coming to the Web site and how they are behaving while they are there. This latter information is critical both for evaluating existing on-line marketing efforts and for integrating new behavior-based on-line marketing initiatives, including one-to-one on-line marketing, specific content delivery, and incentives to encourage Web consumers to choose higher-value paths through the Web site.
Generating the high-level user behavioral information necessary to visualize and act on user behavior is a challenging endeavor for at least two reasons. First, the data collected by database tools, such as the one described above, is at a very low level. Users (sometimes referred to as “visitors”) make one or more visits in a given time period, with each visit comprising one or more page views. Information from Web server logs, network packet sniffers, and browser plug-ins (collectively referred to here as “Web logs”) includes only individual resource requests (hits) rather than page views, and timestamps and cookies (a physical view of user activity) rather than coherent visit and user information. This low-level data can be refined, for example by (1) reducing raw hits to page views through exclusions (typically of images, robots, and other less interesting hits); (2) grouping related page views by the same user (identified by registration information, cookie, or other combination of identifying attributes) into visits (sometimes referred to as “sessions”) inferred from the proximity in time of these page views; and (3) storing the results in a database for later analysis. However, the database of page views, visits, and users is tied very firmly to the design and structure of the Web site being analyzed, and the pages on Web sites are generally defined to enable basic navigation and presentation of content to users, not to facilitate later analysis of user activity from a higher-level, logical view. As a result, providing marketers with a high-level or logical view analysis of user behavior is difficult at best.
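By way of illustration only, refinement steps (1) and (2) described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any particular tool: the `Hit` record, the excluded file suffixes, the robot markers, and the 30-minute inactivity timeout are all hypothetical choices made for the example, and step (3), storage in a database, is omitted.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Assumed values for illustration; the source does not specify any of these.
EXCLUDED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")
ROBOT_MARKERS = ("bot", "crawler", "spider")
VISIT_TIMEOUT_SECONDS = 30 * 60


@dataclass
class Hit:
    """One raw resource request from a Web log (hypothetical record layout)."""
    user_id: str        # e.g. a cookie value or registration ID
    timestamp: float    # seconds since some fixed epoch
    url: str
    user_agent: str = ""


def is_page_view(hit: Hit) -> bool:
    """Step (1): exclude images, robots, and similar less interesting hits."""
    path = hit.url.split("?", 1)[0].lower()
    if path.endswith(EXCLUDED_SUFFIXES):
        return False
    agent = hit.user_agent.lower()
    return not any(marker in agent for marker in ROBOT_MARKERS)


def group_into_visits(
    hits: Iterable[Hit], timeout: float = VISIT_TIMEOUT_SECONDS
) -> Dict[str, List[List[Hit]]]:
    """Step (2): group each user's page views into visits by time proximity."""
    page_views = sorted(
        (h for h in hits if is_page_view(h)),
        key=lambda h: (h.user_id, h.timestamp),
    )
    visits: Dict[str, List[List[Hit]]] = {}
    for view in page_views:
        user_visits = visits.setdefault(view.user_id, [])
        # Continue the current visit if the gap since the previous page view
        # is within the timeout; otherwise start a new visit.
        if user_visits and view.timestamp - user_visits[-1][-1].timestamp <= timeout:
            user_visits[-1].append(view)
        else:
            user_visits.append([view])
    return visits
```

The result maps each user identifier to a list of visits, each visit being a time-ordered list of page views. Note that the output is still keyed to physical URLs, which is precisely the limitation discussed above.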
The second difficulty in using existing Web analysis tools to perform high-level or logical view analysis of Web consumer behavior is that the sheer volume of data complicates analysis. There may be hundreds, thousands, or even larger numbers of pages on a site or interrelated collection of sites. In addition, both the actual pages on a site and the user population are constantly changing. Over time, the numbers of individual page views, visits, and users become too large to allow extraction of meaningful patterns for analyzing commonality and segmenting user behavior.
In order to characterize user behavior in meaningful and actionable ways, the analysis problems need to be reduced to manageable levels. It is essential to find a way to simplify the physical picture of user activity into a logical view, comprising groups of page views, visits, and users. The logical view can then be used for site optimization, personalized marketing, and customer relationship management.