Communication of data over computer networks, particularly the Internet, has become an important, if not essential, way for many organizations and individuals to disseminate information. The Internet is a global network connecting millions of computers using a client-server architecture in which any computer connected to the Internet can potentially receive data from and send data to any other computer connected to the Internet. The Internet provides a variety of methods for communicating data, one of the most ubiquitous of which is the World Wide Web. Other methods for communicating data over the Internet include e-mail, Usenet newsgroups, Telnet and FTP.
The World Wide Web is a system of Internet servers, typically called “web servers”, that support documents formatted according to the Hypertext Markup Language (“HTML”). These documents, known as web pages, are transferred across the Internet according to the Hypertext Transfer Protocol (“HTTP”). Web pages are often organized into web sites, each of which represents a site or location on the World Wide Web. The web pages within a web site can link to one or more web pages (or files) at the same web site or at other web sites. A user can access web pages using a browser program and can “click on” links in the web pages being viewed to access other web pages. Each time the user clicks on a link, the browser program generates an HTTP request and communicates it to the web server hosting the web page. The web server retrieves the requested web page and returns the web page to the browser program. The returned web page can provide a variety of content, including text, graphics, audio and video content.
Because web pages can display content and receive information from users, web sites have become popular for enabling commercial transactions. As web sites become more important to commerce, businesses are increasingly interested in monitoring how users navigate their web sites. One way to do this is to record and analyze all the HTTP requests made by a user to the web site. This is often called “click stream analysis”. An entity controlling a web site can review the paths users took through its web site to try to determine if usage patterns exist.
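As an illustration only, the following minimal sketch shows the kind of path reconstruction that click stream analysis performs: recorded requests are grouped by user and ordered by time. The record layout, user identifiers and page paths are hypothetical and are not taken from any particular system.

```python
from collections import defaultdict

# Hypothetical request records: (user_id, timestamp_seconds, requested_path).
requests = [
    ("user_a", 10, "/home"),
    ("user_b", 12, "/home"),
    ("user_a", 25, "/products"),
    ("user_a", 40, "/checkout"),
    ("user_b", 30, "/support"),
]


def build_paths(records):
    """Group requests by user and order them by time to form each user's path."""
    by_user = defaultdict(list)
    for user_id, timestamp, path in records:
        by_user[user_id].append((timestamp, path))
    # Sorting the (timestamp, path) pairs orders each user's requests in time.
    return {user: [p for _, p in sorted(hits)] for user, hits in by_user.items()}


paths = build_paths(requests)
```

The resulting mapping gives, for each user, the ordered sequence of pages requested, which is the raw material an entity could review when looking for usage patterns.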
Current click stream analysis systems, however, typically provide very limited information about a user's browsing habits. This is because they only provide a record of HTTP requests, but do not link the requests to specific content in the web page or to events occurring in the page, such as the presentation of content from an ad server. Thus, while current click stream analysis systems provide information as to how a user navigated a web site, they provide little or no information as to why the user navigated the web site in that manner. In other words, current click stream analysis systems focus only on user behavior and not on the content that drives that behavior. Furthermore, current click stream analysis systems do not link events occurring at back-end systems with the page requests of particular users. Therefore, a user's behavior cannot be analyzed in terms of a business process.
FIG. 1 illustrates the deficiencies of current click stream analysis systems. In FIG. 1, a client computer 5, through an Internet browser, makes an HTTP request to a web server 10 over the Internet 15. If the requested web page includes dynamic content, the web server 10 can initiate a script, using, for example, the common gateway interface (“CGI”) mechanism, to send data to an application server 20 to generate the dynamic content. Application server 20 can generate dynamic HTML content using a program written in a language such as C or Perl and return the content to web server 10. Web server 10 can, in turn, communicate the HTML content back to the client computer 5 as the requested web page.
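The request flow of FIG. 1 can be sketched roughly as follows. This is a simplified, in-process stand-in rather than an actual CGI deployment; the function names, the page paths and the `item` parameter are assumptions introduced only for illustration.

```python
# Hypothetical static content held by the web server.
STATIC_PAGES = {"/about.html": "<html><body>About us</body></html>"}


def application_server(query):
    """Stand-in for application server 20: builds HTML from the request data."""
    item = query.get("item", "unknown")
    return f"<html><body>Details for item {item}</body></html>"


def web_server(path, query=None):
    """Stand-in for web server 10: serves static pages or delegates dynamic ones."""
    if path in STATIC_PAGES:
        return STATIC_PAGES[path]
    if path == "/catalog":
        # Dynamic page: hand the request data to the application server.
        return application_server(query or {})
    return "<html><body>Not found</body></html>"


# The client (client computer 5) requests a dynamic page.
page = web_server("/catalog", {"item": "42"})
```

In this sketch the content of the returned page is decided entirely inside `application_server`, while `web_server` sees only the path and request data.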
In current systems, web server 10 can keep a file 25, known as a web log, of HTTP requests. By associating each HTTP request with a user, current click stream analysis systems can analyze the user's path through the web site hosted by web server 10. However, since the web log only records user requests at web server 10, analysis of the web log provides no insight into the events that occurred at application server 20 in response to a particular request. Thus, while click stream analysis may allow for review of the pages requested by a user, it does not provide any knowledge as to the dynamic content actually presented to the user by application server 20.
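To make the limitation concrete, the sketch below parses one web log entry. It assumes the log uses the widely used Common Log Format, which the description above does not specify; the sample line, host address and user name are invented for illustration.

```python
import re

# Pattern for one Common Log Format entry (assumed format, see lead-in).
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>\S+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_log_line(line):
    """Return the fields of a log entry as a dict, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None


# Hypothetical log entry for a request to a dynamic page.
sample = (
    '192.0.2.1 - alice [10/Oct/2000:13:55:36 -0700] '
    '"GET /catalog?item=42 HTTP/1.0" 200 2326'
)
entry = parse_log_line(sample)
```

Every field recovered here describes the request or the size and status of the response. None of them identifies which content the application server generated for the page.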