1. Field of the Invention
The present invention relates in general to the field of information processing, and more specifically to generating contextual user network session history in a dynamic content environment using a session recording and parsing system.
2. Description of the Related Art
The use of networks, particularly the Internet, by users continues to rise as technology and interesting destinations increase. World wide web (“Web”) sites continue to play an increasing role in the business models of many businesses. Sophisticated Web sites, such as many configuration Web sites, present a large number of navigation path options to a user, and, thus, a large amount of information about the user and the information presented to the user is potentially available for collection and analysis.
Historical information surrounding a user's Web site session can be very valuable information, particularly when attempting to understand the context of a user's Web session. However, reliably capturing this knowledge and recording it in a useful and accessible format has proven elusive for conventional technology.
FIGS. 1 and 2 depict two Web systems and conventional attempts to capture session information. FIG. 1 depicts a static Web system 100. Web system 100 represents one of the original concepts for a Web system and its function. The Web site 108 includes a network of static hypertext markup language (“HTML”) pages 102(a)-102(d) linked together via hyperlinks. During a user's Web session, i.e., the user's activity on a Web site during a fixed time-frame, browsers 104(a)-104(c) interact with the Internet information services (“IIS”) Web server 106 over a network, such as the Internet or an intranet, to access static content. Note that Internet Explorer browsers and IIS Web server software are available from Microsoft Corporation of Washington, and Netscape Navigator browsers are available from Netscape Communication Corporation. Such interaction works as follows. Each individual browser 104(a)-104(c) requests specific static HTML pages selected from HTML pages 102(a)-102(d). The Web server 106 receives these requests, locates the corresponding HTML page, and sends that HTML page back to the requesting browser as a response. In essence, the Web server 106 functions as a warehouse for HTML pages 102(a)-102(d), with the ability to handle multiple requests for multiple pages at the same time.
The content of the HTML pages 102(a)-102(d) is not dynamic, i.e. the content of any page does not change from response to response. Hyperlinks on a particular page request other static pages when clicked on by the user, allowing the user to navigate the Web site 108.
IIS Web server 106 log records capture the request information from browsers 104(a)-104(c). However, the content of the responses is not logged. Generally this is unnecessary as the content does not change from response to response, so recording this information would not add useful information into the log files.
Thus, by recording each page accessed by a particular browser, a user's session could be recreated entirely, provided that an archival record is made of the content of each accessed page. However, for Web sites such as an automotive or computer configuration Web site, creating the number of pages necessary to represent all possible configurations would require an enormous amount of memory. Additionally, any modification to configuration options would typically require an enormous amount of work to update or replace old pages.
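By way of illustration only, the following minimal sketch shows how a request-only log suffices to recreate a session on a static site, and how quickly the page count grows if every configuration needs its own static page. The page paths, browser identifiers, function names, and option counts are hypothetical and are not taken from any actual Web site or server.

```python
from math import prod

# Hypothetical static site: URL path -> fixed HTML content.
STATIC_PAGES = {
    "/index.html": "<html>Home</html>",
    "/models.html": "<html>Models</html>",
    "/colors.html": "<html>Colors</html>",
}


def serve(path, browser_id, request_log):
    """Return the stored page and log only the request, not the response."""
    request_log.append((browser_id, path))
    return STATIC_PAGES[path]


def replay_session(browser_id, request_log, archive):
    """Recreate what one browser saw from the request log plus a page archive."""
    return [archive[path] for who, path in request_log if who == browser_id]


def pages_needed(option_counts):
    """Static pages needed to cover every combination of independent options."""
    return prod(option_counts)


# Two browsers share the server; only requests are logged.
log = []
seen = [serve("/index.html", "104a", log), serve("/colors.html", "104a", log)]
serve("/models.html", "104b", log)

# Because content never changes, the log and the archive reproduce the session.
replayed = replay_session("104a", log, STATIC_PAGES)

# Assumed option counts: 10 models, 12 exteriors, 8 interiors, 5 engines, 6 packages.
total_pages = pages_needed([10, 12, 8, 5, 6])
```

Under these assumed counts, a purely static configuration site would need 28,800 distinct pages, and adding or removing a single option would change that total multiplicatively.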
FIG. 2 depicts a dynamic content generating Web system 200, which essentially makes interactive applications (such as a configurator or an online retail site) available via the Web. Dynamic content allows a Web page to display current products in a user's shopping cart and display a different list depending on what items the user is considering for purchase. Similarly, for an auto configuration site, the available colors displayed on an exterior colors Web page depend on all the other choices the user has made before viewing the colors page.
Dynamic Web site 204 stores a minimal number of static HTML pages. The vast majority of Web pages are generated using a much smaller number of dynamic content pages 210, such as Java Server Pages™ (“JSP”). JSP is an alternative Java™ based format engineered to manage dynamic content on the Web. Many versions of JSP support the development of reusable modules often referred to as “actions” or “functions”. A tag in the JSP page invokes a referenced tag function. (“Java Server Pages” and “Java” are trademarks of Sun Microsystems of Santa Clara, Calif.) When a request for one of the dynamic content pages 210 arrives from any of browsers 206(a)-206(c), the IIS Web server 202 forwards the request to the servlet runner application 208, such as “JRun” by Macromedia, Inc. of San Francisco, Calif. Servlet runner application 208 interprets the code on the requested JSP page along with a number of variables stored for each active user. The variables typically include data such as lists of items in a shopping cart and the parts chosen in an active configuration. Using the requested JSP page and the variables, the servlet runner application 208 generates an HTML page 212. The servlet runner application 208 passes the HTML page 212 to the IIS Web server 202, which returns the HTML page 212 to the requesting browser. The content of this HTML page 212 is dynamic, changing with every browser request for a particular one of the JSP pages 210.
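The generation step described above can be sketched as follows. This is an illustrative approximation only: it uses a Python string template as a stand-in for a JSP page and a plain dictionary as a stand-in for the per-user variables held by the servlet runner application. The template text, browser identifiers, and cart contents are hypothetical.

```python
from string import Template

# Hypothetical stand-in for one dynamic content page (a JSP page in the text).
CART_PAGE = Template("<html><body>Cart for $user: $items</body></html>")

# Hypothetical per-user variables kept on the server for each active user.
session_vars = {
    "206a": {"user": "A", "cart": ["wheels", "roof rack"]},
    "206b": {"user": "B", "cart": []},
}


def render(page, browser_id):
    """Combine the requested page with the requesting user's variables."""
    state = session_vars[browser_id]
    items = ", ".join(state["cart"]) or "(empty)"
    return page.substitute(user=state["user"], items=items)


# The same page request yields different HTML for different users.
html_a = render(CART_PAGE, "206a")
html_b = render(CART_PAGE, "206b")

# It also yields different HTML for the same user once the variables change.
session_vars["206b"]["cart"].append("tow hitch")
html_b_later = render(CART_PAGE, "206b")
```

A single template thus stands in for an unbounded number of distinct responses, which is why a record of the request alone does not identify what the user actually saw.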
Note that from the perspective of the IIS Web server 202, for purposes of this discussion, there is virtually no difference between a browser request for an HTML page and a browser request for a JSP page. In both cases the response to a request is an HTML page. The IIS Web server 202 logs still record only the requests made to the IIS Web server 202. The logs do not contain any information about the content of the responses. For example, a server-side configuration or pricing application may generate data used to populate the HTML page 212. This server-side generated data provides values for many of the variables that are not recorded in the IIS Web server 202 logs. In the case of static HTML pages 102(a)-102(d), this was not an issue because every HTML page persists unchanged. In the case of dynamic pages, much of the information contained in HTML page 212 is not recorded in the logs. Such information includes many of the details that are desirable to track, such as configuration selection details, dealer search details, vehicle locate details, customer demographics, etc. For example, using browser 206(a), a user selects an exterior color on an automotive configuration Web site. While the IIS Web server 202 log may reveal that a vehicle was configured, or that a particular exterior color was selected, the log would not indicate that the choice of exterior color also resulted in a change of interior color because this information is not included in the server response to the browser. Similarly, consider the case of a lead sent to a dealer. While the IIS log would indicate a request for the lead submission page, it would contain no information about the details of the lead because this information is not communicated to the browser.
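The exterior color example can be illustrated with a minimal sketch of a request-only log alongside a server-side configuration rule. The rule table, URL, parameter names, and function name below are hypothetical; the sketch does not reproduce the actual IIS log format.

```python
# Hypothetical server-side rule: some exterior colors force an interior color.
FORCED_INTERIOR = {"red": "black", "white": "tan"}


def handle_request(url, params, config, request_log):
    """Log the request line only, then update the server-side configuration."""
    query = "&".join(f"{key}={value}" for key, value in params.items())
    request_log.append(f"{url}?{query}")

    config["exterior"] = params["exterior"]
    forced = FORCED_INTERIOR.get(params["exterior"])
    if forced is not None:
        # This side effect happens on the server and never reaches the log.
        config["interior"] = forced
    return config


config = {"exterior": "blue", "interior": "gray"}
log = []
handle_request("/configure.jsp", {"exterior": "red"}, config, log)
```

After the call, the log holds the single entry `/configure.jsp?exterior=red`, while the server-side configuration has silently changed its interior color from gray to black. An analyst working from the log alone could not recover that change.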