1. Field of the Invention
The present invention relates in general to the field of information processing, and more specifically to acquiring specific data generated during Web site sessions using well-designed data recording components.
2. Description of the Related Art
The use of networks, particularly the Internet, by users continues to rise as technology and interesting destinations increase. World wide web (“Web”) sites continue to play an increasing role in the business models of many businesses. Sophisticated Web sites, such as many configuration Web sites, present a large number of navigation path options to a user, and, thus, a large amount of information about the user's activity and information presented to the user is potentially available for collection and analysis.
Information surrounding events of a user's Web site session can be very valuable information, particularly when attempting to understand the intent behind a user's actions. However, reliably capturing this knowledge and recording it in a useful and accessible format has proven elusive for conventional technology.
FIGS. 1 and 2 depict two Web systems and conventional attempts to capture session information. FIG. 1 depicts a static Web system 100. Web system 100 represents one of the original concepts for Web system and function. The Web site 108 includes a network of static hypertext markup language (“HTML”) pages 102(a)-102(d) linked together via hyperlinks. During a user's Web session, i.e. the user's activity on a Web site during a fixed time-frame, browsers 104(a)-104(c) interact with the Internet information services (“IIS”) Web server 106 over a network, such as the Internet or an intranet, to access static content. Note, Internet Explorer browsers and IIS Web server software are available from Microsoft Corporation of Washington and Netscape Navigator browsers are available from Netscape Communications Corporation. Such interaction works as follows. Each individual browser 104(a)-104(c) makes requests of specific, static HTML pages selected from HTML pages 102(a)-102(d). The Web server 106 receives these requests, locates the corresponding HTML page and sends back that HTML page to the requesting browser as a response. In essence, the Web server 106 functions as a warehouse for HTML pages 102(a)-102(d), with the ability to handle multiple requests for multiple pages at the same time.
The content of the HTML pages 102(a)-102(d) is not dynamic, i.e. the content of any page does not change from response to response. Hyperlinks on a particular page request other static pages when clicked on by the user, allowing the user to navigate the Web site 108.
IIS Web server 106 log records capture the request information from browsers 104(a)-104(c). However, the content of the responses is not logged. Generally this is unnecessary as the content does not change from response to response, so recording this information would not add useful information into the log files.
Thus, by recording each page accessed by a particular browser, a user's session could be recreated entirely, provided that an archival record is made of the content of each accessed page. For Web sites, such as an automotive or computer configuration Web site, creating the number of pages necessary to represent all possible configurations would require an enormous amount of memory. Additionally, any modifications to configuration options would typically require an enormous amount of work to update or replace old pages.
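By way of illustration only, the static request handling, request logging, and session recreation described above can be sketched as follows. This is a minimal sketch and not an actual IIS implementation: the class name `StaticSiteSketch`, the page paths, the browser identifiers, and the log line format are all assumptions introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class StaticSiteSketch {
    // Hypothetical stand-ins for static HTML pages; content never changes.
    static final Map<String, String> PAGES = Map.of(
        "/index.html", "<html>Home</html>",
        "/models.html", "<html>Models</html>");

    // The log records each request only, never the response content.
    static final List<String> REQUEST_LOG = new ArrayList<>();

    static String handle(String browserId, String path) {
        REQUEST_LOG.add(browserId + " GET " + path);
        return PAGES.getOrDefault(path, "<html>404 Not Found</html>");
    }

    // Because pages are static, replaying the logged paths against the
    // archived pages recreates what a given browser was shown.
    static List<String> replay(String browserId) {
        List<String> shown = new ArrayList<>();
        for (String entry : REQUEST_LOG) {
            String[] parts = entry.split(" ");
            if (parts[0].equals(browserId)) {
                shown.add(PAGES.getOrDefault(parts[2], "<html>404 Not Found</html>"));
            }
        }
        return shown;
    }

    public static void main(String[] args) {
        handle("104a", "/index.html");
        handle("104b", "/models.html");
        handle("104a", "/models.html");
        System.out.println(REQUEST_LOG);
        System.out.println(replay("104a"));
    }
}
```

The replay succeeds only because every response is a fixed function of the requested path, which is the property that dynamic content removes.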
FIG. 2 depicts a dynamic content generating Web system 200, which essentially makes interactive applications (such as a configurator, or an online retail site) available via the Web. Dynamic content allows a Web page to display current products in a user's shopping cart and display a different list depending on what items the user is considering for purchase. Similarly, for an auto configuration site, the available colors displayed on an exterior colors Web page depend on all the other choices the user has made before viewing the colors page.
Dynamic Web site 204 stores a minimal amount of static HTML pages. The vast majority of Web pages are generated using a much smaller number of dynamic content pages 210, such as Java Server Pages™ (“JSP”). JSP is an alternative Java™ based format engineered to manage dynamic content on the Web. Many versions of JSP support the development of reusable modules often referred to as “actions” or “functions”. A tag in the JSP page invokes a referenced tag function. (“Java Server Pages” and “Java” are trademarks of Sun Microsystems of Santa Clara, Calif.). When a request for one of the dynamic content pages 210 arrives from any of browsers 206(a)-206(c), the IIS Web server 202 forwards the request to the servlet runner application 208, such as “JRun” by Macromedia, Inc. of San Francisco, Calif. Servlet runner application 208 interprets the code on the requested JSP page along with a number of variables stored for each active user. The variables typically include data such as lists of items in a shopping cart and the parts chosen in an active configuration. Using the requested JSP page and the variables, the servlet runner application 208 generates an HTML page 212. The servlet runner application 208 passes the HTML page 212 to the IIS Web server 202. The IIS Web server 202 returns the HTML page 212 to the requesting browser. The content of this HTML page 212 is dynamic, changing with every browser request for a particular one of the JSP pages 210.
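By way of illustration only, the generation of a dynamic HTML page from one shared page definition plus per-user variables can be sketched in plain Java as follows. This sketch does not use the JSP or servlet APIs and does not reflect how JRun is implemented; the class `DynamicPageSketch`, the method `renderCartPage`, and the user identifiers are assumptions introduced for this example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DynamicPageSketch {
    // Hypothetical per-user variables held on the server for each active user.
    static final Map<String, List<String>> CART_BY_USER = new HashMap<>();

    // Stand-in for one dynamic content page: a single definition whose
    // generated HTML depends on the requesting user's stored variables.
    static String renderCartPage(String userId) {
        List<String> items = CART_BY_USER.getOrDefault(userId, List.of());
        StringBuilder html = new StringBuilder("<html><ul>");
        for (String item : items) {
            html.append("<li>").append(item).append("</li>");
        }
        return html.append("</ul></html>").toString();
    }

    public static void main(String[] args) {
        CART_BY_USER.put("206a", List.of("alloy wheels", "sunroof"));
        // Same page requested by two users, two different responses.
        System.out.println(renderCartPage("206a"));
        System.out.println(renderCartPage("206b"));
    }
}
```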
Note that from the perspective of the IIS Web server 202, for purposes of this discussion there is virtually no difference between a browser request for an HTML page and a JSP page. In both cases the response to a request is an HTML page. The IIS Web server 202 logs still only record the requests made to the IIS Web server 202. The logs do not contain any information about the content of the responses. For example, a server-side configuration or pricing application may generate data used to populate the HTML page 212. This server-side generated data provides values for many of the variables that are not recorded in the IIS Web server 202 logs. In the case of static HTML pages 102(a)-102(d), this was not an issue because of the persistence of every HTML page. In the case of dynamic pages, much of the information contained in HTML page 212 is not recorded in the logs. Such information includes many of the details that are desirable to track such as configuration selection details, dealer search details, vehicle locate details, customer demographics, etc. For example, using browser 206(a), a user selects an exterior color on an automotive configuration Web site. While the IIS Web server 202 log may reveal that a vehicle was configured, or that a particular exterior color was selected, the log would not indicate that the choice of exterior color also resulted in a change of interior color because this information is not included in the server response to the browser. Similarly, consider the case of a lead sent to a dealer. While the IIS log would indicate a request for the lead submission page, it would contain no information about the details of the lead because this information is not communicated to the browser.
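By way of illustration only, the gap between what a request log records and what the server-side application actually does can be sketched as follows. The class `LogGapSketch`, the request URL, and the rule that a red exterior forces a black interior are hypothetical and are introduced solely for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LogGapSketch {
    // Request-only log, in the manner of a conventional Web server log.
    static final List<String> REQUEST_LOG = new ArrayList<>();
    // Server-side configuration state for one user (hypothetical).
    static final Map<String, String> CONFIG = new HashMap<>();

    static String selectExterior(String color) {
        REQUEST_LOG.add("GET /configure.jsp?exterior=" + color);
        CONFIG.put("exterior", color);
        // Assumed configuration rule: a red exterior forces a black interior.
        if (color.equals("red")) {
            CONFIG.put("interior", "black");
        }
        // The response mentions only the exterior color.
        return "<html>Exterior: " + color + "</html>";
    }

    public static void main(String[] args) {
        String response = selectExterior("red");
        System.out.println("log:      " + REQUEST_LOG);
        System.out.println("response: " + response);
        System.out.println("state:    " + CONFIG);
    }
}
```

In this sketch the interior color change exists only in the server-side state; neither the log entry nor the response carries it.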
Referring to FIG. 3, software layer architecture 300 of server 302 contains data recording hooks 304-308 distributed throughout various representative layers of the software layer architecture 300. Presentation layer 310 contains the software components that dictate elements of a user interface to be presented to a Web user. The JSP tag layer 312 contains software components that primarily provide the content of fields in a Web page to be presented to a Web user. The server application layer 314 contains software components that process information received from a user and provide data to be inserted into a user interface by the tag layer 312 and presentation layer 310.
The data recording hooks 304-308 record data generated by the various layers of the software layer architecture 300. The data recording hooks 304-308 allow a Web site utilizing software layer architecture 300 to selectively record data that is not necessarily present in the user request and Web server response or other conventional data stream exchanged between a Web browser and a Web server.
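By way of illustration only, data recording hooks distributed across the three layers can be sketched as follows. The class `RecordingHookSketch`, the `hook` method, the recorded message format, and the interior color rule are assumptions introduced for this example and are not drawn from any particular conventional system.

```java
import java.util.ArrayList;
import java.util.List;

class RecordingHookSketch {
    // Destination for recorded data (hypothetical in-memory store).
    static final List<String> RECORDED = new ArrayList<>();

    // A data recording hook: notes which layer produced which data.
    static void hook(String layer, String detail) {
        RECORDED.add(layer + ": " + detail);
    }

    // Server application layer: processes the selection, derives extra data.
    static String applicationLayer(String exterior) {
        String interior = exterior.equals("red") ? "black" : "tan"; // assumed rule
        hook("server application layer 314",
             "exterior=" + exterior + " interior=" + interior);
        return interior;
    }

    // Tag layer: supplies the content of a field in the page.
    static String tagLayer(String exterior) {
        applicationLayer(exterior);
        String field = "Exterior: " + exterior;
        hook("JSP tag layer 312", "field=" + field);
        return field;
    }

    // Presentation layer: dictates the user interface sent to the browser.
    static String presentationLayer(String exterior) {
        String html = "<html>" + tagLayer(exterior) + "</html>";
        hook("presentation layer 310", "rendered " + html.length() + " chars");
        return html;
    }

    public static void main(String[] args) {
        System.out.println(presentationLayer("red"));
        RECORDED.forEach(System.out::println);
    }
}
```

Here the hook in the server application layer records the derived interior color even though that value never appears in the generated page.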
Conventional data recording hook design and design methodology resulted in many difficulties. For example, the distribution of data recording hooks 304-308 as depicted in FIG. 3 presented multiple maintainability issues. Conventional thought was initially focused around the concept of having data recording hooks to augment conventional Web site data recording systems rather than on a design methodology for data recording hooks. As a result, many data recording hooks were either arbitrarily placed in various layers or placed in what appeared to be an advantageous layer but resulted in maintainability issues. For example, data is directly presented to the user through the presentation layer 310, and the user directly manipulates data at the presentation layer 310 level. Thus, the presentation layer 310 appears to be an ideal place from which to record data exchanged between the user and Web server. However, the presentation layer 310 is typically the layer that most often changes as Web sites update look-and-feel, simple functionality, and other visible and behavioral aspects. As a result, following updates, data recording hooks 304 could be deleted, become nonfunctional, or begin to collect incorrect or misleading data unless an extraordinary amount of care was taken during updates of presentation layer 310. Additionally, locating hooks in the presentation layer 310 and other higher level layers often necessitated the use of additional hooks because of the multiple paths through which a user can access essentially the same Web page. Accordingly, the value of data recording hooks 304 began to be offset by the extra effort needed to maintain data recording hooks 304.
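By way of illustration only, the multiple-path difficulty described above can be sketched as follows. The class `PresentationHookSketch` and its two access paths (a step-by-step wizard and a direct link added in a later update) are hypothetical and are introduced solely for this example.

```java
import java.util.ArrayList;
import java.util.List;

class PresentationHookSketch {
    static final List<String> RECORDED = new ArrayList<>();

    // Shared content that both paths ultimately present.
    static String summary(String model) {
        return "<p>Summary: " + model + "</p>";
    }

    // Path 1: original wizard page, carrying its own presentation-layer hook.
    static String viaWizard(String model) {
        RECORDED.add("summary viewed: " + model);
        return "<html>" + summary(model) + "</html>";
    }

    // Path 2: page added in a later look-and-feel update; the hook was
    // not duplicated here, so views through this path go unrecorded.
    static String viaDirectLink(String model) {
        return "<html><body>" + summary(model) + "</body></html>";
    }

    public static void main(String[] args) {
        viaWizard("sedan");
        viaDirectLink("coupe");
        // Two views of essentially the same page, but only one record.
        System.out.println(RECORDED);
    }
}
```

Each additional path to the same page requires its own copy of the hook, and each copy must survive every subsequent update of the presentation code.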
The same types of maintainability issues also plagued data recording hooks 306 and 308. Although placed deeper in the software layer architecture 300, data recording hooks 306 and 308 were also not primarily associated with code portions that were substantially content stable over time. As a result, updates to deeper code layers raised many of the same maintainability issues associated with data recording hooks 304 in the presentation layer 310.