The present invention relates generally to Web sites and, more particularly, to analyzing Web site visitor activities.
The Internet has gained broad recognition and acceptance as a viable medium for communicating and for conducting business. The World-Wide Web (Web) was created in the early 1990s, and is comprised of server-hosting computers (Web servers) connected to the Internet that have hypertext documents (referred to as Web pages) stored therewithin. Web pages are accessible by client programs (e.g., Web browsers) utilizing the Hypertext Transfer Protocol (HTTP) via a Transmission Control Protocol/Internet Protocol (TCP/IP) connection between a client-hosting device and a server-hosting device. While HTTP and Web pages are the prevalent forms for the Web, the Web itself refers to a wide range of protocols including Secure Hypertext Transfer Protocol (HTTPS), File Transfer Protocol (FTP), and Gopher, and Web content formats including plain text, HyperText Markup Language (HTML), Extensible Markup Language (XML), as well as image formats such as Graphics Interchange Format (GIF) and Joint Photographic Experts Group (JPEG).
A Web site is conventionally a related collection of Web files that includes a beginning file called a "home" page. From the home page, a visitor can access other files and applications at a Web site. A large Web site may utilize a number of servers, which may or may not be different and may or may not be geographically-dispersed. For example, the Web site of the International Business Machines Corporation (www.ibm.com) consists of thousands of Web pages and files spread out over multiple Web servers in locations world-wide.
A Web server (also referred to as an HTTP server) is a computer program that utilizes HTTP to serve files that form Web pages to Web clients. Exemplary Web servers are International Business Machines Corporation's family of Lotus Domino® servers and the Apache server (available from www.apache.org). A Web client is a requesting program that also utilizes HTTP. A browser is an exemplary Web client for use in requesting Web pages and files from Web servers. A Web server waits for a Web client, such as a browser, to open a connection and to request a specific Web page or application. The Web server then sends a copy of the requested item to the Web client, closes the connection with the Web client, and waits for the next connection.
HTTP allows a browser to request a specific item, which a Web server then returns and the browser renders. To ensure that browsers and Web servers can interoperate unambiguously, HTTP defines the exact format of requests (HTTP requests) sent from a browser to a Web server as well as the format of responses (HTTP responses) that the Web server returns to the browser. Exemplary browsers include Netscape Navigator® (America Online, Inc., Dulles, Va.) and Internet Explorer® (Microsoft Corporation, Redmond, Wash.). Browsers typically provide a graphical user interface for retrieving and viewing Web pages, applications, and other resources served by Web servers.
The topology of many Web sites is becoming complex, especially Web sites involved in electronic commerce ("e-commerce"). Increasingly, Web sites are utilizing Web application servers and "back-end" servers to process Web client requests. A Web application server is a "middleman" server that operates between a Web server and one or more back-end servers, such as a database, transaction, or advertising server. An exemplary Web application server is the WebSphere® application server available from International Business Machines Corporation, Armonk, N.Y. Exemplary back-end servers include CICS, IBM DB2 Universal Database, and WebSphere Net.Commerce server, available from International Business Machines Corporation, Armonk, N.Y.
FIG. 1 illustrates a conventional Web site 10 having multiple Web servers 12a, 12b and 12c, and an Application server 14. The illustrated Web site 10 also includes an Advertising server 16 and a Transaction server 18, which are back-end processing servers. As is understood by those of skill in the art, each Web server 12a, 12b, 12c is configured to receive client requests and issue responses thereto. Each Web server 12a, 12b, 12c is also configured to route requests to the Application server 14 for additional processing if necessary.
As illustrated in FIG. 1, Web servers 12a, 12b, 12c include respective log files 13a, 13b, 13c. In addition, the Application server 14, the Advertising server 16, and the Transaction server 18 each have respective log files 15a, 15b, 15c as illustrated. As is known by those of skill in the art, a server log file may be utilized to store a record containing information about each transaction (i.e., requests and responses) handled by a respective server. For example, the log file 13a for Web server 12a may store a record for each request received from a client and each response issued to a client. The Application server 14 may store a record in its log file 15a for each request received from a Web server and a record for each response issued to a Web server. Similarly, the Advertising server 16 and the Transaction server 18 may store records in their respective log files 15b, 15c for responses and requests to and from the Application server 14 or to and from another server or to a client.
Information contained within server log records is conventionally utilized to study the activities of Web site visitors (referred to as "psychographic" information). Analyzing and understanding Web site visitor psychographic information is becoming increasingly important to businesses operating on the Web. For example, the following psychographic information can be valuable: visitor identification, time of day a visitor accessed a Web site, an identification of each Web page requested by a visitor, how long a visitor spent viewing each Web page, and where the visitor came from (i.e., a referring URL). By analyzing Web site visitor psychographic information, a Web site owner can gain valuable insight into the effectiveness of a Web site in achieving its intended purpose. For example, a determination can be made as to what type of advertising should be placed on Web pages that a particular type of visitor is likely to visit.
To accurately analyze psychographic information, it is desirable to be able to piece together a visitor's actions as a stream of requests into a unit called a session. A session starts the first time a visitor comes to a Web site and ends with an explicit log-out or an idle-time expiration. Unfortunately, analyzing psychographic information for Web site visitors can be difficult for complex Web sites, especially e-commerce Web sites that utilize multiple, different servers. A session can be difficult to accurately reconstruct from the records contained within multiple servers. This is because server log records for servers "downstream" from a Web server that has an established connection with a visitor's client typically are not linked to a visitor.
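The notion of a session described above can be sketched for the simple case of a single Web server log. The following is a minimal illustration only; the record layout (visitor identifier, timestamp in seconds, URL), the 30-minute idle timeout, and the "/logout" URL are assumptions made for this example and are not specified in the text.

```python
from collections import defaultdict

# Assumed idle-time expiration; the text does not give a value.
IDLE_TIMEOUT_SECONDS = 30 * 60
# Hypothetical URL standing in for an explicit log-out request.
LOGOUT_URL = "/logout"


def sessionize(records, idle_timeout=IDLE_TIMEOUT_SECONDS):
    """Group (visitor_id, timestamp, url) records into per-visitor sessions.

    A new session begins on a visitor's first request, after an explicit
    log-out, or when the gap since the previous request exceeds idle_timeout.
    """
    sessions = defaultdict(list)  # visitor_id -> list of sessions
    for visitor_id, timestamp, url in sorted(records, key=lambda r: (r[0], r[1])):
        visitor_sessions = sessions[visitor_id]
        start_new = True
        if visitor_sessions:
            _, last_time, last_url = visitor_sessions[-1][-1]
            timed_out = timestamp - last_time > idle_timeout
            logged_out = last_url == LOGOUT_URL
            start_new = timed_out or logged_out
        if start_new:
            visitor_sessions.append([])
        visitor_sessions[-1].append((visitor_id, timestamp, url))
    return dict(sessions)


web_log = [
    ("visitor-1", 0, "/home"),
    ("visitor-1", 120, "/catalog"),
    ("visitor-1", 4000, "/home"),   # idle for more than 30 minutes: new session
    ("visitor-2", 60, "/home"),
    ("visitor-2", 90, "/logout"),
    ("visitor-2", 100, "/home"),    # request after log-out: new session
]
result = sessionize(web_log)
```

This grouping is only possible because every record in the Web server log carries a visitor identifier; records from downstream servers generally do not, which is the difficulty discussed here.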
For example, a visitor of the Web site illustrated in FIG. 2 might submit a form to the first Web server 12a with a request that some action be performed on or with the data within the form. The first Web server 12a records the visitor's request in its log 13a and passes the data within the form to the Application server 14. The Application server 14 could invoke a servlet or other program that updates information in a database via the Transaction server 18. Under this scenario, the Application server 14 could record the invoked servlet (or other program) request from the first Web server 12a in its log 15a, and the Transaction server 18 could log the update to the database in its log 15c. In addition, the Advertising server 16 could be invoked by the Application server 14 to send coupons directly to the client 20. Under this scenario, the Advertising server could record in its log file 15b that coupons were sent to the client. Unfortunately, under existing server logging techniques, the log records of the Application server 14, the Transaction server 18 and the Advertising server 16 are typically not linked to the visitor who submitted the form, which is being logged by the Web server 12a.
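The scenario above can be mimicked with a small simulation. This is a rough sketch only: the function names, the log field names, and the servlet name "ProcessForm" are hypothetical, and real server logs contain considerably more detail.

```python
# One in-memory list per server stands in for each log file.
web_log, app_log, txn_log, ad_log = [], [], [], []


def transaction_server(form_data):
    # Logs the database update, with nothing identifying the visitor.
    txn_log.append({"event": "db_update", "detail": form_data})


def advertising_server():
    # Logs that coupons were sent, again without a visitor link.
    ad_log.append({"event": "coupons_sent"})


def application_server(form_data):
    # Logs the servlet invocation requested by the Web server.
    app_log.append({"event": "servlet_invoked", "servlet": "ProcessForm"})
    transaction_server(form_data)
    advertising_server()


def web_server(visitor, form_data):
    # Only the Web server, which holds the client connection, logs the visitor.
    web_log.append({"event": "form_submit", "visitor": visitor})
    application_server(form_data)


web_server("visitor-1", {"item": "book"})
web_server("visitor-2", {"item": "lamp"})

downstream = app_log + txn_log + ad_log
unlinked = [record for record in downstream if "visitor" not in record]
```

After two visitors submit forms, every downstream record ends up in `unlinked`: nothing in the Application, Transaction, or Advertising server logs indicates which visitor caused which record.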
In view of the above discussion, it is an object of the present invention to provide systems, methods and computer program products that may link all server transactions initiated by a Web site visitor to the visitor.
It is another object of the present invention to enhance psychographic analysis by enabling a session to be accurately reconstructed.
It is another object of the present invention to facilitate linking spare records among independent processes (servers).
These and other objects of the present invention are provided by systems, methods, and computer program products for linking a plurality of transactions performed by a plurality of servers at a Web site to a Web site visitor who initiates or is associated with the transactions. According to an embodiment of the present invention, when a Web server receives a client request from a Web site visitor for some action to be performed, the Web server stores a record of the client request in a log file and attaches a unique record identification (RID) to the record. The record stored in the Web server log includes information about the Web site visitor.
If the invocation of an application is required to respond to the client request, the Web server passes the unique RID to an Application server at the Web site along with the client request. The Application server stores a record of the Application server request in a log file associated with the Application server and attaches the RID to the record. The Application server may invoke one or more back-end servers to perform processing to satisfy the client request. The Application server passes the unique RID to a back-end server along with a request for the back-end server to perform an action. The back-end server stores a record of the Application server request in a log file associated with the back-end server and attaches the RID to the record.
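The propagation of the RID from server to server might be sketched as follows. This is an illustrative model under stated assumptions, not the claimed implementation: the class names, the log record fields, and the use of a random UUID as the RID are all choices made for this example.

```python
import uuid


class Server:
    """Base class: each server keeps its own log (a list stands in for a file)."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def record(self, rid, **fields):
        # Attach the RID to the record before storing it in this server's log.
        self.log.append({"rid": rid, "server": self.name, **fields})


class BackEndServer(Server):
    def handle(self, rid, action):
        self.record(rid, action=action)


class ApplicationServer(Server):
    def __init__(self, name, back_ends):
        super().__init__(name)
        self.back_ends = back_ends

    def handle(self, rid, request):
        self.record(rid, request=request)
        # Pass the same RID along with each back-end request.
        for back_end in self.back_ends:
            back_end.handle(rid, action="process " + request)


class WebServer(Server):
    def __init__(self, name, app_server):
        super().__init__(name)
        self.app_server = app_server

    def handle(self, visitor, request, needs_application=True):
        # Assumption: a random UUID serves as the unique RID.
        rid = uuid.uuid4().hex
        # Only the Web server record carries information about the visitor.
        self.record(rid, visitor=visitor, request=request)
        if needs_application:
            self.app_server.handle(rid, request)
        return rid


transaction = BackEndServer("transaction")
advertising = BackEndServer("advertising")
application = ApplicationServer("application", [transaction, advertising])
web = WebServer("web", application)

rid = web.handle(visitor="visitor-1", request="submit-form")
```

In this sketch the downstream servers never learn who the visitor is; they only receive and log the RID, which is sufficient to tie their records back to the Web server record that does identify the visitor.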
To assist in the analysis of visitor activities at a Web site, the information stored within log records of the various servers having the same, unique RID can be combined. This combined information, thus, may represent the entirety of a visitor's activities at a Web site (i.e., a session) from the time of receipt of a client request to explicit log-out or idle time expiration.
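Combining records that share an RID amounts to a join across the server logs. The following is a minimal sketch; the sample records, field names, and helper functions are hypothetical and serve only to illustrate the grouping.

```python
from collections import defaultdict


def combine_by_rid(*logs):
    """Merge records from any number of server logs, grouped by their RID."""
    combined = defaultdict(list)
    for log in logs:
        for record in log:
            combined[record["rid"]].append(record)
    return dict(combined)


def visitor_of(records):
    """Return the visitor named in the Web server record of a group, if any."""
    for record in records:
        if "visitor" in record:
            return record["visitor"]
    return None


# Hypothetical log contents; only the Web server records name a visitor.
web_log = [
    {"rid": "r1", "visitor": "visitor-1", "request": "submit-form"},
    {"rid": "r2", "visitor": "visitor-2", "request": "view-cart"},
]
app_log = [
    {"rid": "r1", "servlet": "ProcessForm"},
    {"rid": "r2", "servlet": "ShowCart"},
]
txn_log = [{"rid": "r1", "action": "db_update"}]
ad_log = [{"rid": "r1", "action": "coupons_sent"}]

combined = combine_by_rid(web_log, app_log, txn_log, ad_log)

# Attribute every combined record to the visitor who initiated it.
by_visitor = defaultdict(list)
for records in combined.values():
    by_visitor[visitor_of(records)].extend(records)
```

Here the database update and the coupon mailing, which were logged by back-end servers with no visitor information, become attributable to visitor-1 through the shared RID.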