In a stateless hypertext server environment, such as a World Wide Web server on the Internet, hypertext objects are transferred between the server and clients over the network using the Hypertext Transfer Protocol (HTTP). A client makes a request to a server for hypertext objects, usually through a browser, which is a software tool running on the client's system; the server retrieves the requested objects and sends them through the network to the client. These hypertext objects are then displayed by the client's browser. HTTP is an example of a stateless protocol. This means that every request from a client to a server is treated independently. After the server responds to the client's request, the connection between the client and the server is dropped. There is no record of prior activities from a given client address. The server treats every request as if it were brand-new, i.e., without context. Two advantages of using stateless protocols are efficiency and simplicity.
Due to security concerns, a firewall, also called a proxy server, is typically employed between clients and the network that connects to a hypertext server where requested objects are stored. Client users access the hypertext objects of the hypertext server in the network through the proxy server. In so doing, the real client address is replaced with the proxy server's address before the requests for objects are sent to the hypertext server. As a result, the real client identities are generally not available to the hypertext server.
With client identities usually masked by the proxy server, a client usually accesses the hypertext objects of a server anonymously. However, such anonymity inhibits the analysis of aggregate user behavior, since the hypertext server cannot distinguish requests from different clients who access hypertext objects via the same proxy server. The hypertext server also cannot determine which group of objects is accessed together in a user session by an individual client. Hence, it becomes difficult to collect user-oriented hypertext object statistics. Understanding user-oriented object usage would provide many benefits, such as more effective marketing and better presentation of hypertext objects.
Current object usage statistics are typically limited to raw access counts. Simple raw access counts may substantially overstate the actual number of clients accessing a hypertext object, as the same user may repeatedly access the same object by going back and forth through a hyperlink. Counting repeated accesses by a single user to an object can lead to inaccurate conclusions in some cases.
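The overstatement described above can be illustrated with a minimal sketch. The access log, the user identifiers, and the function names `raw_counts` and `distinct_user_counts` are all hypothetical and introduced only for illustration; in particular, the sketch assumes the user behind each access is known, which, as noted above, is generally not the case at the hypertext server.

```python
from collections import Counter

# Hypothetical access log: (user_id, object_path) pairs in arrival order.
# The user identifiers are an assumption made for illustration only.
accesses = [
    ("u1", "/index.html"),
    ("u1", "/products.html"),
    ("u1", "/index.html"),     # u1 goes back through a hyperlink
    ("u1", "/products.html"),  # and forward again
    ("u2", "/index.html"),
]


def raw_counts(log):
    """Count every access to each object, including repeats by one user."""
    return Counter(obj for _, obj in log)


def distinct_user_counts(log):
    """Count each (user, object) pair once, however often it recurs."""
    return Counter(obj for _, obj in set(log))


print(raw_counts(accesses))
print(distinct_user_counts(accesses))
```

In this hypothetical log the raw count for `/index.html` is 3 although only two users accessed it, and the raw count for `/products.html` is 2 although only one user did.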
A simple approach to grouping user accesses into user sessions is based on time stamps. For example, a user session could include all accesses within a predetermined interval. Unfortunately, this approach cannot distinguish requests from two different clients that arrive from the same proxy server within the specified time interval. Also, a single user session exceeding the predetermined interval will incorrectly be counted as two sessions.
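The time-stamp approach and its two shortcomings can be sketched as follows. This is one possible reading of the approach, assuming that a session begins at the first access from an address and includes every later access from that address within the predetermined interval; the function name `sessions_by_interval`, the record layout, the address label, and the 1800-second interval are hypothetical.

```python
from collections import defaultdict


def sessions_by_interval(records, interval):
    """Group (timestamp, source_address) records into sessions.

    A session starts at the first access seen from an address and includes
    every later access from that address within `interval` seconds of that
    start. The server sees only the source address, so all clients behind
    one proxy share a single address.
    """
    by_address = defaultdict(list)
    for timestamp, address in sorted(records):
        sessions = by_address[address]
        if sessions and timestamp - sessions[-1][0] <= interval:
            sessions[-1].append(timestamp)  # still inside the current window
        else:
            sessions.append([timestamp])    # window elapsed: start a new session
    return dict(by_address)


# Hypothetical scenario: users A and B sit behind the same proxy. A accesses
# objects at t=0, t=20, and t=2000 during one long visit; B accesses at t=10.
records = [(0, "proxy"), (10, "proxy"), (20, "proxy"), (2000, "proxy")]

result = sessions_by_interval(records, interval=1800)
print(result)  # {'proxy': [[0, 10, 20], [2000]]}
```

Under these assumptions, B's access at t=10 is merged into A's session because both arrive from the same address, while A's access at t=2000 is split into a second session because it falls outside the interval.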
Thus, there is a need for an improved method and system for analyzing user-oriented hypertext object usage. The present invention addresses such a need.
In order to improve performance, client and/or proxy caching is usually employed. With caching, hypertext objects are fetched locally instead of from the hypertext server. Thus, no requests are made to the server for the cached objects. There is also a need for a method and system for analyzing user-oriented hypertext object usage which accounts for client and/or proxy caching. The present invention addresses such a need.