Aspects of the present invention relate to the World Wide Web. Other aspects of the present invention relate to monitoring web site visitors.
With the rapid advancement of the Internet, more and more companies develop web sites to advertise and sell their products. As demand for web sites and for their maintenance increases, various services have emerged and continue to emerge to meet it. For example, online services (OS) provide web hosting services to companies that rely on third parties to develop and maintain their web sites. As part of such services, an OS often offers web site analysis and develops detailed traffic statistics on a customer's web site. For instance, visitors may be recorded and their browsing patterns may be analyzed. Reports about the characteristics of the visitors to a web site, as well as their behaviors, can be generated as part of the OS service product. Such reports may later be used to understand the effectiveness of a web site, to identify potential customers of different products, and to gather information that is useful for generating personalized profiles for individual customers.
Cookies have been used to differentiate visitors to a web site. Since cookies tie a user to an individual login, they serve as an accurate method to keep track of visitors. However, cookies may not be enabled at certain web sites, or the browser at a client site may not permit their use. In this case, the Internet Protocol (IP) address of a client is often used to identify a visitor. This method works well only when the client's IP address is sent along with the HTTP request to the web server. However, many visitors, if not most nowadays, access the Internet from behind a proxy server, which allows multiple users behind a firewall to share gateways to the Internet. When a client browses a web site through a proxy server, the IP address used to communicate with the web server that hosts the web site is the IP address of the proxy server. In this case, the client's IP address is hidden behind the proxy server. Therefore, a hit to the web site that is recorded based on the IP address does not correspond to the ultimate user, but only to the proxy server.
FIG. 1 depicts a mechanism in which a web server records hits based on the Internet Protocol addresses of the proxy servers through which clients send browsing requests. A client site 110 includes at least one client (client 1 110a, client 2 110b, . . . , client n 110c) and connects to one or more proxy servers (120a, . . . , 120b) in a proxy server group 120. The client site 110 communicates with a web server 150 through a network 130 to browse a web site hosted at the web server 150. Each of the proxy servers in the proxy server group 120 has a distinct IP address that is reachable on the Internet. The web server 150 comprises web pages 150a, an IP address identification mechanism 150b, and visitor statistics storage 150c.
When a client (e.g., client 1 110a) sends a browsing request 125 (e.g., a URL address for a web page) to the web server 150, a proxy server (e.g., proxy server 120a) forwards the browsing request 125 using its public IP address (i.e., IP address 1) as the return address. When the web server 150 receives the browsing request 125, it retrieves the requested web page and returns it to the given return address, i.e., IP address 1 of the proxy server 1. At the same time, the IP address identification mechanism 150b records a hit from the IP address 1 and stores the information relevant to the hit in the visitor statistics storage 150c. When the proxy server 1 receives the requested web page, it forwards the page to the client 1. During the process of browsing the requested web page, the IP address of the client 1 is never exposed to the web server 150, so the client 1 is never recorded. In addition, when another client (e.g., client 2 110b) visits the same web site through the same proxy server 1, the visit will be recorded as coming from the same source (the IP address of the proxy server 1). The identities of individual clients are not recovered and recorded in this process.
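By way of illustration only, the recording behavior described above may be sketched as follows. The addresses, the function names `forward_via_proxy` and `record_hits`, and the use of a simple counter in place of the visitor statistics storage 150c are all assumptions made for this sketch; they are not taken from any particular web server implementation.

```python
from collections import Counter

# Hypothetical addresses, chosen from documentation/private ranges for illustration.
PROXY_1_IP = "203.0.113.1"                      # public "IP address 1" of proxy server 1
CLIENT_IPS = {"client 1": "10.0.0.1",           # private addresses behind the proxy
              "client 2": "10.0.0.2"}


def forward_via_proxy(client_ip: str, proxy_ip: str) -> str:
    """Return the source address the web server sees for a forwarded request.

    The proxy substitutes its own public address as the return address,
    so the client's address never reaches the web server.
    """
    return proxy_ip


def record_hits(source_ips):
    """Stand-in for the IP address identification mechanism: count hits per source IP."""
    return Counter(source_ips)


# Both clients request the same web site through proxy server 1.
seen_by_server = [forward_via_proxy(ip, PROXY_1_IP) for ip in CLIENT_IPS.values()]
hits = record_hits(seen_by_server)
distinct_visitors = len(hits)
```

Under these assumptions, two distinct clients yield two hits attributed to a single source address, so a count of distinct visitors based on IP address alone would report one visitor rather than two.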
The scheme shown in FIG. 1 may also lead to a different problem. When there are multiple proxy servers available in the proxy server group 120, a requested web page may be delivered through different proxy servers. For example, to balance the load on proxy servers, the proxy server group 120 may direct subsequent requests from the same client to the web server 150 via different proxy servers represented by different IP addresses (e.g., via IP address 1 representing the proxy server 1 120a and via IP address k representing the proxy server k 120b). In this case, the web server 150 may record the subsequent hits from the same client as coming from different sources. In both of the above-described scenarios, the web site hits from visitors are not correctly recorded, which may lead to inaccurate statistics and even to an incorrect characterization of the usage of the underlying web site.
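The load-balancing scenario may likewise be sketched, purely as an example. A round-robin policy is assumed here only for concreteness; the proxy server group 120 may distribute requests in other ways. The addresses and the function name `sources_seen_by_server` are hypothetical.

```python
from itertools import cycle

# Hypothetical public addresses of proxy server 1 and proxy server k.
PROXY_IPS = ["203.0.113.1", "203.0.113.2"]


def sources_seen_by_server(num_requests: int, proxy_ips):
    """Return the source addresses recorded for one client's successive requests.

    Assumes the proxy group rotates through its proxies in round-robin order,
    so consecutive requests from the same client leave via different proxies.
    """
    rotation = cycle(proxy_ips)
    return [next(rotation) for _ in range(num_requests)]


# A single client issues four successive requests to the same web site.
recorded = sources_seen_by_server(4, PROXY_IPS)
apparent_visitors = len(set(recorded))
```

Under these assumptions, a single client appears to the web server as two different sources, which is the inverse of the first problem: one visitor is counted as several.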