It is common practice for companies that provide access to the Internet, and for companies that provide content and services on the Internet, to generate logs of access and activity. Logs are used, for example, for debugging and troubleshooting, detection and monitoring of abuse, statistical analysis, demographic analysis, report generation and other general business purposes.
FIG. 1 illustrates a typical environment in which a Web Server on the Internet logs activity. User 110 represents a user operating a browser and connected to the Internet 120. Web Server 130 is a web server connected to the Internet 120 and storing web pages for public viewing. When User 110, through the browser running on their computer, requests a web page stored on Web Server 130, an HTML document is delivered to the browser and displayed to User 110. In addition, a record is made of this activity in Access Log 140. A web server activity log will typically contain information regarding the access, but not the actual content of the access itself. For example, a web server log generally records the originating IP address, the name of the document that was requested and the number of bytes that were transferred to the client machine. Commonly, one record is written to the access log file for each access.
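By way of illustration only, the following Python sketch shows how the fields mentioned above (originating IP address, requested document and bytes transferred) might be extracted from one record of an access log. It assumes the record follows the Common Log Format described in the Apache "Log Files" document; the names `CLF_PATTERN` and `parse_clf_line` are hypothetical and are not taken from any server software.

```python
import re

# Assumed layout (Common Log Format):
#   host ident authuser [date] "request line" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_clf_line(line):
    """Return a dict of fields for one log record, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The request line has the form: METHOD PATH PROTOCOL
    parts = fields["request"].split()
    fields["path"] = parts[1] if len(parts) >= 2 else None
    # A size of "-" means that no content was returned to the client.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


SAMPLE_LINE = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
record = parse_clf_line(SAMPLE_LINE)
print(record["host"], record["path"], record["size"])
```

Note that the parsed record carries the originating IP address and metadata about the request, but nothing that identifies a machine beyond that address.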
The Apache Software Foundation is an organization that supports an open-source web server through the Apache HTTP Server Project. Documentation and software for the Apache HTTP Server Project are located at http://httpd.apache.org. The Apache web site indicates that Apache has been the most popular web server on the Internet since April 1996, and that as of 2005 it served more than 70% of the web sites on the Internet. The document entitled “Log Files” available on the Apache web site at http://httpd.apache.org/docs/2.2/logs.html, incorporated herein by reference, describes several log file formats. Log file formats in use today, such as those described in the document referenced above, record the originating IP address of each machine that requests a document.
In cases such as FIG. 1 in which User 110 is directly connected to the Internet 120, the originating IP address is sufficient to identify the machine at which the request originated. However, this is not the case in other scenarios. FIG. 2 illustrates a more common situation in which User Computer 210 is located on Local Network 220 behind NAT Gateway 230. Typically the IP addresses in use on Local Network 220 are unregistered or un-routable addresses that can be used within an enterprise but cannot be used on the public Internet. Un-routable addresses are addresses that have been set aside in the ranges 10.0.0.0 to 10.255.255.255, 172.16.0.0 to 172.31.255.255 and 192.168.0.0 to 192.168.255.255. IP addresses in these ranges may be freely used within a private network as they are guaranteed to be unused and unusable on the public Internet. NAT Gateways are used to convert packets coming from un-routable IP addresses into packets with addresses valid on the public Internet. This scheme allows many machines to be used on an internal network without consuming a corresponding number of public IP addresses, which are global resources.
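As a minimal sketch, the three un-routable ranges listed above can be expressed as network prefixes and tested using the `ipaddress` module of the Python standard library. The names `PRIVATE_NETWORKS` and `is_unroutable` are hypothetical and are used here for illustration only.

```python
import ipaddress

# The three ranges named in the text, written as CIDR prefixes:
#   10.0.0.0    - 10.255.255.255  -> 10.0.0.0/8
#   172.16.0.0  - 172.31.255.255  -> 172.16.0.0/12
#   192.168.0.0 - 192.168.255.255 -> 192.168.0.0/16
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_unroutable(address):
    """Return True if the address falls inside one of the three private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_NETWORKS)


for candidate in ("10.1.2.3", "172.31.255.255", "172.32.0.0", "192.168.1.10", "8.8.8.8"):
    print(candidate, is_unroutable(candidate))
```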
In particular, NAT Gateway 230 performs a function known as Network Address Translation (NAT), which translates internal network addresses into external network addresses. Thus, a packet originating from User Computer 210 is translated by NAT Gateway 230 into another packet with a different source IP address and transmitted to Web Server 260 across the Internet 250. A return packet from Web Server 260 to User Computer 210 will be transmitted to NAT Gateway 230, which will translate the packet into a different packet with the destination IP address for User Computer 210. The operation of NAT Gateways on the Internet is well known and in wide use today.
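The translation described above may be sketched as follows. This is a simplified illustration, not a description of any particular gateway: it assumes port-based translation, in which the gateway also rewrites the source port so that replies can be matched to the internal machine, a detail the text above does not specify. The `Packet` and `NatGateway` classes, the starting port and all addresses shown are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class NatGateway:
    """Toy gateway that maps each internal (ip, port) pair to a public port."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}  # (internal ip, internal port) -> public port
        self.inbound = {}   # public port -> (internal ip, internal port)

    def translate_outbound(self, packet):
        key = (packet.src_ip, packet.src_port)
        if key not in self.outbound:
            # First packet from this internal endpoint: allocate a public port.
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        # Replace the internal source address with the gateway's public address.
        return Packet(self.public_ip, self.outbound[key], packet.dst_ip, packet.dst_port)

    def translate_inbound(self, packet):
        internal = self.inbound.get(packet.dst_port)
        if internal is None:
            return None  # no mapping exists, so the packet cannot be delivered
        # Restore the internal destination address for the return packet.
        return Packet(packet.src_ip, packet.src_port, internal[0], internal[1])


gateway = NatGateway("203.0.113.5")
request_a = gateway.translate_outbound(Packet("192.168.0.10", 5000, "198.51.100.7", 80))
request_b = gateway.translate_outbound(Packet("192.168.0.11", 5000, "198.51.100.7", 80))
reply_a = gateway.translate_inbound(
    Packet("198.51.100.7", 80, "203.0.113.5", request_a.src_port)
)
print(request_a.src_ip, request_b.src_ip, reply_a.dst_ip)
```

In this sketch, both translated requests carry the same source IP address, so a web server that records only that address cannot tell the two internal machines apart.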
Frequently, internal networks allocate IP addresses using a protocol known as DHCP. This requires the use of a DHCP Server 240 attached to Local Network 220. Briefly, the DHCP protocol involves the allocation of IP addresses upon request by machines on the local network. For example, when User Computer 210 powers up, it will request an IP address and DHCP Server 240 will allocate one. This allocation is known as a “lease” and generally has an expiration time associated with it. The DHCP protocol generally requires periodic communication between User Computer 210 and DHCP Server 240 in order for User Computer 210 to continue to be allowed to use the IP address that it has been granted.
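The lease behavior described above can be illustrated with a small in-memory model. This sketch does not implement the DHCP wire protocol; it only models allocation, expiration and renewal. The `LeaseTable` class, its method names, the client identifiers and the explicit `now` timestamps (in seconds) are assumptions made for the example.

```python
class LeaseTable:
    """Toy model of address leases: allocate on request, expire, renew."""

    def __init__(self, pool, lease_seconds):
        self.free = list(pool)        # addresses not currently leased
        self.lease_seconds = lease_seconds
        self.leases = {}              # client id -> (ip, expiry time)

    def expire(self, now):
        # Return addresses whose lease has run out to the free pool.
        for client, (ip, expiry) in list(self.leases.items()):
            if expiry <= now:
                del self.leases[client]
                self.free.append(ip)

    def request(self, client, now):
        """Allocate or renew a lease; return the address, or None if none is free."""
        self.expire(now)
        if client in self.leases:
            ip, _ = self.leases[client]   # renewal keeps the same address
        elif self.free:
            ip = self.free.pop(0)         # new allocation from the pool
        else:
            return None
        self.leases[client] = (ip, now + self.lease_seconds)
        return ip


table = LeaseTable(["192.168.0.10"], lease_seconds=3600)
first = table.request("computer-210", now=0)       # initial lease
renewed = table.request("computer-210", now=1800)  # renewal before expiry
blocked = table.request("other-machine", now=4000) # pool is exhausted
reused = table.request("other-machine", now=6000)  # first lease has expired
print(first, renewed, blocked, reused)
```

Because an address returns to the pool when its lease expires, the same internal IP address may be held by different machines at different times.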
Many machines may exist on Local Network 220, and there may be multiple NAT Gateways within a large enterprise. This means that a request for a document on the Internet originating from a browser on a user's machine may be translated multiple times before it reaches the web server that is hosting the document. Thus, Access Log 270 that is recorded by Web Server 260 is insufficient to identify the specific machine that actually made the request.
There may be many situations in which it is desirable to identify a specific machine associated with activity on the Internet. These include debugging, detection of abuse, network integrity monitoring, billing, and compliance with applicable laws. What is needed is an improved method for activity monitoring in which the specific endpoint associated with activity can be determined.