§1.1 Field of the Invention
The present invention concerns determining the number of unique users behind a single Internet protocol (IP) address (or a given set of IP addresses) for a specific period of time. In particular, the present invention concerns determining the number of unique users behind a single IP address (or a given set of IP addresses) using various techniques and building an IP address-user number database for storing and retrieving such data.
§1.2 Related Art
In general, it is impossible to track the exact number of users behind an IP address. The following explains some of the common scenarios to be considered when attempting to track the number of users behind an IP address and the reasons that it is so challenging.
As a first example, several Web page impressions to one IP address might not have been caused by the same person. This is because some people share computers. In a family of five, some family members may visit the same Websites regularly, all using the same computer and Web browser. However, because that particular computer doesn't use a unique IP address for every family member, there is no way to distinguish whether these visits were made by one person or more. Sometimes the computer in question could be at a Cyber-Cafe or in a public library, where dozens of people might use a particular computer on a given day.
A single user might also have multiple computers, each with its own IP address or using a shared IP address, depending on the network configuration. This makes it hard to establish conclusively whether requests from multiple IP addresses are from a single user or from multiple users.
An additional complication is that of shared IP addresses. Because the Internet is growing so rapidly, the finite number of available IP addresses has become a constraint. This has led to dynamic IP address allocation (e.g., shared IP addresses that can be provisioned as needed by an Internet Service Provider (“ISP”)), as described in more detail below. Many Internet users do not have a particular IP address assigned to their computer. Usually, when they log on to the Internet, the computer that they are using is assigned an IP address from a pool of available IP addresses by their ISP. When they log off, the assigned IP address becomes available and may then be used by someone else. This means that an ISP, which only has a pool of a certain number of IP addresses, can service more subscribers than the number of IP addresses in its pool.
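By way of illustration only, the following minimal Python sketch models the dynamic allocation just described and shows how one pooled IP address can come to stand for more than one subscriber over time. The class name AddressPool, its methods, the subscriber names, and the example addresses are all hypothetical and are not taken from any actual ISP system.

```python
class AddressPool:
    """Hypothetical model of an ISP handing out addresses from a fixed pool."""

    def __init__(self, addresses):
        self.free = list(addresses)   # addresses currently unassigned
        self.leases = {}              # subscriber -> address currently held
        self.history = {}             # address -> every subscriber that has held it

    def log_on(self, subscriber):
        if not self.free:
            raise RuntimeError("no address available in the pool")
        address = self.free.pop(0)
        self.leases[subscriber] = address
        self.history.setdefault(address, set()).add(subscriber)
        return address

    def log_off(self, subscriber):
        # The released address returns to the pool for reuse.
        self.free.append(self.leases.pop(subscriber))


# Three subscribers are served by a pool of only two addresses.
pool = AddressPool(["192.0.2.1", "192.0.2.2"])
pool.log_on("alice")
pool.log_on("bob")
pool.log_off("alice")
pool.log_on("carol")   # reuses the address that alice released

users_per_address = {addr: len(subs) for addr, subs in pool.history.items()}
```

In this sketch, a Web server that saw only IP addresses would have no direct way to tell that the first address was used by two different subscribers.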
Firewalls raise additional considerations when trying to estimate the number of users behind a given IP address. Often corporations, and even smaller businesses, individuals and families, will want to limit access to and from the Internet. Firewalls are useful (a) when trying to make the home or office network more secure, or (b) when there are a limited number of IP addresses at the home or the office. When a firewall is used, all of the computers behind the firewall are separated from the Internet by one computer (usually called a proxy server). All communications into or out of the network pass through the proxy server. If users behind a firewall access Web pages, they will all show up under the same IP address (i.e., the proxy server's IP address). Different machines behind a firewall can sometimes be distinguished by examining operating system (OS) and browser information, monitor setting information, etc. For example, if two different operating systems show up under the same IP address at about the same time, there are probably at least two different machines behind the firewall.
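The heuristic in the preceding paragraph can be sketched as follows, purely as an illustration. The function name min_machines_per_ip, the tuple layout of each request, and the five-minute window are assumptions made for this example. The result is only a lower bound, since two machines with identical operating system and browser information are counted as one.

```python
from collections import defaultdict


def min_machines_per_ip(requests, window_seconds=300):
    """Return {(ip, window_index): lower bound on distinct machines}.

    Each request is an assumed (ip, timestamp_seconds, os_name, browser) tuple.
    Requests are bucketed into fixed time windows, and distinct (OS, browser)
    pairs seen under one IP address in one window are counted.
    """
    seen = defaultdict(set)
    for ip, timestamp, os_name, browser in requests:
        window = int(timestamp // window_seconds)
        seen[(ip, window)].add((os_name, browser))
    return {key: len(fingerprints) for key, fingerprints in seen.items()}


requests = [
    ("203.0.113.7", 10, "Windows", "Firefox"),
    ("203.0.113.7", 45, "Linux", "Firefox"),
    ("203.0.113.7", 70, "Windows", "Firefox"),
    ("198.51.100.2", 20, "macOS", "Safari"),
]
estimates = min_machines_per_ip(requests)
```

Here the first address shows two different operating systems within the same window, so at least two machines are probably behind it, while the second address shows only one.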
A Web cache is a store of the HTML pages, CSS files, images, etc., for Web pages that a user has visited. Subsequent visits to the same Webpage (within a certain time) will result in files being read from the Web cache rather than being downloaded again from the remote Web server. This makes the Web pages quicker to load and reduces the consumption of network and/or server resources since the Web browser will not request Web pages from a Web server every time a user revisits the Website. Consequently, these accesses often won't appear in the server logs or statistics. Caching can occur at the Web browser level or at the ISP level. Users can control caching at the Web browser level (by changing the cache settings) but usually not at the ISP level. In addition, some ISPs (e.g., America Online (AOL)) use multiple proxy servers. When people using AOL as an ISP make a request for a Webpage, the requests for the HTML, CSS, images, etc. can come from any of the proxy servers. Unfortunately, proxy servers prevent a Website from directly learning the number of different users accessing it, due to both (1) obscuring or masking the source of the request, and (2) Web caching.
The number of users behind an IP address may be tracked using cookies. However, this is not always feasible. For example, a user might have disabled cookies in their browser.
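As a rough illustration of cookie-based counting and of its limitation, the following Python sketch counts distinct cookie identifiers per IP address and separately tallies requests that carried no cookie. The function name cookie_user_counts and the (ip, cookie_id) request layout are hypothetical, and a cookie_id of None is assumed to mean that cookies were disabled or absent.

```python
from collections import defaultdict


def cookie_user_counts(requests):
    """Return {ip: (distinct_cookie_ids, requests_without_cookie)}.

    Each request is an assumed (ip, cookie_id) pair. Requests without a cookie
    cannot be attributed to any particular user, so they are only tallied.
    """
    cookie_ids = defaultdict(set)
    uncookied = defaultdict(int)
    for ip, cookie_id in requests:
        if cookie_id is None:
            uncookied[ip] += 1
        else:
            cookie_ids[ip].add(cookie_id)
    all_ips = set(cookie_ids) | set(uncookied)
    return {
        ip: (len(cookie_ids.get(ip, ())), uncookied.get(ip, 0))
        for ip in all_ips
    }


requests = [
    ("203.0.113.7", "a1"),
    ("203.0.113.7", "b2"),
    ("203.0.113.7", "a1"),
    ("203.0.113.7", None),
    ("198.51.100.2", None),
]
counts = cookie_user_counts(requests)
```

For the second address in this sample, every request arrives without a cookie, so the cookie count alone says nothing about how many users are behind that address.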
Despite all of the challenges in obtaining a reliable estimate of the number of unique users behind an IP address, it is nonetheless important to make such estimates. For example, such estimates are useful for determining whether or not visits to a Web page from one or more IP addresses are from a single user or from multiple users. As another example, such estimates are useful for determining whether or not on-line advertisement selections from one or more IP addresses are from a single user or from multiple users. As yet another example, such estimates are useful for determining whether or not requests for server resources from one or more IP addresses are from a single user or from multiple users. The number of unique users might be a useful indicator of (a) how widely popular a Website, an advertisement, etc. is, (b) the “stickiness” of a Website, (c) click fraud on an advertisement, etc.
In view of the foregoing, it would be useful to improve IP address-user number estimates.