Online services such as web-based email, search, and online social networks are becoming increasingly popular. While these services are used every day by billions of legitimate users, they are also heavily abused by attackers for nefarious activities such as spamming, phishing, and identity theft. To limit the damage of attacks, online service providers often rely on Internet Protocol (“IP”) addresses to perform blacklisting and service throttling. While such IP-based techniques work well for IP addresses that are relatively static and related to a few users, they are not effective for IP addresses that are associated with a large number of users or user requests.
These IP addresses that are associated with a large number of users or user requests are referred to herein as Populated IP (“PIP”) addresses. They are related, but not equivalent, to the traditional concept of proxies, network address translators (“NATs”), gateways, or other middle boxes. On the one hand, not all proxies, NATs, or gateways are PIP addresses. Some may be used very infrequently and thus may not be of interest to online service providers.
On the other hand, while some PIP addresses may belong to proxies or big NATs, others are dial-up IP addresses that have high churn rates, or datacenter IP addresses that connect to an email service to obtain users' email contacts. Additionally, not all PIP addresses are associated with a large number of actual users. Although many good or trusted PIP addresses, such as enterprise-level proxies, are associated with a large number of actual users, some abused or bad PIP addresses may be associated with few real users but a large number of fake users controlled by attackers. In an extreme case, some untrusted or bad PIP addresses are entirely set up by attackers.
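As a purely illustrative sketch of the definition above, the following Python snippet flags an IP address as populated when it is associated with either many distinct users or many requests. The function name `find_populated_ips`, the `(ip, user_id)` log format, and the threshold values are hypothetical assumptions made for this example and are not taken from the description.

```python
from collections import defaultdict


def find_populated_ips(log, min_users=20, min_requests=300):
    """Return the IPs whose distinct-user count or request count meets a threshold.

    `log` is an iterable of (ip, user_id) pairs, one pair per request.
    The thresholds are arbitrary placeholders chosen for illustration only.
    """
    users = defaultdict(set)     # ip -> set of distinct user ids seen
    requests = defaultdict(int)  # ip -> total number of requests seen
    for ip, user_id in log:
        users[ip].add(user_id)
        requests[ip] += 1
    return {
        ip
        for ip in requests
        if len(users[ip]) >= min_users or requests[ip] >= min_requests
    }


# Hypothetical request log: a shared proxy, a single home user, and one
# account issuing a very large number of requests from a single address.
example_log = (
    [("10.0.0.1", f"user{i}") for i in range(25)]
    + [("10.0.0.2", "user0")] * 5
    + [("10.0.0.3", "bot")] * 400
)
populated = find_populated_ips(example_log)
```

In this example, `10.0.0.3` is flagged even though only one account uses it, which mirrors the point that a PIP address need not correspond to many actual users. Counting alone does not indicate whether a flagged address is good or bad.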
Therefore, identifying PIP addresses and classifying PIP addresses as good or bad may be a useful tool for ensuring the security of online service providers. However, identifying and classifying PIP addresses is a challenging task for several reasons. First, ISPs and other network operators consider the size and distribution of customer populations as confidential and rarely publish their network usage information. Second, some PIP addresses are dynamic, e.g., those at small coffee shops with frequently changing user populations. Third, good PIP addresses and bad PIP addresses can reside next to each other in the IP address space. Fourth, a good PIP address can be temporarily abused.