Botnets are networks of hijacked computers coordinated by a bot herder in order to carry out attacks such as Distributed Denial of Service (DDoS), sending spam or stealing authentication credentials. Each node of a botnet has previously been infected by malware specifically designed to compromise and control its computing capabilities. Infection is usually achieved by remotely exploiting vulnerabilities in the API of the components (e.g. the network stack or memory allocation routines) that constitute the underlying Operating System (OS). Other infection vectors locally exploit vulnerabilities in user space applications that may open the door to privilege escalation. The infected computers that constitute a botnet periodically poll special nodes known as Command and Control (C&C) nodes in order to download updated malware code or upload stolen information. The bot herder who rules the network of hijacked machines can issue new commands that set the target machine of an attack (specified by IP address or domain name) or simply gather information such as authentication credentials, passwords and keystroke logs.
The present invention solves the problem of determining the likelihood of botnet infection of a given IP address that corresponds to an Internet host. The present invention also solves the problem of benchmarking the accuracy of IP blacklist providers when they are reporting infections of networks of hijacked computers such as botnets.
Several methods can be used for mitigating the impact of botnets, and most of them rely on disrupting communication between the hijacked computers and the C&C nodes. Most of the currently available methods use a technique known as sinkholing to dismantle botnet infrastructures. This technique exploits the fact that hijacked computers that are members of a botnet usually locate the C&C node by issuing DNS requests. An ISP that manages name servers can therefore monitor DNS queries and resolve domain names to fake IP addresses if those names are blacklisted or tagged as malicious. A domain name is considered malicious if the IP addresses to which it resolves belong to hosts that are massively spreading malware or that constitute the C&C infrastructure of a botnet. If a botnet avoids DNS resolution for determining the IP addresses of C&C nodes and relies instead on direct connections (e.g. with hard-coded IP addresses), ISPs can still block malicious requests. This blocking can be implemented in conventional routers or in firewall equipment specifically designed for preventing DDoS attacks.
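The DNS sinkholing behavior described above can be illustrated with a minimal sketch. It is not taken from any of the cited methods: the function names, the in-memory record table, the example domains and the sinkhole address (drawn from the documentation range 192.0.2.0/24) are all assumptions made for illustration only.

```python
from typing import Dict, Optional, Set

# Hypothetical sinkhole address (documentation range, RFC 5737).
SINKHOLE_IP = "192.0.2.1"


def normalize(domain: str) -> str:
    """Lower-case a domain name and drop any trailing root dot."""
    return domain.strip().rstrip(".").lower()


def resolve(domain: str, blacklist: Set[str], records: Dict[str, str]) -> Optional[str]:
    """Answer a DNS query, redirecting blacklisted names to the sinkhole.

    `blacklist` holds normalized domain names tagged as malicious.
    `records` is a stand-in for the real name server data (assumption).
    Returns None when the name is unknown.
    """
    name = normalize(domain)
    if name in blacklist:
        # The bot receives a fake address and never reaches its C&C node.
        return SINKHOLE_IP
    return records.get(name)


# Illustrative data only; these names and addresses are made up.
BLACKLIST = {"cc.botnet.example"}
RECORDS = {
    "www.example.org": "198.51.100.7",
    "cc.botnet.example": "203.0.113.66",
}
```

In this sketch a query for `cc.botnet.example` is answered with the sinkhole address rather than the real C&C address, while other names resolve normally.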
Other proposed mechanisms try to characterize patterns of traffic with the purpose of detecting abnormal behavior and then blacklist the IP addresses of the Internet hosts that are conducting an attack.
Dagon et al. in US2008/0028463A1 propose a flow-based detection system that uses statistical analysis of data such as the DNS request and SYN connection rates of hijacked computers or bots.
Guillum et al. in US2010/0037314A1 describe a method based on exponentially weighted moving averages and graphs for detecting bots that massively sign up new webmail accounts with the purpose of sending spam messages.
Perdisci et al. in US2010/0095374A1 propose a system based on statistical collectors and Internet search engines for detecting botnet-related domain names.
Strayer et al. in “Detecting botnets with tight command and control”, Proceedings of the 31st IEEE Conference on Local Computer Networks, 2006, pp. 195-202, ISBN 1-4244-0418-5 describe a method for detecting botnets that examines flow characteristics such as bandwidth, duration and packet timing.
Noh et al. in “Detecting P2P botnets using a multi-phased flow model”, Proceedings of the 3rd International Conference on Digital Society, 2009, pp. 247-253, ISBN 978-1-4244-3550-6 propose a method based on Markov models for detecting botnet traffic flows.
The next steps in a botnet detection process would be to reduce false positive rates and further refine accuracy. This refinement should gather the blacklists of IP addresses generated by several of the above-mentioned methods and aggregate the information in order to build, in real time, a reputational score for each IP address. The resulting scores may then be used for denying or granting potentially botnet-infected nodes access to web services such as online banks.
This process would be particularly useful for mitigating the losses originating from botnets and malware that specifically target financial institutions (e.g. those that steal customer credentials and then withdraw money from savings or checking accounts without the user's knowledge).
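The aggregation of several blacklists into a per-address reputational score, as outlined above, might be sketched as follows. The text does not specify a scoring formula, so the weighted vote used here, the provider names, the weights and the access threshold are assumptions chosen only to make the idea concrete.

```python
from typing import Dict, Set


def reputation_score(ip: str,
                     blacklists: Dict[str, Set[str]],
                     weights: Dict[str, float]) -> float:
    """Return the weighted fraction of blacklists that report `ip`.

    0.0 means no provider lists the address; 1.0 means all of them do.
    The weighted-vote formula is an illustrative assumption.
    """
    total = sum(weights[provider] for provider in blacklists)
    if total == 0:
        return 0.0
    listed = sum(weights[provider]
                 for provider, addresses in blacklists.items()
                 if ip in addresses)
    return listed / total


def allow_access(score: float, threshold: float = 0.5) -> bool:
    """Grant access only when the score is below the (assumed) threshold."""
    return score < threshold


# Hypothetical blacklists, one per detection method.
BLACKLISTS = {
    "flow_stats": {"203.0.113.5", "203.0.113.9"},
    "dns_domains": {"203.0.113.5"},
    "markov_flows": set(),
}
# Hypothetical trust weights assigned to each provider.
WEIGHTS = {"flow_stats": 1.0, "dns_domains": 2.0, "markov_flows": 1.0}
```

With these sample values, an address reported by both `flow_stats` and `dns_domains` scores 0.75 and would be denied access, whereas an address reported only by `flow_stats` scores 0.25 and would be granted access.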