1. Field of the Invention
The present invention relates to computers and computer networks. More particularly, the invention relates to detecting malicious activities in the computer network.
2. Background of the Related Art
The term “botnet” refers to a collection of malicious software agents (referred to as robots) that run autonomously and automatically. The term “botnet” can also be used to refer to a collection of compromised computers (referred to as bots), each infected with one or more of such malicious software agents and operating under a common command-and-control infrastructure. For example, the infection may be a result of installation via drive-by downloads exploiting web browser vulnerabilities, worms, Trojan horses, or backdoors. Several botnets have been found and removed from the Internet. The Dutch police found a 1.5 million-node botnet, and the Norwegian ISP (Internet service provider) Telenor disbanded a 10,000-node botnet. Large coordinated international efforts to shut down botnets have also been initiated. It has been estimated that up to one quarter of all personal computers connected to the Internet may be part of a botnet.
A botnet's originator (i.e., operator or controller) can control the bots remotely from a command-and-control (C&C) server, usually through a channel such as IRC (Internet Relay Chat). Though this is rare, more experienced botnet operators program their own command protocols from scratch. For example, such a protocol may include a server program for C&C and a client program that embeds itself on the victim's machine (i.e., bot) and carries out the operations. The two programs usually communicate with each other over a network using a unique encryption scheme for stealth and for protection against detection of, or intrusion into, the botnet.
Recent botnets such as Conficker (e.g., described in P. Porras et al. “An analysis of Conficker's logic and rendezvous points”, SRI International Technical report, March 2009), Kraken, and Torpig (e.g., described in B. Stone-Gross et al. “Analysis of a botnet takeover”, ACM Conference on Computer and Communications Security (CCS), November 2009) have exploited a particular method for botnet operators to control their bots, namely DNS “domain fluxing”. In this method, each bot algorithmically generates a large set of domain names and queries each of them until one of them resolves. The bot then contacts the IP-address obtained, which is typically used to host the C&C server. In particular, the corresponding domain name is registered by the botnet operator for the C&C purpose. Beyond domain fluxing for the purpose of command-and-control of a botnet, spammers also routinely generate random domain names in order to avoid detection. For instance, a spammer typically advertises randomly generated domain names in spam emails (e.g., to promote spammed products on the spammer's website). This is done to evade regular expression based domain blacklists that maintain signatures for recently “spamvertised” (i.e., advertised by spam emails) domain names, as well as domain blacklists that identify domain names hosting malware, spyware, etc. Specifically, a spammer would generate many random URLs, possibly on different Top Level Domains (e.g., .org, .com, .ws, etc.), and then map those URLs to the same set of IP-addresses, where these IP-addresses are used for hosting promotions for the products promoted by the spammer.
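As a purely illustrative sketch of the rendezvous logic described above, and not the algorithm of Conficker, Kraken, or Torpig, the following Python fragment derives a list of candidate names from a shared seed and walks the list until one name resolves. The function names `generate_domains` and `find_rendezvous`, the hash-based derivation of labels, the `.example` suffix, the seed string, and the in-memory `registered` table standing in for DNS are all assumptions made for this example.

```python
import hashlib

def generate_domains(seed, count, length=10):
    """Derive `count` pseudo-random names deterministically from `seed`."""
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}:{i}".encode()).hexdigest()
        # Map each hex digit (0-15) to a letter a-p so the label is alphabetic.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:length])
        domains.append(label + ".example")
    return domains

def find_rendezvous(domains, resolver):
    """Return (domain, ip) for the first name that resolves, else None."""
    for name in domains:
        ip = resolver(name)
        if ip is not None:
            return name, ip
    return None

# Every party holding the same seed computes the same candidate list.
candidates = generate_domains("2009-03-01", count=250)

# Simulated DNS (hypothetical): only one candidate has been registered.
registered = {candidates[137]: "192.0.2.10"}
result = find_rendezvous(candidates, registered.get)
```

Because the list is a pure function of the seed, whoever knows the generation procedure can compute the same candidates in advance, which is why defenders attempt to recover that procedure.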
The botnets that have used random domain name generation vary widely in the random word generation algorithm as well as in the way it is seeded. Generally, a botnet operator only has to register one or a few domains out of the large number of domains that each bot would query every day. By contrast, security vendors would have to pre-register all the domains that a bot queries every day in a blocking effort, and would have to do so before one or a few of these domains are registered by the botnet operator. For instance, Conficker-A bots generate 250 domains every three hours using the current UTC (i.e., coordinated universal time) date and time as the seed, which in turn is obtained by sending empty HTTP GET queries to a few legitimate sites such as google.com, baidu.com, answers.com, etc. This way, all bots would generate the same domain names every day. In order to make it difficult for a security vendor to pre-register the domain names, the next version, Conficker-C, increased the number of randomly generated domain names per bot to 50 thousand, from which each bot would randomly choose 500 domains to query. In all the cases above, the security vendors would have to reverse engineer the bot program (e.g., executable code) to derive the exact algorithm being used for generating domain names (e.g., by Conficker, Kraken, Torpig, etc.) so that the generated domain names can be blocked.
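The registration asymmetry described above can be made concrete with a short calculation. The sketch below assumes, for illustration only, that the operator registers a given number of the 50 thousand Conficker-C candidate names and that each bot picks its 500 names uniformly at random without replacement. The function name `hit_probability` and the example values of registered names are hypothetical.

```python
from math import comb

def hit_probability(pool, queried, registered):
    """Probability that a bot querying `queried` names, chosen uniformly
    without replacement from `pool` candidates, reaches at least one of
    the `registered` names (hypergeometric model, assumed here)."""
    miss = comb(pool - registered, queried) / comb(pool, queried)
    return 1.0 - miss

# One registered name out of 50,000 candidates, 500 queries per bot.
p_one = hit_probability(50_000, 500, 1)

# Registering more names raises the per-bot chance of a successful contact.
p_ten = hit_probability(50_000, 500, 10)
```

Under these assumptions a single registered name is reached by about 1% of bots in each round of queries (500 out of 50,000), while a vendor seeking to close the channel by pre-registration would need to cover all 50 thousand candidates.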
Generally, a reverse engineering effort applied to one domain generation algorithm does not scale to similar future attacks. In addition, DNS “domain fluxing” may be combined with DNS “IP fast-fluxing”, where one domain name is mapped to a changing set of IP-addresses. Such a combination would be even more difficult to detect. In summary, reverse engineering of botnet executables is resource-intensive and time-intensive, such that precious time may be lost before the domain generation algorithm can be cracked to detect the domain name queries generated by bots.
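To show what “IP fast-fluxing” looks like in observed DNS data, the following minimal sketch groups answers by domain name and lists the names whose answers span several distinct IP-addresses. The function names, the sample observations, and the threshold of three addresses are assumptions chosen for illustration; legitimate load-balanced services can also rotate addresses, so this is not presented as a reliable detector.

```python
from collections import defaultdict

def distinct_ips_per_domain(observations):
    """Map each domain to the set of IP addresses seen in its DNS answers."""
    seen = defaultdict(set)
    for domain, ip in observations:
        seen[domain].add(ip)
    return dict(seen)

def flag_fast_flux(observations, threshold=3):
    """Return domains whose answers covered at least `threshold` distinct IPs."""
    per_domain = distinct_ips_per_domain(observations)
    return sorted(d for d, ips in per_domain.items() if len(ips) >= threshold)

# Hypothetical (domain, ip) pairs extracted from DNS responses over time.
observations = [
    ("a.example", "192.0.2.1"),
    ("a.example", "192.0.2.2"),
    ("a.example", "192.0.2.3"),
    ("b.example", "198.51.100.7"),
    ("b.example", "198.51.100.7"),
]
flagged = flag_fast_flux(observations)
```

When domain fluxing is combined with fast-fluxing, both the name and the address change over time, so a per-name count of this kind sees too few answers for any single name to be informative.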