1. Field of the Invention
The present invention relates generally to networks, and more particularly, to the detection of malicious software agents, such as those that make up botnets.
2. Description of the Related Art
A botnet is a collection of software agents or robots that run autonomously and automatically, without human intervention. In the context of the Internet, the term “botnet” or “botnet network” typically refers to a collection or network of malicious software agents, known as “bots,” that are specifically designed to install themselves silently on a user's computer, without the user's knowledge. Such bots, which tend to be delivered through an ordinary web browser or email program, e.g., via viruses, worms, Trojan horses, backdoors, or other vulnerabilities, infect ordinary users' computers and usually have some malicious purpose, such as sending out spam email messages or performing a denial-of-service (DoS) attack against a particular target server or computer system. Once the bots are installed on users' computers, the originator of the botnet, referred to as the “bot master,” can remotely control the bots to effect nefarious activities. Such control is managed via a server known as the command-and-control (C&C) server, and unique encryption schemes are often used to keep the presence of the bots and their activities secret, as well as to protect against intrusion into the botnet network.
Once a C&C server has been discovered and its Internet Protocol (IP) address has been identified, security measures can be taken to prevent the botnet originator from controlling the corresponding bots, such as by shutting down or blocking access to the C&C server. Recently, however, in an effort to make botnets even more robust, the authors of botnet software have begun creating botnets that are harder to identify, detect, and stop.
One such type of botnet used by bot masters, known as a “fast-flux” botnet, is more flexible and robust against take-down actions. In this scheme, the bots use Domain Name System (DNS) servers, i.e., computers that resolve domain names to the IP addresses of their hosts, to query a certain domain that is mapped onto a set of IP addresses that changes frequently. This makes it more difficult to take down or block a specific C&C server. However, this scheme uses only a single domain, which presents a single point of failure.
An even more robust type of botnet, known as a “domain-flux botnet,” has recently emerged, which overcomes the drawbacks of fast-flux botnets. Domain-flux botnets are botnets that maintain a communication channel between the bots and the C&C server through periodic domain-name registrations and queries. Since the domain name and corresponding IP address of the C&C server in a domain-flux botnet scheme constantly change, it can be relatively challenging to detect and thwart domain-flux botnets.
FIG. 1 graphically illustrates an example of a domain-flux botnet 100, which includes a C&C server 101 and a plurality of bots 102. The bot master uses a domain-generation algorithm (DGA), which creates lists of domain names from a random seed (usually the date in conjunction with some passcode). Using the DGA, the bot master pre-computes a plurality of domain-name lists and then randomly registers one or more domain names from the lists through a domain-name registrar. Each bot 102 in botnet 100 is equipped with the same DGA and periodically re-computes a list of domain names corresponding to the known seed. Not all domain names on the lists that are generated by the DGA will actually be registered by the bot master, who generally uses anonymous means to register the domain names with a domain-name registrar. Accordingly, each bot 102 must proceed through the domain names in the list, either sequentially or in a random order, performing queries on domain-name servers in an attempt to locate domain names in the list that are registered. Thus, if a domain name is blocked (e.g., suspended by the registrar due to reported malicious activities), bot 102 can still find a valid domain name as long as there are other valid domain names in the list. Typically, after many DNS-query failures due to unregistered or blocked domain names, bot 102 eventually reaches a valid domain name that has been registered by the bot master. At that point, the response returned from the DNS query will contain the current IP address of C&C server 101. Bot 102 can then communicate with C&C server 101 to download commands and updates or to upload certain confidential information collected from the infected host computer on which bot 102 resides.
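By way of illustration only, the seed-to-list relationship and the lookup sequence described above can be sketched as a small offline simulation. The sketch does not reproduce the algorithm of any actual botnet: the hash-based construction, the function names generate_domains and find_rendezvous, the passcode, the reserved ".example" suffix, and the documentation IP address are all assumptions chosen for clarity, and the DNS query is modeled by a dictionary lookup rather than by real network traffic.

```python
import hashlib
from datetime import date


def generate_domains(day, passcode, count=5, tld=".example"):
    """Derive a deterministic list of pseudo-random names from a seed.

    Hypothetical construction: the seed is the date joined with a passcode,
    and each name is taken from a SHA-256 digest of the seed plus an index.
    """
    domains = []
    for i in range(count):
        seed = f"{day.isoformat()}|{passcode}|{i}".encode()
        digest = hashlib.sha256(seed).hexdigest()
        # Map the first 12 hex digits onto the letters a-p to form a label.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:12])
        domains.append(label + tld)
    return domains


def find_rendezvous(domains, registered):
    """Walk the list in order and return (domain, ip, failed_lookups).

    `registered` stands in for the DNS: a missing key models a failed
    query for an unregistered or blocked domain name.
    """
    failures = 0
    for name in domains:
        ip = registered.get(name)
        if ip is not None:
            return name, ip, failures
        failures += 1
    return None, None, failures


# Both sides compute the same list because they share the seed.
day = date(2010, 1, 1)
master_list = generate_domains(day, "hypothetical-passcode")
bot_list = generate_domains(day, "hypothetical-passcode")

# Only one name from the list is registered (192.0.2.10 is a documentation address).
registered = {master_list[3]: "192.0.2.10"}
name, ip, failures = find_rendezvous(bot_list, registered)
```

In this simulation, the two lists are identical because they are derived from the same seed, and the sequential walk encounters several failed lookups before reaching the single registered name. That burst of failed queries for unregistered names is the behavior described above.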
Due to the periodic updates of both the IP address and the domain name for C&C server 101, it is difficult for a network administrator to block the botnet or track the location of C&C server 101. Examples of domain-flux botnets are the Conficker-A, Conficker-B, and Torpig botnets, all of which employ DGAs to compute domain-name lists. It is estimated that over 5 million machines are infected with various versions of the Conficker botnets, serving as bots.
The most commonly used approach for detecting domain-flux botnets is to capture domain-flux bots via a “honeypot,” which is a closely monitored computing resource that can perform various functions, including providing early warnings about new vulnerabilities and exploitation techniques, serving as a decoy to distract attackers from more valuable computer systems, and permitting in-depth examination of attackers and malicious software used by attackers. Once the honeypot is infected with the bot software, the DGA can be deciphered through reverse engineering. When the DGA is revealed, the bots can be detected by matching the DNS queries with the pre-computed domain-name lists, and the botnets can even be taken over by registering all the domain names in the list before the bot master has a chance to do so. Such reverse engineering involves a huge amount of manual work and hence cannot keep up with the emergence of new domain-flux botnets, which have now become one of the major threats to the Internet community.
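For purposes of illustration, the list-matching step of this conventional approach can be sketched as follows, assuming that the DGA has already been reverse-engineered and its domain-name list pre-computed. The function name flag_suspected_bots, the (host, queried name) log format, the sample names and addresses, and the min_hits threshold are hypothetical and are not taken from any particular detection system.

```python
from collections import defaultdict


def flag_suspected_bots(dns_log, dga_domains, min_hits=2):
    """Return hosts whose DNS queries match pre-computed DGA domain names.

    dns_log is an iterable of (host, queried_name) pairs. A host is
    reported only if it queried at least min_hits distinct listed names
    (an assumed threshold, used here to reduce accidental matches).
    """
    # Normalize case and any trailing root dot before comparing names.
    known = {d.lower().rstrip(".") for d in dga_domains}
    hits = defaultdict(set)
    for host, qname in dns_log:
        name = qname.lower().rstrip(".")
        if name in known:
            hits[host].add(name)
    return {
        host: sorted(names)
        for host, names in hits.items()
        if len(names) >= min_hits
    }


# Hypothetical pre-computed list and observed query log.
dga_domains = ["qzkxwv.example", "plmnbv.example", "tyrhgf.example"]
dns_log = [
    ("10.0.0.5", "qzkxwv.example."),
    ("10.0.0.5", "PLMNBV.example"),
    ("10.0.0.7", "www.example.org"),
    ("10.0.0.7", "tyrhgf.example"),
    ("10.0.0.5", "www.example.org"),
]
suspects = flag_suspected_bots(dns_log, dga_domains)
# suspects == {"10.0.0.5": ["plmnbv.example", "qzkxwv.example"]}
```

The sketch depends entirely on having the pre-computed list in hand, which is why this approach is limited by the manual reverse-engineering effort noted above.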