The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Botnets are the root cause of many malicious activities in telecommunications networks, including denial of service attacks, click fraud, adware, distributed brute-forcing of a remote service, identity and data theft, spam, and many more. A botnet comprises a number of machines, called bots, on which malicious software has been installed, typically without the knowledge of users who are innocent or unaffiliated with the hacker. A botmaster is the attacker, and the botmaster remotely controls the bots using command and control (C&C) communication channels. When malware compromises a machine, the machine attempts to establish a connection to one or more C&C servers in order to download updates, retrieve commands, or transmit private information gained from the machine.
The most popular botnet structure is the centralized structure. In the centralized structure, the bots contact a particular pre-defined domain or internet protocol (IP) address on which the C&C server is located. The single point of failure in the centralized structure is the C&C server. Therefore, once the C&C server is taken down, the botmaster loses the entire botnet. One of the approaches to disable the C&C server has been to blacklist well-known C&C domain names to block communication with C&C servers associated with those C&C domain names.
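The blacklisting approach described above can be illustrated with a minimal sketch. The function name `is_blacklisted` and the domain `bad-c2.example` are hypothetical and chosen only for illustration; the sketch assumes a static set of known C&C domain names and treats any subdomain of a listed name as blocked.

```python
def is_blacklisted(hostname: str, blacklist: set[str]) -> bool:
    """Return True if the hostname or any of its parent domains is listed.

    Assumption: the blacklist is a static set of lowercase domain names.
    """
    # Normalize case and drop a trailing root dot, then split into labels.
    labels = hostname.lower().rstrip(".").split(".")
    # Check the full name and every parent suffix, e.g. a.b.c, b.c, c.
    return any(".".join(labels[i:]) in blacklist for i in range(len(labels)))


if __name__ == "__main__":
    known_cc_domains = {"bad-c2.example"}  # hypothetical static list
    for name in ("www.bad-c2.example", "unrelated.example"):
        print(name, is_blacklisted(name, known_cc_domains))
```

Because the lookup depends entirely on the list contents, a domain name that was never observed before passes the check unchanged.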
However, modern malware has evolved and uses various techniques to hide its C&C server, including the use of a domain generating algorithm (DGA). The DGA may be a simple algorithm that uses a seed, such as a current date and/or time, to generate alphanumeric domain names. Alternatively, the DGA may be a complex algorithm that is sophisticated enough to generate English-language-like domain names with properly matched syllables or combinations of English dictionary words. A bot with DGA-based malware periodically attempts to communicate with the botmaster, and each attempt to communicate with the botmaster involves generating a plurality of domain names using a DGA and attempting to resolve each of the domain names until a domain name successfully resolves to the IP address of the C&C server for that malware. Prior to the bot communicating with the botmaster, the botmaster, using its own copy of the DGA with the same seed as the DGA on the bot, generates a domain name and registers the domain name as the domain name for the C&C server, thus rendering techniques involving blacklisting of domain names ineffective.
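For illustration only, a simple date-seeded DGA of the kind described above might resemble the following sketch. It does not reproduce any particular malware family: the hashing scheme, the function names `generate_domains` and `first_resolving`, the reserved `.example` suffix, and the documentation address 192.0.2.10 are all assumptions made for this example. The resolver is passed in as a callable so that the sketch performs no real DNS queries.

```python
import hashlib
from datetime import date


def generate_domains(seed_date: date, count: int = 5, tld: str = ".example") -> list[str]:
    """Derive `count` pseudo-random domain names from a date seed."""
    domains = []
    for index in range(count):
        # The date and a counter form the seed material (an assumed scheme).
        material = f"{seed_date.isoformat()}-{index}".encode("ascii")
        digest = hashlib.sha256(material).hexdigest()
        # Map the first 12 hex digits onto the letters 'a' through 'p'.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:12])
        domains.append(label + tld)
    return domains


def first_resolving(domains, resolve):
    """Try each name in order; return (name, address) for the first hit, else None."""
    for name in domains:
        address = resolve(name)
        if address is not None:
            return name, address
    return None


if __name__ == "__main__":
    seed = date(2024, 1, 1)
    bot_candidates = generate_domains(seed)
    # The botmaster runs the same algorithm with the same seed and
    # registers one of the resulting names (simulated here with a dict).
    master_candidates = generate_domains(seed)
    registered = {master_candidates[3]: "192.0.2.10"}
    print(first_resolving(bot_candidates, registered.get))
```

Because both parties derive identical lists from the shared seed, the rendezvous domain changes with every seed value without any communication between the bot and the botmaster.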
To reduce the detectability of a C&C server, the botmaster tries to minimize the amount of time during which its C&C servers are exposed. The botmaster minimizes the exposure time by registering the domain names and making domain name system (DNS) server configurations only a few minutes prior to the time at which the DGA is configured to communicate with the C&C server. Once the time frame in which the DGA is configured to communicate with the botmaster passes, the C&C servers are shut down and removed immediately. Such minimization of exposure time renders ineffective any detection mechanisms that rely on a static domain name list. Additionally, the DNS records associated with the IP address of the C&C server are deleted; therefore, tracing a DNS record to an IP address is also not feasible. Moreover, DGAs that can create English-language-like domains with properly matched syllables, or that use combinations of English dictionary words, are almost always undetectable by means of language analysis of the domain names.
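The limitation of language analysis noted above can be sketched with a crude character-entropy heuristic. This heuristic is an assumed stand-in for language analysis in general, not a method taken from this disclosure; the function names, the 3.5-bit threshold, and both sample labels are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(label: str) -> float:
    """Shannon entropy, in bits per character, of the characters in `label`."""
    if not label:
        return 0.0
    counts = Counter(label)
    total = len(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_random(label: str, threshold: float = 3.5) -> bool:
    """Flag a label whose character entropy exceeds an assumed threshold."""
    return shannon_entropy(label) > threshold


if __name__ == "__main__":
    # One alphanumeric label and one label built from dictionary words.
    for label in ("x7qk2vz9pwl4", "bluewaterhouse"):
        print(label, round(shannon_entropy(label), 3), looks_random(label))
```

Under these assumptions the alphanumeric label exceeds the threshold while the label built from dictionary words does not, which illustrates why word-based DGAs tend to evade this kind of check.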
In a scenario where a bot is discovered, the bot has to be reverse engineered to uncover the DGA affecting the bot in order to block domain names generated by the DGA on the firewall or to register the generated domain names before the botmaster registers them. Reverse engineering, however, is very time-consuming and requires an extremely advanced skill set. Additionally, the botmaster may configure the DGA to use a seed that is based on responses of popular websites such as google.com, baidu.com, answers.com, or even trending topics on social networking websites such as Twitter or Facebook, all of which are unknown in advance. Therefore, reverse engineering the bot and then filtering the generated domain names is also ineffective. Furthermore, the bot can generate so many domain names that registering or blocking all of them is infeasible. Thus, techniques for better detection of DGA-based malware are needed.