Botnets present a serious and growing threat; a significant percentage of spam, for example, is transmitted through bot networks. Other botnet-enabled threats include distributed denial-of-service (DDoS) attacks, packet sniffing, keylogging, file system or registry harvesting, malware distribution, phishing, online advertisement abuse, and manipulation of online polls and games.
A botnet is a network of compromised machines that a botmaster can control remotely over a command and control (C&C) network. Currently, Internet Relay Chat (IRC) is the most common C&C communication protocol. Individual bots connect to a preconfigured rendezvous point, commonly an IRC server and channel (access to which may require password authentication), and await commands from the botmaster. Malicious bots are useful mainly in aggregate, and their usefulness grows roughly in proportion to their number.
The Honeynet Project identifies four main Win32 bot families: (1) the agobot, phatbot, forbot, and Xtrembot family; (2) the sdbot, Rbot, UrBot, and UrXbot family; (3) DSNXbot; and (4) mIRC-based bots. Bots in families (1), (2), and (3) are generally standalone executables that rely, at most, on external programs for certain discrete functions. The mIRC-based bots (e.g., GT Bot), by contrast, rely heavily on auxiliary scripts executed by the mIRC IRC client.
There may be thousands of trivial variants of any malicious bot that differ only in code details or in the values of string variables (e.g., those that define the C&C rendezvous point). There may also be non-trivial variants of a bot B, in which the derivative inherits some or all of B's functionality and optionally extends B's set of supported commands. G-SySbot, for example, non-trivially extends the functionality of sdbot. Within a particular family there may also be non-trivial variation in implementation; the agobot family, for example, includes bots that use the WASTE protocol for C&C as well as bots that use IRC. Alternatively, a bot may adopt certain functionality from another bot but otherwise have an independent implementation, as is the case with Spybot, which borrows only sdbot's SYN flood implementation. Finally, a single bot executable may have numerous substantially different variants, generated by applying various packing transformations (including compression and encryption) to the bot binary. Together, these factors make bots difficult to detect.
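The effect of trivial and packed variants on exact-match identification can be sketched with a toy example. Everything here is a hypothetical assumption for illustration: the placeholder byte strings, the rendezvous strings, the one-byte XOR "packer" (a stand-in for real compression or encryption packers), and the key `0x5A`. None of it is taken from any real bot. An exact-hash blocklist built from one captured sample misses even a trivial variant, while a byte signature taken from the shared code region survives the changed string but not the packing transformation.

```python
import hashlib

# Hypothetical placeholder bytes standing in for a bot's code section.
# These are NOT real bot instructions or real signatures.
CODE = b"\x55\x8b\xec\x83\xec\x10CMD_SYNFLOOD\x00CMD_SCAN\x00\xc9\xc3"

# Hypothetical byte signature an analyst might extract from the shared code.
SIGNATURE = b"CMD_SYNFLOOD\x00CMD_SCAN"


def build_variant(rendezvous: bytes) -> bytes:
    """Trivial variant: identical code, different C&C rendezvous string."""
    return CODE + b"\x00" + rendezvous + b"\x00"


def xor_pack(data: bytes, key: int) -> bytes:
    """Toy 'packer': XOR every byte with a one-byte key.

    Applying it twice with the same key restores the input. Real packers
    are far more elaborate; this one merely changes every byte of the file.
    """
    return bytes(b ^ key for b in data)


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


base = build_variant(b"irc.example.net:6667 #chan1")
trivial = build_variant(b"irc.example.org:6668 #chan2")
packed = xor_pack(base, 0x5A)

# Exact-hash blocklist built from the one captured sample.
blocklist = {sha256(base)}

for name, data in (("base", base), ("trivial", trivial), ("packed", packed)):
    listed = sha256(data) in blocklist
    hit = SIGNATURE in data
    print(f"{name}: hash_listed={listed} signature_hit={hit}")
```

Running it reports that the base sample is both hash-listed and signature-matched, the trivial variant is matched only by the signature, and the packed variant is matched by neither, even though unpacking it restores the base sample exactly.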
A number of approaches are available for detecting and reacting to botnets; broadly, they can be characterized as network-based or host-based. Network-based approaches monitor network traffic to identify botnet activity. These approaches may depend on a botnet using a particular C&C protocol, port, or set of rendezvous points. Content-based filtering, for example, is a network-based approach that may require bot traffic to be transmitted in the clear (not encrypted or otherwise obfuscated) and to contain certain known byte sequences at certain offsets in the packet payload. This is a significant limitation, because network-based elements may not be able to ensure that such traffic is transmitted in the clear or that C&C communication uses particular ports or hosts. Host-based approaches monitor activity on the systems on which a bot may be executing; they can be further subcategorized as signature-based or behavior-based. Signature-based approaches may compare the contents of memory or files to byte sequences obtained from analysis of known malware instances. One drawback of this approach is that malware for which no signature yet exists may go undetected. Moreover, source-level or binary-level transformations or obfuscations may be applied to a malware instance M (for which a signature exists), producing a distinct malware instance M′ that evades signature-based detection. Behavior-based approaches observe executing processes and apply heuristics to the observed behavior to determine whether a given process is likely to be malicious. For example, a behavior-based approach may monitor the incoming and outgoing network connections of each process.
Such approaches may generate too many false positives (e.g., by flagging behavior that is common to benign and malicious processes alike) or too many false negatives (e.g., if the approach tracks only network behavior and not file system, registry, or process management behavior) to be practically useful.
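The false-positive and false-negative trade-off described above can be illustrated with a generic weighted-score heuristic. This is a swapped-in, hypothetical scheme rather than the method of any particular product: the event names, weights, threshold, and the `ProcessTrace` record are all invented for illustration. A monitor that tracks only network events misses a bot whose distinctive behavior lies in the registry, file system, and process categories, and lowering the threshold to compensate then flags a benign process that merely opens outbound connections.

```python
from dataclasses import dataclass, field

# Hypothetical event categories and weights, invented for illustration only.
WEIGHTS = {
    "net:outbound_connect": 1,         # common to benign and malicious processes
    "net:irc_channel_join": 2,
    "reg:autorun_key_write": 3,
    "fs:copy_self_to_system_dir": 3,
    "proc:remote_thread_injection": 4,
}
DEFAULT_THRESHOLD = 5

NETWORK_ONLY = ("net:",)
ALL_CATEGORIES = ("net:", "reg:", "fs:", "proc:")


@dataclass
class ProcessTrace:
    """Events observed for one process by a hypothetical host monitor."""
    name: str
    events: set = field(default_factory=set)


def score(trace: ProcessTrace, tracked: tuple) -> int:
    """Sum the weights of observed events in the categories being tracked."""
    return sum(WEIGHTS.get(e, 0) for e in trace.events if e.startswith(tracked))


def is_suspicious(trace: ProcessTrace, tracked: tuple,
                  threshold: int = DEFAULT_THRESHOLD) -> bool:
    return score(trace, tracked) >= threshold


browser = ProcessTrace("browser", {"net:outbound_connect"})
# A bot using non-IRC (e.g., encrypted) C&C, so no IRC channel join is observed.
bot = ProcessTrace("bot", {
    "net:outbound_connect",
    "reg:autorun_key_write",
    "fs:copy_self_to_system_dir",
    "proc:remote_thread_injection",
})

for label, tracked, threshold in (
    ("network-only", NETWORK_ONLY, DEFAULT_THRESHOLD),        # bot missed
    ("all categories", ALL_CATEGORIES, DEFAULT_THRESHOLD),    # bot caught
    ("network-only, low threshold", NETWORK_ONLY, 1),         # browser flagged too
):
    flagged = [t.name for t in (browser, bot) if is_suspicious(t, tracked, threshold)]
    print(f"{label}: flagged={flagged}")
```

With these made-up numbers, the three configurations flag no process, only the bot, and both processes, respectively; the point is the shape of the trade-off, not the specific weights.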
A method and system that address these and other related issues are therefore desirable.