The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Network operators face a growing problem of unauthorized access to network resources. Attackers can install malicious software, or malware, on a victim's computing device and use that software to send (exfiltrate) proprietary and confidential data out of the network to third parties.
Malware, broadly defined, is a class of software that includes a wide variety of hostile, intrusive, or annoying forms of computer code. Malware can be, for example, a computer program designed to infiltrate a computing device without the device owner's knowledge or consent. Examples of malware include viruses, worms, Trojan horses (Trojans), rootkits, spyware, adware, and any other unwanted software. Malware can also include modifications to existing program code as well as new program code added to an existing code base. Some types of malware can collect personal and confidential information related to a user and send this information to another party. Other types of malware can control a system by executing commands as well as by exfiltrating data. Still other types of malware may cause a computing device to function poorly, to fail to meet quality of service standards, or to not function at all. Malware attacks that impair functionality in this way are considered to be denial of service (DoS) attacks. These are only a few examples of what malware can be and what malware can do.
Network operators routinely employ various intrusion detection, anti-virus, and network security products and techniques to combat the problem. Many of these products and techniques operate by inspecting network traffic for malware signatures and known malware data patterns. Most of these products and techniques are operated by network operators from within their own private networks. These systems are not designed to provide intelligence on the malware. Rather, they are designed to alert network operators to potential or actual attacks on their own systems.
The authors of malware continuously try to stay ahead of the network operators. Often, attackers use a waypoint located on a third-party network resource, called a command and control (C2) node, to assist in communicating with the malware on a victim's computing device. Attackers also use certain C2 nodes as repositories for their malware, and C2 nodes may also be used to receive data exfiltrated from the victim's environment. The C2 nodes employed by attackers are usually logically separate from the victim's computing device and may also be geographically remote from it. Inspecting the activity of these C2 nodes involves accessing each of them individually and evaluating its operation.
One technique that malware uses to defeat network security is to make the communication protocol between the malware and the C2 node blend in with a typical user's Internet browsing behavior, such as requesting Uniform Resource Locators (URLs) on the Internet. The URLs that the malware requests can contain benign information as well as information that can be used to control the behavior of the malware on the victim's computer. In many cases, portions of the malware communications are intentionally obfuscated to conceal them from a casual observer, even if their presence is detected.
Several families of malware use C2 nodes as communication waypoints, and some of these are classified as Trojans. A Trojan resident on a victim computer can be configured to look for instructions in the data of a URL, such as where to download another file (a second URL), or can be configured to create a remote shell session with an IP address controlled by the attacker.
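As a hypothetical illustration of how such an instruction can ride inside an ordinary-looking URL, the following Python sketch assumes that the malware hides its instruction in a query parameter (here named `q`, an arbitrary choice) and that the obfuscation is simply URL-safe Base64 with the `=` padding removed. Real malware families use many different, and often stronger, schemes; the sketch only shows why the instruction is opaque to a casual observer yet easy to recover once the scheme is known. The function name `decode_hidden_param` and the example URLs are illustrative assumptions.

```python
import base64
from typing import Optional
from urllib.parse import parse_qs, urlparse


def decode_hidden_param(url: str, param: str = "q") -> Optional[str]:
    """Return the instruction hidden in `param`, or None if absent or undecodable.

    Assumes (hypothetically) URL-safe Base64 with padding stripped.
    """
    values = parse_qs(urlparse(url).query).get(param)
    if not values:
        return None
    token = values[0]
    # Restore the '=' padding that was stripped to make the token look less like Base64.
    token += "=" * (-len(token) % 4)
    try:
        return base64.urlsafe_b64decode(token).decode("utf-8")
    except ValueError:  # binascii.Error and UnicodeDecodeError both subclass ValueError
        return None


# Hypothetical example: an instruction disguised as an article-view request.
token = base64.urlsafe_b64encode(b"fetch http://example.com/stage2").decode().rstrip("=")
url = f"http://example.com/news/view.php?id=42&q={token}"
print(url)
print(decode_hidden_param(url))  # fetch http://example.com/stage2
```

To an observer, the request looks like a page view with two opaque parameters; only a reader who knows which parameter carries data, and how it is encoded, sees the embedded second URL.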
The amount of malware using C2 nodes is increasing rapidly. Additionally, an individual malicious file on a victim computing device may communicate with an arbitrary number of URLs on an arbitrary number of C2 nodes. Thus, as the number of victim computing devices compromised with malware increases, the number of C2 nodes generally also increases. Furthermore, malware can use Dynamic Domain Name System (DDNS) services to create additional hostnames that correspond to a given C2 node and, conversely, to resolve the same DDNS domain to various Internet Protocol (IP) addresses (e.g., of different C2 nodes). This effectively multiplies the number of URLs that must be investigated. As a result, it is not practical for an individual malware investigator to manually inspect, explore, or evaluate the operations of even a fraction of the operational malware C2 nodes. Moreover, as described above, some of the communications between the malware and the C2 node may be obfuscated, and decoding and decrypting this information adds to the time and effort required to evaluate the operations of malware on C2 nodes.
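To make the scaling concrete, the following Python sketch enumerates the (C2 node IP address, URL) pairs that an investigator would have to examine when each DDNS hostname resolves to one or more IP addresses and each node may serve several URL paths. The hostnames, the addresses (taken from documentation-reserved ranges), the paths, and the helper name `candidate_pairs` are all hypothetical.

```python
from itertools import product


def candidate_pairs(dns_map: dict[str, set[str]], paths: list[str]) -> list[tuple[str, str]]:
    """Return every (IP address, URL) pair implied by the DNS mapping and paths."""
    pairs = []
    for host in sorted(dns_map):
        # Each hostname may resolve to several C2 node addresses, and each
        # address may serve every path, so the combinations multiply.
        for ip, path in product(sorted(dns_map[host]), paths):
            pairs.append((ip, f"http://{host}{path}"))
    return pairs


# Hypothetical DDNS data: one hostname maps to several nodes, and one node
# is reachable through several hostnames.
dns_map = {
    "a.example.net": {"192.0.2.10", "192.0.2.11"},
    "b.example.net": {"192.0.2.11", "198.51.100.5"},
    "c.example.net": {"198.51.100.5"},
}
paths = ["/news.php", "/img/logo.gif", "/update"]

pairs = candidate_pairs(dns_map, paths)
print(len(pairs), "pairs,", len({ip for ip, _ in pairs}), "nodes,",
      len({url for _, url in pairs}), "distinct URLs")
# prints: 15 pairs, 3 nodes, 9 distinct URLs
```

Even this toy mapping of three hostnames and three paths yields 15 node/URL combinations across 3 nodes, although only 9 URLs are textually distinct; under the assumption that the same URL may behave differently depending on which node serves it, each combination would need its own inspection.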