Many types of malware communicate back to their controller. These communications can consist of receiving commands and updates, exfiltrating data, and passing other information in either direction. However, relying on a single (or a small number of) pre-defined or hardcoded web-based point of communication (i.e. a “rendezvous location”), such as a single domain, email address or Twitter account to which the malware can connect, leaves the malware vulnerable to disruption if control of that rendezvous location is lost. Such loss occurs most frequently when law enforcement or cyber-security organizations take control of some or all of these locations. In addition, a single rendezvous location is easy to counter with mitigating measures such as blacklists.
In order to avoid this problem, many malware families use a Domain Generation Algorithm (DGA) to generate new domain names at short time intervals. Most frequently, a DGA is based on a Pseudo Random Number Generator (PRNG) which generates a list of domains from a seed that both the malware and its operators can derive during the time interval from prior knowledge (most often, the current date). The malware attempts to access the generated domains in some sequence (the domains may also be tried in a random order) until it finds a domain which was registered by the malware's operator. This allows the malware and its operator to establish an ad-hoc communication channel.
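The date-seeded generation described above can be illustrated with a minimal sketch, shown here only to make clear what a defender needs to recognize. It is not taken from any particular malware family: it substitutes a SHA-256 hash of the date and a counter for the PRNG, and the function name `generate_domains`, the label length of 12 and the reserved `.example` suffix are all assumptions made for illustration.

```python
import hashlib
from datetime import date


def generate_domains(day: date, count: int = 5, tld: str = ".example") -> list[str]:
    """Derive `count` reproducible pseudo-random domain names from a date seed."""
    domains = []
    for index in range(count):
        # Hash the seed (ISO date) together with a counter so that each index
        # yields a different but reproducible value. (Assumption: a hash is
        # used here in place of a true PRNG.)
        digest = hashlib.sha256(f"{day.isoformat()}:{index}".encode()).hexdigest()
        # Map the first 12 hex characters onto the letters a-p to form a label.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:12])
        domains.append(label + tld)
    return domains
```

Because the output depends only on the date, anyone who knows the algorithm (the operator, or an analyst who has reverse-engineered it) can compute the same list for a given day.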
There have been attempts (such as Antonakakis et al., “From Throw-Away Traffic to Bots: Detecting the Rise of DGA-Based Malware”, 21st USENIX Security Symposium, Aug. 8-10, 2012, Bellevue, Wash., available at https://www.usenix.org/system/files/conference/usenixseecurity12/sec12-final127.pdf (last visited Oct. 12, 2016)) to discover indications of DGA communications from network data. For example, a forensic analysis can reveal DNS queries for multiple domains which do not exist, each of which receives an NXDOMAIN DNS response. It is also possible to improve the accuracy of discovery by adding further metrics, such as the number of NXDOMAIN responses followed by a successful resolution, or by identifying domains belonging to known programs (such as file-sharing programs) which tend to produce a large number of NXDOMAIN responses and assigning them different probabilities of being malicious.
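One of the metrics mentioned above, a run of NXDOMAIN responses followed by a successful resolution, can be sketched as follows. This is an illustrative heuristic only, not the method of the cited paper; the tuple layout of the input, the function name `nxdomain_runs` and the threshold `min_run` are assumptions.

```python
from collections import defaultdict

NXDOMAIN = "NXDOMAIN"
NOERROR = "NOERROR"


def nxdomain_runs(responses, min_run=3):
    """Return, per endpoint, the domains that resolved right after a run of
    at least `min_run` consecutive NXDOMAIN responses.

    `responses` is an iterable of (endpoint, domain, rcode) tuples in time
    order (a hypothetical log format).
    """
    run_length = defaultdict(int)  # current NXDOMAIN streak per endpoint
    suspects = defaultdict(list)   # endpoint -> domains reached after a streak
    for endpoint, domain, rcode in responses:
        if rcode == NXDOMAIN:
            run_length[endpoint] += 1
        else:
            if run_length[endpoint] >= min_run:
                suspects[endpoint].append(domain)
            run_length[endpoint] = 0
    return dict(suspects)
```

Note that the streak is tracked per endpoint only, which is exactly the granularity available from network data.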
Previous such attempts have been based on analysis at the network level. As such, these attempts can find (and potentially block) a DGA at the level of a single endpoint, based on network data originating from that endpoint. However, they do not receive information about the specific program or service on the endpoint from which a query originated. This is because the queries themselves do not contain any information about which program originated the request or which part of the program's code was involved.
In contrast, the solution provided by the present invention installs software on the individual endpoints in the system. The software monitors the actual programs and collects information about where each request originated. This is done, for example, by using Windows ETW (Event Tracing for Windows), a facility which is used for debugging. ETW allows the algorithm to collect information on exactly which part of which program a request originates from. As such, attributions can be made on a much finer-grained scale and result in fewer errors.
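The benefit of per-program attribution can be sketched as below. The example does not call ETW; it assumes the endpoint software has already collected events into a hypothetical `DnsEvent` record whose `process` and `module` fields name the originating program and code module. All names here are illustrative assumptions.

```python
from collections import Counter
from typing import NamedTuple


class DnsEvent(NamedTuple):
    process: str    # program that issued the query (hypothetical field)
    module: str     # code module inside the program (hypothetical field)
    domain: str
    resolved: bool  # False when the answer was NXDOMAIN


def failure_ratio_by_origin(events):
    """Return {(process, module): fraction of queries that got NXDOMAIN}."""
    total = Counter()
    failed = Counter()
    for event in events:
        origin = (event.process, event.module)
        total[origin] += 1
        if not event.resolved:
            failed[origin] += 1
    return {origin: failed[origin] / total[origin] for origin in total}
```

With nine successful browser queries and one successful plus three failed queries from another program, the endpoint-wide failure ratio is roughly 0.23, while the ratio attributed to the second program alone is 0.75, which stands out far more clearly.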
While current methods of detecting such DGAs can be accurate under certain conditions, they all rely on a stream of DNS queries from multiple programs on the same endpoint, all interlaced together. Thus, an attacker can avoid detection by lowering the frequency of queries so that the NXDOMAIN responses are drowned in the noise generated by any active endpoint. In addition, the attacker may choose an existing popular service or website, such as Twitter, which defenders will not block.
In many cases, a sophisticated attacker could utilize the following method to achieve communication with malware:
1. Choose one or more websites or services which allow users to create or modify content. Such services include Twitter (creation of Twitter accounts), email services (creation of email accounts), hosting services such as Google Apps https://developers.google.com/google-apps/ (last visited Oct. 12, 2016) (creation of websites), blogging sites (creation of blogs), LinkedIn, shopping sites (creation of shopping carts and wishlists), Wikis, newspapers with comment sections, chat services (user logins), Skype (user name), etc.
2. Create a PRNG which can create instances of the type of content supported by the chosen service. The seed of this PRNG can change each time interval and be known during that interval by both the malware and its operator based on prior knowledge. Examples of such a seed include the current date, the daily trending Twitter hashtag, the average temperature in Rio de Janeiro or the current USD-to-Yen exchange rate.
3. When communication is desired, the operator can register or create the generated rendezvous location (e.g. by logging into a website, creating a domain etc.) and modify the content on the chosen service(s). Any of the rendezvous locations generated for the current time interval will be enough.
4. After a successful connection the operator can pass commands and updates to the malware.
5. It's also possible for the malware to modify that information, using the point of contact as a bidirectional channel which also allows data exfiltration.
6. The operator can read the content modified by the malware. Note that content can be masked to hide its meaning through steganography or other embedding methods. For example, an IP address can be encoded in the number of words in each sentence of a paragraph, so that the word counts taken together form a number.
7. The content thus transmitted between the controller and the malware can be direct instructions (such as the IP of a target for a Distributed Denial of Service (DDoS) attack) or indirect information, such as an IP with which the malware can communicate for more instructions or to exfiltrate data.
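The word-count embedding mentioned in step 6 can be made concrete with a small decoding sketch of the kind an analyst might write. The exact scheme is not specified in the text, so this assumes that each sentence contributes one decimal digit equal to its word count modulo 10; the function name `decode_word_counts` is likewise hypothetical.

```python
import re


def decode_word_counts(paragraph: str) -> str:
    """Return one decimal digit per sentence: the word count modulo 10.

    Assumption: sentences end with '.', '!' or '?', and words are separated
    by whitespace.
    """
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", paragraph) if s.strip()]
    return "".join(str(len(s.split()) % 10) for s in sentences)
```

For instance, the paragraph "The cat sat. It was raining heavily outside today. Go." has sentences of 3, 6 and 1 words, so it decodes to the digits 361. To a casual reader the paragraph is ordinary text, which is what makes such content hard to flag.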
We define this type of communication as a Rendezvous Generation Algorithm (RGA). Note that this attack circumvents conventional methods of detection. By utilizing existing services/websites an attacker has several advantages:
1. Hard to block popular domains entirely (and blocking only specific communications to them is often impossible, mainly since existing defensive solutions do not provide such mechanisms to their users).
2. The communication will look innocent both to a human observer and an automated statistical model which looks for rare/new/strange domains.
3. Harder for law enforcement agencies to “take over” and control the domain.
4. Depending on the chosen seed, the rendezvous locations may be impossible to predict in advance, for example when the seed is the exact currency exchange rate on a future date.
Nevertheless, there are several defenses which can be utilized. On the server side, for example, a website can detect that a given computer/IP is trying to access multiple accounts/pages/services which do not exist and slow down the access of that computer/IP. In many cases, however, this will not be enough. Thus there is a need for solutions which can be utilized at the level of the individual endpoint, network or cloud proxy.
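The server-side defense described above can be sketched as a sliding-window counter of failed lookups per client. This is an illustration under stated assumptions, not a production rate limiter: the class name `MissThrottle`, its thresholds and the fixed delay are all hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import defaultdict, deque


class MissThrottle:
    """Track lookups of nonexistent resources per client and suggest a delay."""

    def __init__(self, max_misses=5, window_seconds=60.0, delay_seconds=2.0):
        self.max_misses = max_misses
        self.window_seconds = window_seconds
        self.delay_seconds = delay_seconds
        self._misses = defaultdict(deque)  # client -> timestamps of recent misses

    def record_miss(self, client, now):
        """Record that `client` requested a resource that does not exist."""
        self._misses[client].append(now)

    def delay_for(self, client, now):
        """Return the delay (in seconds) to apply to the client's next request."""
        recent = self._misses[client]
        # Discard misses that have fallen out of the sliding window.
        while recent and now - recent[0] > self.window_seconds:
            recent.popleft()
        return self.delay_seconds if len(recent) > self.max_misses else 0.0
```

A client that probes many generated account names within the window is slowed down, while a client with an occasional mistyped address is unaffected. As noted, an attacker who spaces out the probes can stay below any such threshold.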