The invention, in some embodiments, relates to the field of computer threats, and more specifically to identifying and gathering information about advanced persistent threats.
Advanced persistent threats, such as computer viruses, computer worms, Trojan horses, and other malware, particularly when infecting endpoints in an organization's network, are some of the most crucial security problems for many organizations. Current security mechanisms are generally unable to cope with, and to prevent, infection attacks, and as a result third parties, such as crackers and cyber-terrorists, are able to insert malware into the networks of such organizations. Once malware is present on an organization's network, the malware communicates with its controllers, such as hackers and cyber-terrorists, via command and control (C&C) mechanisms, which direct the malware as to what data to obtain, where to find such data, and where to send the data once it is obtained. Typically, communication between malware and its command and control mechanism uses common protocols, such as HTTP and IRC, or a plain or encrypted payload carried over TCP. Some malware families are able to work independently, and only exfiltrate the data they are able to collect within the organization, whereas other families are remotely controlled by the attacker through a Remote Administration Tool (RAT).
One method currently used for identifying the presence of malware on a network involves signature matching or pattern matching of malware families. For this method to properly identify the presence of malware, the malware must first be caught and analyzed to derive one or more relevant signatures, which signatures are then used to prevent a malware infection by such malware in other computers in the network or in other networks. However, malware signatures are changed, added and mutated constantly, and signature analysis tools typically cannot keep up with the changing malware signatures, and therefore this method is far from failsafe.
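By way of non-limiting illustration only, the following minimal sketch shows how signature matching may operate and why a mutated sample can escape it. The signature database, the byte pattern `EVIL_MARKER`, the sample contents, and the function name `matches_signature` are all hypothetical and are assumed here solely for the purpose of the example; they do not describe any particular product.

```python
import hashlib

# Hypothetical signature database (assumed for illustration):
# SHA-256 digests of previously analyzed samples, plus known byte patterns.
KNOWN_HASHES = {hashlib.sha256(b"known-malicious-sample").hexdigest()}
KNOWN_PATTERNS = [b"EVIL_MARKER"]


def matches_signature(data: bytes) -> bool:
    """Return True if the data matches a known digest or contains a known pattern."""
    if hashlib.sha256(data).hexdigest() in KNOWN_HASHES:
        return True
    return any(pattern in data for pattern in KNOWN_PATTERNS)


# A sample containing a known pattern is detected.
original = b"header EVIL_MARKER payload"

# A trivially mutated variant (every byte XOR-ed with a fixed key) carries the
# same content in a different encoding, so neither the digest nor the pattern matches.
mutated = bytes(b ^ 0x5A for b in original)

detected_original = matches_signature(original)
detected_mutated = matches_signature(mutated)
```

In this sketch the original sample is detected while the mutated variant is not, which reflects the limitation noted above: a signature must be derived anew for each variant before that variant can be recognized.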
In other methods, machine learning, behavioral analysis, and classification algorithms are used to find packets within the network traffic which include communication between malware within the network and the command and control mechanism controlling the malware, or other suspicious activities in the network. However, these methods require collecting all the traffic to and from the organization, as well as data from assets inside the organization, and the computational analysis methods used to implement them often trigger false positives and/or suffer from false negatives.
Another method, known as “sandboxing”, involves running suspicious code in a secluded emulation environment, also called a sandbox, in order to identify the purpose of the code without the code being able to access the real resources of the organization. For example, a sandbox may be implemented by installing a proxy at the gateway to a network, and executing all HTTP pages within the proxy prior to forwarding them to the requesting node or computer within the network. However, there are multiple different methods by which malware can bypass a sandboxing technology, thereby reducing the effectiveness of this technology.
Specifically, use of a sandbox or emulation environment involves two main problems. First, there are multiple ways to evade the sandbox, for example by delaying execution of the malware relative to the time of infection, such as by a week or more, or by the malware checking whether the computer on which it is running is being used for various kinds of normal activities. In such cases, the sandbox does not block entrance of the malware into the network, as the malware does not appear to be an executable when it first arrives. In some cases, the malware may determine that it is being run in an emulation environment, and either delay execution of the attacking portion of the code to a later stage or not execute at all until it determines that it is no longer being run in the emulation environment.
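By way of non-limiting illustration only, the first problem may be modeled as a limited observation window. The following toy sketch assumes that a sample can be described as a list of timed behaviors and that a sandbox observes the sample for a fixed number of hours; the `Event` type, the function `observed_in_sandbox`, and the example timings are hypothetical and are not taken from any actual sandbox.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    offset_hours: float  # time after arrival at which the behavior would occur
    action: str          # description of the behavior


def observed_in_sandbox(events: List[Event], window_hours: float) -> List[str]:
    """Return the behaviors that fall inside the sandbox's observation window."""
    return [e.action for e in events if e.offset_hours <= window_hours]


# Hypothetical sample: behaves benignly on arrival and acts one week (168 hours) later.
sample = [
    Event(0.0, "open decoy document"),
    Event(168.0, "contact command and control"),
]

# A sandbox that observes for one hour sees only the benign behavior.
seen = observed_in_sandbox(sample, window_hours=1.0)
```

Under these assumptions, `seen` contains only the benign behavior, so a decision based on the observation window alone would admit the sample into the network.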
A second problem is that when the sandbox or other emulation environment or technology manages to block an attack, one cannot gather intelligence regarding the goals and method of operation of the attacking malware in the targeted environment. Because most attackers continue to try to penetrate the same organization after a failed or blocked initial attack attempt, it is beneficial for the organization to know what the attackers are after and how the attacking code operates, and specifically how the attacker will operate within the specific environment of the organization, in order to better protect the organization against subsequent attacks by the same attacker.
There is thus a need for a technology which identifies the activities of attacking malware in a way that prevents the malware from circumventing the technology, while allowing an organization's security team to gather information regarding the attacking malware's methods of operation and activities in an environment that mimics the real environment.