Field
Embodiments of the present invention generally relate to network security. In particular, embodiments of the present invention relate to automated malware learning and detection.
Description of the Related Art
Cyber security experts and hackers are engaged in a continuous battle that is not going to end anytime soon. Cyber attackers are becoming smarter and use advanced software and hardware technologies to initiate different types of attacks on computers/networks. A hacker/cyber attacker typically uses various types of malicious software, such as viruses, worms, and Trojan horses, to conduct illegitimate operations in computer systems and/or to gain illegitimate access to networks and/or network resources. Such malicious software/content may be used, for example, to damage data or equipment, or to extract or modify data.
Several security checks are implemented within computer networks to detect and filter out malicious traffic/content/files. Detection of malware/cyber intrusion attempts is the first step towards securing computers/networks and towards implementing security checks at different levels, such as at firewalls, gateways, and end user devices. Existing detection systems typically rely on signatures of known malware/malicious content/files/traffic to detect and filter them out. Accordingly, most present-day commercial anti-virus (AV) and intrusion detection systems (IDSs) rely largely on signature-based methods to identify malicious code before the code causes harm to computer systems and/or travels through the network.
In typical signature-based systems, signatures of malicious code/traffic/files/content are stored in a signature database, which is updated at regular intervals with signatures of newly detected malicious code/traffic/files/content. Therefore, for an IDS to be able to detect a threat/malware, the signature, which is essentially a fingerprint of the malware, must already be known and deployed within the IDS, usually through an AV update and/or a patch. Because signature-based systems depend on signatures relating to known threats, this paradigm has several drawbacks. For example, such systems can detect only threats that have already been observed, or variations on known threats that still match existing signatures. As a result, traditional signature-based malware detection/intrusion detection systems are vulnerable to zero-day attacks, and are not able to detect/classify new malware and/or malicious code/traffic/files/content for which a signature has not yet been created.
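By way of illustration only, the signature lookup described above may be sketched as follows. The sketch assumes, as a simplification, that a signature is a SHA-256 digest of the entire content, whereas commercial AV/IDS signatures are typically byte patterns or rules. The names `SignatureDatabase`, `fingerprint`, and the sample payloads are hypothetical and are not taken from any particular product.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a hex digest used here as a stand-in for a malware signature."""
    return hashlib.sha256(content).hexdigest()


class SignatureDatabase:
    """Hypothetical signature store holding fingerprints of known malware."""

    def __init__(self) -> None:
        self._signatures: set[str] = set()

    def update(self, samples: list[bytes]) -> None:
        # Models the periodic update that adds newly created signatures.
        for sample in samples:
            self._signatures.add(fingerprint(sample))

    def matches(self, content: bytes) -> bool:
        # Content is flagged only if its signature is already stored.
        return fingerprint(content) in self._signatures


# Illustrative payloads (assumptions, not real malware).
known_sample = b"hypothetical-malicious-payload-v1"
unseen_variant = known_sample + b"\x00"  # one extra byte changes the digest

db = SignatureDatabase()
db.update([known_sample])

print(db.matches(known_sample))    # known threat: signature is present
print(db.matches(unseen_variant))  # no stored signature for this variant
```

In this sketch the unseen variant goes undetected until `update` is called with that variant, which corresponds to the dependence on previously deployed signatures noted above.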
As intruders become smarter, they are able to determine the type of traffic that is being detected and/or blocked by existing malware detection systems, and hence are able to change their patterns of attack so as to avoid signature-based detection. For example, an attacker may create and introduce a new type of malware/attack that leverages existing code bases but is packaged in a different way, thereby avoiding detection by signature-based detection systems, at least until a new signature is developed and deployed for the new attack.
Furthermore, with numerous suspicious samples being submitted to existing AV engines every day, processing such samples and creating appropriate signatures is a significant challenge. Current signature generation approaches result in ever larger AV pattern databases and produce generic signatures that cause false positives, which in turn require time to investigate and fix.
There is therefore a need in the art for an automated malware learning and detection system that can use a learning-based approach to effectively generate/update generic signatures for malware detection, provide better detection of zero-day attacks, and control false positives in an effective manner.