Malware refers to malicious software, that is, software intended to damage or disable computers and computer systems, disrupt computer operation, gather sensitive information, or gain access to private computer systems without permission. Examples of malware include viruses, worms and Trojans.
Traditional signature-based security solutions detect malware by comparing the contents of a file against a database of known malware signatures. However, with millions of new malware variants bypassing these solutions, today's threat landscape has many organizations scrambling to shore up their cyber defenses. High-profile data breaches are grabbing headlines, eroding customer confidence and costing organizations millions.
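The following minimal sketch illustrates signature matching and why a new variant can slip past it. It assumes, purely for illustration, that a signature is the SHA-256 digest of the whole file; real products typically use richer signatures such as byte patterns. The names KNOWN_MALWARE_SIGNATURES, file_signature and is_known_malware, as well as the sample payloads, are hypothetical.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of known-malicious files.
# Assumption: one whole-file hash per signature (real signatures are often
# byte patterns or fuzzy hashes).
KNOWN_MALWARE_SIGNATURES = {
    hashlib.sha256(b"malicious payload v1").hexdigest(),
}


def file_signature(contents: bytes) -> str:
    """Compute the signature (here, a SHA-256 hex digest) of a file's contents."""
    return hashlib.sha256(contents).hexdigest()


def is_known_malware(contents: bytes) -> bool:
    """Return True if the file's signature appears in the signature database."""
    return file_signature(contents) in KNOWN_MALWARE_SIGNATURES


original = b"malicious payload v1"
variant = b"malicious payload v2"  # a trivially modified variant

print(is_known_malware(original))  # True: exact match with a known signature
print(is_known_malware(variant))   # False: the variant has a different digest
```

Under this assumption, changing a single byte of the file yields a different digest, which is one way a new variant can evade a signature database until its own signature is added.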
More recent solutions for combatting malware employ sandbox detection and data mining techniques. Sandbox detection refers to a behavioral technique by which a file is first run and monitored for malicious behavior in a secure environment before being allowed to proceed to its destination. Data mining techniques use machine learning to classify a file as malicious or benign given a set of file behaviors extracted from the file itself. Machine learning involves a training aspect and a runtime aspect. In the training aspect, a large number of sample files (e.g., “big data”) labeled as malicious or benign are provided to an algorithm referred to as a classifier to train the classifier (i.e., to allow the classifier to “learn”) on which behaviors are more likely to indicate a malicious or a benign file. In the runtime aspect, the classifier analyzes the behaviors of a new file in real time and classifies the file as malicious or benign based on what it learned about those behaviors during training.
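The training and runtime aspects can be sketched as follows. This is only an illustration under stated assumptions: it uses a small Bernoulli naive Bayes classifier written with the standard library, which is a stand-in chosen for brevity and not a classifier named in this document. The behavior names, the training samples, and the functions train and classify are all hypothetical.

```python
import math
from collections import Counter


def train(samples):
    """Training aspect: count how often each behavior occurs per label.

    samples is a list of (set_of_behaviors, label) pairs, where label is
    "malicious" or "benign".
    """
    label_counts = Counter()
    behavior_counts = {}
    vocabulary = set()
    for behaviors, label in samples:
        label_counts[label] += 1
        behavior_counts.setdefault(label, Counter()).update(behaviors)
        vocabulary |= set(behaviors)
    return {
        "label_counts": label_counts,
        "behavior_counts": behavior_counts,
        "vocabulary": vocabulary,
    }


def classify(model, behaviors):
    """Runtime aspect: pick the label with the highest log-probability."""
    total = sum(model["label_counts"].values())
    scores = {}
    for label, n in model["label_counts"].items():
        score = math.log(n / total)  # prior probability of the label
        for b in model["vocabulary"]:
            # Laplace-smoothed probability that a file with this label shows b.
            p = (model["behavior_counts"][label][b] + 1) / (n + 2)
            score += math.log(p if b in behaviors else 1 - p)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical labeled training samples (a real system would use far more).
TRAINING_SAMPLES = [
    ({"writes_registry_run_key", "disables_antivirus"}, "malicious"),
    ({"disables_antivirus", "contacts_unknown_host"}, "malicious"),
    ({"opens_document", "writes_temp_file"}, "benign"),
    ({"writes_temp_file", "checks_for_updates"}, "benign"),
]

model = train(TRAINING_SAMPLES)
new_file = {"disables_antivirus", "contacts_unknown_host", "writes_temp_file"}
print(classify(model, new_file))  # malicious
```

The split between train and classify mirrors the two aspects described above: the model is built once from labeled samples and is then applied to the behaviors of each new file.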
These solutions, however, are inefficient because the burden of making sense of the data is on the user. With sandbox detection, a user is required either to manually review and inspect the output of an execution trace (i.e., the log of the behavior of the file while being run in the sandbox) or to program rules that look for specific behaviors the user must know about beforehand. With machine learning, a classifier such as a random forest can be trained to classify behavior as malicious or benign given a large number of behavior training sets, but a user must manually tweak and weed out irrelevant rules using false positives and false negatives to improve the accuracy of the results.
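The limitation of hand-programmed rules can be illustrated with a short sketch. It assumes a hypothetical execution trace represented as a list of events, each with an "action" and a "target" field; actual sandbox log formats vary. The rule name, the scan_trace function, and the sample events are illustrative only.

```python
# Hypothetical hand-written rules: each maps a name to a predicate over one
# trace event. The user must already know which behaviors to look for.
RULES = {
    "persistence_via_run_key": lambda event: (
        event["action"] == "registry_write"
        and event["target"].endswith(r"\CurrentVersion\Run")
    ),
}


def scan_trace(trace, rules):
    """Return (rule_name, event) pairs for every event matched by a rule."""
    findings = []
    for event in trace:
        for name, predicate in rules.items():
            if predicate(event):
                findings.append((name, event))
    return findings


# Hypothetical execution trace recorded while a file ran in the sandbox.
TRACE = [
    {"action": "file_read", "target": r"C:\Users\user\Documents\report.docx"},
    {"action": "registry_write",
     "target": r"HKCU\Software\Microsoft\Windows\CurrentVersion\Run"},
    {"action": "process_inject", "target": "explorer.exe"},
]

flagged = [name for name, _ in scan_trace(TRACE, RULES)]
print(flagged)  # ['persistence_via_run_key']
```

In this sketch the process_inject event goes unreported because no rule was written for it, which reflects the point above: rules only catch behaviors the user anticipated.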