Identifying the tasks a given piece of malware was designed to perform (logging keystrokes, recording video, establishing remote access, etc.) is a difficult and time-consuming task that, in practice, is largely human-driven. The complexity of this task increases substantially when one considers that malware is constantly evolving and that the classification of each malware instance may differ depending on the particular background of the cyber-security expert performing the analysis. However, automated solutions are highly attractive for this problem, as they can significantly reduce the time it takes to conduct remediation in the aftermath of a cyber-attack.
Earlier work has sought to classify malware into "families" of similar samples, an approach that has been explored as a supervised classification problem. However, disagreement over the "ground truth" for malware families (e.g., Symantec and McAfee cluster malware into families differently) and the tendency of automated approaches to succeed primarily on "easy to classify" samples are two major drawbacks of malware family classification. More recently, there has been work on directly inferring the tasks a malware sample was designed to perform. This approach leverages static malware analysis (i.e., analysis of the malware sample conducted without executing it, such as decompilation) and a comparison against a crowd-sourced database of code snippets using a proprietary machine learning approach. However, a key shortcoming of the static method is that it is of limited value when malware authors encrypt part of their code, as was the case with the infamous Gauss malware. This work builds upon recent developments in the application of cognitive models to intelligence analysis tasks and upon our own preliminary studies on applying cognitive models to identify the tasks a piece of malware was designed to perform.
Corresponding reference characters indicate corresponding elements among the views of the drawings. The headings used in the figures do not limit the scope of the claims.