In the field of software security, platforms are known that examine samples of viruses or other malware and classify those samples into one of a predefined set of known malware families. In the most general terms, those classifiers can operate by examining the actual code of the malware to locate unique sequences of bytes, or they can instead examine the behavior of the malware while it executes. In either case, whether the classifier is signature-based or behavior-based, the logic assigns to the sample the generic malware family name or “label” of the best match from a library of known malware entities.
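By way of illustration only, the two general approaches described above might be sketched as follows. This is a minimal sketch rather than the logic of any particular known platform: the family names, byte patterns, behavior names, and the use of set overlap (Jaccard similarity) as the "best match" criterion are all hypothetical assumptions introduced for the example.

```python
from typing import Optional, Set

# Hypothetical library of known malware families (illustrative values only).
SIGNATURES = {
    "FamilyA": b"\xde\xad\xbe\xef",
    "FamilyB": b"\x90\x90\xcc\xcc",
}
BEHAVIORS = {
    "FamilyA": {"write_registry_run_key", "open_socket"},
    "FamilyB": {"encrypt_user_files", "delete_shadow_copies"},
}


def classify_by_signature(sample: bytes) -> Optional[str]:
    """Signature-based: label the sample with the first family whose
    byte pattern appears anywhere in the sample's code."""
    for family, pattern in SIGNATURES.items():
        if pattern in sample:
            return family
    return None


def classify_by_behavior(observed: Set[str]) -> Optional[str]:
    """Behavior-based: label the sample with the family whose known
    behaviors overlap most with those observed during execution.
    Jaccard similarity is an assumed scoring choice, not a standard."""
    best_family, best_score = None, 0.0
    for family, known in BEHAVIORS.items():
        union = observed | known
        score = len(observed & known) / len(union) if union else 0.0
        if score > best_score:
            best_family, best_score = family, score
    return best_family
```

In both functions the result is a single family label drawn from the library, or no label when nothing matches, which reflects the coarse, family-level output of the classifiers described above.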
In addition, platforms are also known in the software security field that scan samples of malware objects for signature-based and behavior-based attributes, and assign those samples to malware groups having similar characteristics.
However, known classification and/or clustering platforms generally rely upon relatively high-level attributes or characteristics in narrowing down the potential classes or clusters into which a given malware sample will be placed. This limits the precision with which classes or groups can be assigned. Likewise, existing platforms, in particular those used for clustering, rely upon a single chosen algorithm to identify similar malware groups, which can limit the effectiveness of the results. Further, existing platforms typically capture the (relatively high-level) attributes which they analyze from a fairly small sample set, which can also lead to inconsistencies or other shortcomings in the results.
It may be desirable to provide methods and systems for behavior-based automated malware analysis and classification, in which greater granularity in captured attributes, larger sample sets, and flexibility in applied algorithms can be leveraged to produce better malware identification results.