1. Field of the Invention
The present invention relates generally to computer security, and more particularly but not exclusively to methods and systems for evaluating computer files for malicious code.
2. Description of the Background Art
Machine learning technology is commonly used to detect malware. Currently, machine learning for malware detection involves supervised learning to generate a machine learning model. Generally speaking, a training data set of known malicious files and known normal (i.e., benign) files is prepared. Each malicious file is labeled as “malicious” and each normal file is labeled as “normal.” The training data set is input to a machine learning module, which employs a machine learning algorithm, such as a Support Vector Machine (SVM) or Random Forest algorithm. The machine learning module learns from the training data set to predict whether an unknown file is malicious or normal. A trained machine learning module is packaged as a machine learning model that is provided to a computer system. An unknown file received in the computer system is input to the machine learning model, which classifies the unknown file as either malicious or normal.
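The supervised workflow described above (labeled training set, training, classification of an unknown file) can be illustrated with a minimal sketch. It is not the method of any particular product. To stay self-contained it substitutes a simple nearest-centroid classifier for the SVM or Random Forest algorithm named above, and it assumes byte-frequency histograms as the file features. The function names and the toy byte strings are hypothetical.

```python
from collections import Counter


def byte_histogram(data: bytes) -> list[float]:
    """Feature vector: relative frequency of each of the 256 byte values.

    Assumption: byte frequencies stand in for whatever features a real
    system would extract from a file.
    """
    counts = Counter(data)
    total = len(data) or 1  # avoid division by zero for empty input
    return [counts.get(value, 0) / total for value in range(256)]


def train(samples: list[tuple[bytes, str]]) -> dict[str, list[float]]:
    """Supervised training: average the feature vectors of each label.

    Nearest-centroid is used here in place of SVM or Random Forest
    purely to keep the sketch short.
    """
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for data, label in samples:
        acc = sums.setdefault(label, [0.0] * 256)
        for i, v in enumerate(byte_histogram(data)):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(model: dict[str, list[float]], data: bytes) -> str:
    """Return the label whose centroid is closest to the file's features."""
    vec = byte_histogram(data)

    def squared_distance(centroid: list[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))


# Toy labeled training set (made-up byte strings, not real samples).
TRAINING_SET = [
    (b"\x90\x90\x90\x90\xcc\xcc\xeb\xfe", "malicious"),
    (b"\x90\x90\xeb\xfe\xeb\xfe\xcc\x90", "malicious"),
    (b"hello world, this is a text file", "normal"),
    (b"plain readable content goes here", "normal"),
]

model = train(TRAINING_SET)

# The model yields a single verdict for the whole file.
verdict = classify(model, b"\x90\x90\x90\xeb\xfe\xcc")
print(verdict)
```

Note that `classify` returns one label for the entire input. Nothing in its output indicates which bytes or sections contributed to the verdict.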
Currently available machine learning models are very sophisticated and are able to classify files with a high degree of accuracy. However, while a typical machine learning model can determine whether an unknown file is malicious, the machine learning model is not able to identify which section or sections of the file are malicious.