Malware detection systems can be configured to detect the presence of malware on compute devices. Some known malware detection systems collect a number of malware samples and compare each collected sample to a potential malware file sample to determine whether the potential malware file sample matches a known malware sample. Such a process can be time-consuming and resource-intensive, and can require frequent updates to a database of known malware in order to determine whether a file on a system is malware.
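For illustration only, the sample-by-sample comparison described above can be sketched as follows. The function name `match_known_sample`, the sample labels, and the use of exact byte equality as the matching criterion are assumptions made for this sketch and are not taken from any particular known system.

```python
from typing import Iterable, Optional, Tuple


def match_known_sample(
    candidate: bytes,
    known_samples: Iterable[Tuple[str, bytes]],
) -> Optional[str]:
    """Return the label of the first known sample equal to the candidate.

    Hypothetical sketch: the candidate is compared against every stored
    sample in turn, so the work grows with the size of the database.
    """
    for label, sample in known_samples:
        # Exact byte equality is an assumed matching criterion.
        if candidate == sample:
            return label
    return None


# Placeholder byte strings standing in for stored malware samples.
known = [
    ("sample_a", b"\x00\x01\x02\x03"),
    ("sample_b", b"\x10\x11\x12\x13"),
]

exact_copy = b"\x10\x11\x12\x13"
variant = b"\x10\x11\x12\x14"  # differs from sample_b in the last byte

found = match_known_sample(exact_copy, known)
missed = match_known_sample(variant, known)
```

In this sketch, a candidate that differs from every stored sample by even one byte is not matched, which is one reason such a database can need frequent updates.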
Other known systems employ a list of rules or heuristics to determine whether to classify a file as malware. Such known systems typically rely on prior knowledge of a file's type to determine whether malicious code has been injected into a particular file. Such methods, however, can result in a large number of false positives, because a user's ordinary modification of a file (e.g., a user adding data to a text document) can change the placement and/or order of bytes in the file, causing the system to falsely detect that the file has been maliciously changed. Additionally, such known methods rely on knowledge of the expected arrangement of bytes for each of a large number of file types, which can require substantial resources to maintain.
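For illustration only, the false-positive behavior described above can be sketched with a rule that checks for expected bytes at fixed offsets. The "note" file type, its layout table `EXPECTED_LAYOUT`, and the function `flagged_as_modified` are hypothetical and were invented for this sketch; they do not describe any real file format or product.

```python
from typing import Dict, List, Tuple

# Hypothetical per-file-type layout: (offset, expected bytes) pairs.
# A real system of this kind would maintain such a table for many file types.
EXPECTED_LAYOUT: Dict[str, List[Tuple[int, bytes]]] = {
    "note": [(0, b"NOTE"), (12, b"END")],
}


def flagged_as_modified(data: bytes, file_type: str) -> bool:
    """Return True if any expected byte run is missing at its fixed offset."""
    for offset, expected in EXPECTED_LAYOUT[file_type]:
        if data[offset:offset + len(expected)] != expected:
            return True
    return False


# A file that fits the assumed layout: 4-byte header, 8-byte body, trailer.
original = b"NOTE" + b"hi there" + b"END"

# The same file after a user adds text; the trailer moves to a later offset.
edited = b"NOTE" + b"hi there, all" + b"END"

original_flagged = flagged_as_modified(original, "note")
edited_flagged = flagged_as_modified(edited, "note")
```

In this sketch the benign edit shifts the trailer away from its expected offset, so the rule flags the edited file even though no malicious code was injected.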
Accordingly, a need exists for methods and apparatus that can use machine learning techniques to reduce the amount of time used to determine the identity of a malware threat.