Decision trees and other heuristics are commonly used as predictive models to map observations about an item to conclusions about the item's target value. For example, a security-software vendor may use decision trees as predictive models for identifying or detecting malicious computer files (“malware”) based on attributes, characteristics, and/or behaviors of the files.
Decision trees and other heuristics typically classify a sample by determining whether the sample satisfies various comparable criteria (such as sizes and counts). For example, a malware-detection decision tree may determine that a file represents a malicious file if the file, among other behaviors or characteristics: 1) instantiates fewer than two visible processes, 2) has a file size greater than 7740 KB, 3) generates fewer than three icons, and/or 4) has a folder depth greater than four.
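To illustrate why such criteria suit a decision tree, a single root-to-leaf path built from the example criteria above can be sketched as a chain of threshold comparisons. This is a minimal sketch, not a described implementation: the names `FileSample` and `is_malicious_by_rules` are hypothetical, and because the example says "and/or," the sketch assumes for illustration that all four criteria must hold.

```python
from dataclasses import dataclass


@dataclass
class FileSample:
    """Hypothetical container for the comparable attributes named in the example."""
    visible_processes: int  # number of visible processes the file instantiates
    size_kb: float          # file size in KB
    icons_generated: int    # number of icons the file generates
    folder_depth: int       # depth of the folder containing the file


def is_malicious_by_rules(sample: FileSample) -> bool:
    """One illustrative tree path; assumes the four criteria are combined with "and"."""
    # Each test is meaningful because the attribute values are ordered quantities.
    return (
        sample.visible_processes < 2
        and sample.size_kb > 7740
        and sample.icons_generated < 3
        and sample.folder_depth > 4
    )


suspicious = FileSample(visible_processes=1, size_kb=8000, icons_generated=0, folder_depth=6)
benign = FileSample(visible_processes=3, size_kb=120, icons_generated=5, folder_depth=2)
print(is_malicious_by_rules(suspicious))
print(is_malicious_by_rules(benign))
```

Because every comparison operates on an ordered quantity, moving a value toward or past a threshold changes the outcome in a predictable direction, which is what gives each branch its meaning.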
However, decision trees and other heuristics are typically unable to interpret non-comparable values associated with samples, such as discrete, unrelated numeric values. For example, a malware-detection decision tree typically cannot draw conclusions about the legitimacy of a file simply by comparing a hash of the file with a predetermined hash value, since such hash values are generally unrelated and non-comparable. Accordingly, a decision-tree branch having the statement “file hash ≦6967CF” generally has no meaning or significance: whether a given file's hash value happens to be less than or equal to “6967CF” typically has no bearing on whether the file is malicious.
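The non-comparability of hash values can be seen by hashing two inputs that differ in a single byte and evaluating a branch like “file hash ≦6967CF” on each. This is a minimal sketch under stated assumptions: SHA-256 is an illustrative choice of hash function (the text does not name one), the helpers `hash_prefix` and `branch_hash_le` are hypothetical, and the branch is assumed to compare the leading hex digits of the hash numerically against the threshold.

```python
import hashlib


def hash_prefix(content: bytes, digits: int = 6) -> str:
    """Return the first `digits` hex characters of the SHA-256 digest (assumed hash)."""
    return hashlib.sha256(content).hexdigest()[:digits].upper()


def branch_hash_le(content: bytes, threshold: str = "6967CF") -> bool:
    """Hypothetical branch "file hash <= threshold": numeric compare of hex prefixes."""
    return int(hash_prefix(content, len(threshold)), 16) <= int(threshold, 16)


# Two inputs that differ only in their final byte.
a = b"MZ\x90\x00 example payload v1"
b = b"MZ\x90\x00 example payload v2"

# The digests bear no ordered relationship to the near-identical contents,
# so which side of the threshold each input falls on is effectively arbitrary.
print(hash_prefix(a), hash_prefix(b))
print(branch_hash_le(a), branch_hash_le(b))
```

Unlike the size and count comparisons in the earlier example, the ordering of hash values reflects no property of the files themselves, so a threshold on a hash splits samples arbitrarily.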
As such, the instant disclosure identifies a need for systems and methods for translating non-comparable values (such as file hashes) into comparable values for consumption by decision trees and other heuristics.