Technical Field
The disclosure relates to malware detection systems and, more specifically, to determining duplicate objects analyzed by a malware detection system.
Background Information
A prior approach to analyzing potentially malicious software (malware) involves use of a malware detection system configured to examine content of an object, such as a web page, email, file or uniform resource locator, and render a malware/non-malware classification based on previous analysis of that object. The malware detection system may include one or more stages of analysis, e.g., static analysis and/or behavioral analysis, of the object. The static analysis stage may be configured to detect anomalous characteristics of the object to identify whether the object is “suspect” and deserving of further analysis or whether the object is non-suspect (i.e., benign) and not requiring further analysis. The behavioral analysis stage may be configured to process (i.e., analyze) the suspect object to arrive at the malware/non-malware classification based on observed anomalous behaviors.
The observed behaviors (i.e., analysis results) for the suspect object may be recorded (cached) in, e.g., an object cache that may be indexed by an object identifier (ID) that is generated for the object. During subsequent analysis of a second object, the object cache may be searched using the object ID of the second object and, if there is a match, the second object may be deemed a “duplicate” of the suspect object and further analysis may not be required. Rather, the recorded analysis results for the suspect object may be used either to issue an alert if the object is deemed malware or to take no action if the object is classified as benign.
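By way of illustration only, the duplicate check described above may be sketched as follows. The sketch assumes that the object ID is a SHA-256 digest of the object content, which the description does not specify, and the names object_id, ObjectCache and handle_object are hypothetical rather than part of any actual malware detection system.

```python
import hashlib
from typing import Callable, Dict, Optional, Tuple


def object_id(content: bytes) -> str:
    # Assumption: the object ID is a SHA-256 digest of the object content.
    return hashlib.sha256(content).hexdigest()


class ObjectCache:
    """Hypothetical object cache indexed by object ID."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}

    def record(self, content: bytes, verdict: str) -> None:
        # Cache the analysis result under the ID generated for the object.
        self._results[object_id(content)] = verdict

    def lookup(self, content: bytes) -> Optional[str]:
        # Return the cached result, or None if the object was never analyzed.
        return self._results.get(object_id(content))


def handle_object(
    cache: ObjectCache,
    content: bytes,
    analyze: Callable[[bytes], str],
) -> Tuple[str, bool]:
    """Return (verdict, is_duplicate) for the given object."""
    cached = cache.lookup(content)
    if cached is not None:
        # Match: the object is deemed a duplicate and is not re-analyzed.
        return cached, True
    # No match: run the (expensive) analysis and record its result.
    verdict = analyze(content)
    cache.record(content, verdict)
    return verdict, False
```

In this sketch, a second call to handle_object with identical content returns the recorded verdict without invoking the analyze callback again.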
However, the malware landscape has changed: malware is now designed to evade detection and, thus, has become a pervasive problem for computers or nodes coupled to networks, e.g., on the Internet. Malware (or an exploit) is often embedded within downloadable content intended to adversely influence or attack normal operations of a node. For example, malware content may be embedded within one or more objects associated with file storage, email or web pages hosted by malicious web sites. Notably, malware may circumvent the prior analysis approach through the use of a package including two or more objects, e.g., a primary file and a secondary file, attached to an email or contained in file storage and/or downloadable content, where each of the objects may appear individually as benign. The package may be “tuned” to transform a previously deemed benign object (as determined by a previous analysis of the primary file) into malware through, e.g., activation of the malware contained in the secondary file of the package. The prior approach may not detect such maliciousness of the package because a comparison of the object ID of the primary file with cached entries of the object cache may indicate that the primary file (and, thus, the package) is non-suspect, such that no further action is taken.
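The limitation of the prior approach may be illustrated by the following sketch. It assumes, hypothetically, that the object ID is a SHA-256 digest of the primary file alone and that the object cache is a simple mapping from object ID to verdict; the names object_id and prior_approach_verdict and the sample byte strings are illustrative only.

```python
import hashlib
from typing import Dict


def object_id(content: bytes) -> str:
    # Assumption: the object ID is a SHA-256 digest of the object content.
    return hashlib.sha256(content).hexdigest()


def prior_approach_verdict(
    cache: Dict[str, str],
    primary: bytes,
    secondary: bytes,
) -> str:
    """Classify a package using only the object ID of its primary file."""
    # The secondary file is deliberately unused: the cache is consulted with
    # the primary file's ID only, so package content never affects the result.
    return cache.get(object_id(primary), "needs-analysis")


# The primary file was previously analyzed on its own and deemed benign.
cache = {object_id(b"primary-file"): "benign"}

# Two packages share the same primary file but carry different secondary files.
verdict_a = prior_approach_verdict(cache, b"primary-file", b"harmless-secondary")
verdict_b = prior_approach_verdict(cache, b"primary-file", b"activating-secondary")
```

Both packages receive the cached verdict for the primary file, regardless of what the secondary file contains.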