Malware-based attacks pose significant risks to computer systems. Malware includes, for example, any malicious content, code, scripts, active content, or software designed or intended to damage, disable, or take control of a computer or computer system. Examples of malware include computer viruses, worms, trojan horses, ransomware, spyware, shellcode, etc. Malware may be received into a computer system in various ways, commonly through electronic communications such as email (and its attachments) and downloads from websites. Computer systems are known to implement various protective tools at end-user computer devices and/or at gateways or access points to the computer system for screening or detecting malicious content before it is allowed to infect the computer system. Conventional tools commonly rely on the ability to identify or recognize a particular malicious threat, or characteristics known to be associated with malicious content or activity.
For example, a common approach to identifying malicious content is to screen incoming documents at a host computer or server by comparing them with known malicious signatures. Such signature-based malware detection techniques, however, cannot identify malicious content for which a malicious signature has not yet been identified. Accordingly, it is generally not possible to identify new malicious content, or subtle variations of existing malicious content, using signature-based detection methods. Furthermore, in many cases, malicious content is embedded in otherwise legitimate content, documents, or files having proper structure and characteristics, and may also be disguised so that it appears innocuous. Thus, even when a document is inspected using known malware scanning techniques, it may be difficult to identify malicious content.
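To illustrate why exact-match signatures miss even trivial variants, the following Python sketch assumes that a signature is simply the SHA-256 digest of a known-bad file. This is a simplifying assumption: practical scanners also match byte patterns and other features. The sample bytes, `SIGNATURE_DB`, and `is_flagged` are hypothetical names used only for illustration.

```python
import hashlib

# Hypothetical "known malicious" sample; real signature databases hold
# many entries and often use byte patterns rather than whole-file hashes.
KNOWN_BAD_SAMPLE = b"MZ...malicious-payload-v1..."

# Hypothetical signature database: SHA-256 hex digests of known-bad content.
SIGNATURE_DB = {hashlib.sha256(KNOWN_BAD_SAMPLE).hexdigest()}


def is_flagged(content: bytes) -> bool:
    """Return True if the content's digest matches a known signature."""
    return hashlib.sha256(content).hexdigest() in SIGNATURE_DB


# The exact known sample is detected...
print(is_flagged(KNOWN_BAD_SAMPLE))  # True

# ...but a trivially modified variant produces an unrelated digest.
variant = KNOWN_BAD_SAMPLE.replace(b"v1", b"v2")
print(is_flagged(variant))  # False
```

Because any byte-level change yields an unrelated digest, a detector of this kind can only flag a new variant after that variant has itself been catalogued.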
Other conventional tools for identifying malicious content implement behavior-based techniques or heuristics to identify characteristics of known malicious content or other suspicious activity or behavior. One such technique implements a “sandbox” (e.g., a type of secured, monitored, or virtual operating system environment), which can be used to execute untested or untrusted programs, files, or code in a manner that eliminates or reduces the risk of harm to a host machine or operating system. That is, conventional sandbox techniques may execute or detonate a file while monitoring the resulting damage or operations post-detonation. Operations that may be monitored include writing to disk, initiating network activity, spawning new processes, and any other potentially suspicious operations. These techniques, however, also suffer from an inability to identify new, yet-to-be-identified exploits, e.g., so-called zero-day exploits. Some sophisticated malware has also been developed to evade such sandbox techniques by halting or skipping its malicious operations if it detects that it is running in a virtual or monitored execution environment. Furthermore, attackers continually evolve their code to include delayed or staged attacks that may not be detected from evaluation of a single file, for example, or that may lie in wait for a future, unknown process to complete an attack. Thus, in some situations it may be too computationally intensive or impracticable to identify some malware exploits using conventional sandbox techniques.
Other tools that help overcome limitations of conventional malware detection techniques have recently been implemented, including those based on the concept of content disarm and reconstruction (CDR), or content sanitization, which generally refers to techniques for analyzing or deconstructing content, removing aspects of the content that pose risks, and reconstructing the content so that it remains at least partly usable by an end user. Other techniques change the format of the content, for example, in an attempt to destroy any malicious content that may be part of the received content. Such CDR techniques aim to remove or disarm any malicious content that may be included in the content and do not necessarily require prior detection of malicious content in the received content. Thus, CDR techniques may provide an advantage in protecting computer systems from yet-to-be-identified attacks.
Some CDR processes result in the creation of modified content, i.e., content that differs in one or more ways from the content entering the computer system on which the CDR technique is performed. In some systems, for example, use of a CDR technique creates a modified file. While it may be advantageous to perform a CDR process on all content received by or entering a computer system to prevent malicious content from infecting the computer system, there are some significant use cases where this may be undesirable. For example, some computer systems may receive a significant volume of digitally signed content in the regular course of business. For these systems, a digital signature serves not only to authenticate the sender but also to authenticate the message itself, by validating that the received message is unchanged since the digital signature was created. But if a received digitally signed file is modified by a CDR process implemented at a receiving computer system, validation of the received content will necessarily fail. Thus, a receiving entity will be unable to verify the authenticity of the content of any received message that has been processed using a CDR technique. Moreover, any form of selective CDR processing of digitally signed content also risks harm to the system because of potential mistakes in the selection process.
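The following Python sketch illustrates why any modification by a CDR step defeats validation. It assumes an HMAC from the standard library as a stand-in for a public-key digital signature: a real signature scheme signs with the sender's private key and verifies with the corresponding public key, but it likewise verifies over the exact signed bytes. The key, the sample document, and the `toy_cdr` step are hypothetical.

```python
import hashlib
import hmac

# Assumption: an HMAC over the file bytes stands in for a public-key
# signature; both check a value computed over the exact original bytes.
SENDER_KEY = b"hypothetical-shared-key"


def sign(content: bytes) -> bytes:
    """Compute an authentication tag over the content."""
    return hmac.new(SENDER_KEY, content, hashlib.sha256).digest()


def verify(content: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(content), tag)


def toy_cdr(content: bytes) -> bytes:
    """Hypothetical CDR step: strip a risky element and rebuild the file."""
    return content.replace(b"<macro>run()</macro>", b"")


original = b"<doc>Invoice 42<macro>run()</macro></doc>"
tag = sign(original)

print(verify(original, tag))  # True: unchanged content verifies

sanitized = toy_cdr(original)
print(verify(sanitized, tag))  # False: the sanitized bytes no longer match
```

The sanitized file may be perfectly safe, yet the receiver can no longer tell it apart from content tampered with in transit, which is the conflict described above.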
Current systems do not provide capabilities for performing CDR processes on digitally signed content without eliminating the benefits of the digital signature. Thus, there is a need in computer systems for techniques that mitigate the risks posed by malware attacks by applying a CDR process to digitally signed content, while retaining the advantages that digital signatures provide.