Malware-based attacks pose significant risks to computer systems. Malware includes, for example, any malicious content, code, scripts, active content, or software designed or intended to damage, disable, or take control over a computer or computer system. Examples of malware include computer viruses, worms, trojan horses, ransomware, spyware, shellcode, etc. Malware may be received into a computer system in various ways, commonly through electronic communications such as email (and its attachments) and downloads from websites. Computer systems are known to implement various protective tools at end-user computer devices and/or gateways or access points to the computer system for screening or detecting malicious content before the malicious content is allowed to infect the computer system. Conventional tools commonly rely on the ability to identify or recognize a particular malicious threat or characteristics known to be associated with malicious content or activity.
For example, common attempts to identify malicious content include screening incoming documents at a host computer or server based on a comparison with known malicious signatures. Such signature-based malware detection techniques, however, are incapable of identifying malicious content for which a malicious signature has not yet been identified. Accordingly, it is generally not possible to identify new malicious content or subtle variations of existing malicious content using signature-based detection methods. Furthermore, in many cases, malicious content is embedded in otherwise legitimate content, documents or files having proper structure and characteristics, and the malicious content may also be disguised to hide the malicious nature of the content, so that the malicious content appears to be innocuous. Thus, even upon inspection of a document according to known malware scanning techniques, it may be difficult to identify malicious content.
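As an illustration only, the limitation of signature matching can be sketched as follows. The sketch assumes a signature is simply the SHA-256 digest of a whole file; real products often use byte patterns or fuzzy hashes instead. The payload bytes and the names `KNOWN_BAD_PAYLOAD`, `SIGNATURES`, and `matches_known_signature` are hypothetical.

```python
import hashlib

# Stand-in bytes for a previously catalogued threat (not real malware).
KNOWN_BAD_PAYLOAD = b"example-malicious-payload"

# Hypothetical signature database: digests of content already identified as malicious.
SIGNATURES = {hashlib.sha256(KNOWN_BAD_PAYLOAD).hexdigest()}


def matches_known_signature(content: bytes) -> bool:
    """Return True only if the exact content was seen and catalogued before."""
    return hashlib.sha256(content).hexdigest() in SIGNATURES


# A subtle variation: the same payload with one trailing byte appended.
variant = KNOWN_BAD_PAYLOAD + b" "

known_detected = matches_known_signature(KNOWN_BAD_PAYLOAD)
variant_detected = matches_known_signature(variant)
```

Under these assumptions the catalogued payload is flagged, while the one-byte variant passes the check even though its behavior would be unchanged.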
Other conventional tools for identifying malicious content implement behavior-based techniques or heuristics to identify characteristics of known malicious content or other suspicious activity or behavior. One such technique implements a “sandbox” (e.g., a type of secured, monitored, or virtual operating system environment), which can be used to execute untested or untrusted programs, files, or code in a manner that eliminates or reduces risk of harm to a host machine or operating system. That is, conventional sandbox techniques may execute or detonate a file while monitoring the damage or operations post-detonation. Operations that may be monitored include writing to disk, initiating network activity, spawning new processes, and any other potentially suspicious operations. These techniques, however, also suffer from the inability to identify new, yet-to-be-identified exploits, e.g., so-called zero-day exploits. Some sophisticated malware has also been developed to evade such “sandbox” techniques by halting or skipping execution if it detects that it is running in such a virtual execution or monitored environment. Furthermore, hackers typically evolve their code to include delayed or staged attacks that may not be detected from evaluation of a single file, for example, or that may lie in wait for a future unknown process to complete an attack. Thus, in some situations it may be too computationally intensive or otherwise impracticable to identify some malware exploits using conventional sandbox techniques.
Other tools that help overcome limitations of conventional malware detection techniques have recently been implemented and include those based on a concept of content disarm and reconstruction (CDR), or content sanitization, which generally refers to techniques for analyzing or deconstructing content, removing aspects of the content that pose risks, and reconstructing the content to be at least partly usable by an end user. Other techniques exist for changing a format of the content, for example, with the aim of destroying any malicious content that may be part of the received content. Such CDR techniques aim to remove or disarm any malicious content that may be included in content. Some CDR processes result in the creation of modified content, or content that differs in one or more ways from the content entering a computer system on which a CDR technique is performed. In some systems, use of a CDR technique creates a modified file. Because some CDR techniques do not necessarily require prior detection of malicious content in the received content, these techniques may provide an advantage for protecting computer systems from yet-to-be-identified attacks.
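The deconstruct, remove, and reconstruct steps of a CDR process can be sketched in a minimal form. The sketch assumes, purely for illustration, that the received content is a ZIP-based container and that risk is judged by member file suffix alone. The policy `RISKY_SUFFIXES` and the functions `disarm_archive` and `build_sample` are hypothetical, and an actual CDR tool would also have to repair any internal references to the removed parts.

```python
import io
import zipfile

# Hypothetical policy: member suffixes treated as potentially active content.
RISKY_SUFFIXES = (".bin", ".js", ".vbs")


def disarm_archive(data: bytes) -> bytes:
    """Deconstruct a ZIP container, drop risky members, and rebuild it."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename.lower().endswith(RISKY_SUFFIXES):
                continue  # removed without first deciding whether it is malicious
            dst.writestr(info.filename, src.read(info))
    return out.getvalue()


def build_sample() -> bytes:
    """Build an in-memory container with one passive and one active member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("document.txt", "quarterly report")
        zf.writestr("macros/payload.bin", b"\x00\x01")
    return buf.getvalue()


modified = disarm_archive(build_sample())
remaining = zipfile.ZipFile(io.BytesIO(modified)).namelist()
```

The output is a modified file that differs from the received one, and no signature or prior detection was needed to produce it.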
While in some cases it may be advantageous to perform a CDR process on all content received by or entering a computer system to prevent malicious content from infecting the computer system, there are some significant use cases where this may not be workable. For example, some computer systems may receive a significant volume of protected content as part of their regular course of business. Protected content may generally refer to any content that is encrypted or otherwise obscured or prevented from being accessed based on one or more controls placed on the content. Because existing CDR techniques require access to the underlying digital content, such techniques are ineffective when the received content is protected. And any sort of selective CDR processing of protected content also risks harm to the system because of potential mistakes in the selection process.
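One way a gateway might recognize that content is protected, and therefore opaque to CDR, is sketched below for a single container type. The sketch relies on bit 0 of the ZIP general-purpose bit flag, which marks a member as encrypted and is exposed in Python as `ZipInfo.flag_bits`. The routing labels and the function names `is_encrypted` and `route` are hypothetical and do not describe any particular system.

```python
import zipfile

ENCRYPTED_FLAG = 0x1  # bit 0 of the ZIP general-purpose bit flag


def is_encrypted(info: zipfile.ZipInfo) -> bool:
    """Return True if the member's header marks its data as encrypted."""
    return bool(info.flag_bits & ENCRYPTED_FLAG)


def route(members) -> str:
    """Hypothetical policy: content with any encrypted member is opaque to CDR."""
    return "opaque" if any(is_encrypted(m) for m in members) else "cdr"


# Member headers built by hand for illustration; no real archive is read.
plain = zipfile.ZipInfo("report.txt")
locked = zipfile.ZipInfo("secret.txt")
locked.flag_bits |= ENCRYPTED_FLAG

plain_route = route([plain])
mixed_route = route([plain, locked])
```

The check shows only that the content cannot be inspected; it says nothing about whether the content is malicious, which is the difficulty described above.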
The challenges posed by protected content are exacerbated where the CDR technique is performed at a gateway to a computer system or at another device other than an end-user client device. Similar problems also exist for conventional tools for identifying malicious content, such as the signature-based and behavior-based techniques mentioned above.
Current techniques and systems do not provide capabilities for protecting computer systems from malicious content included in protected content. Thus, there is a need in computer systems for techniques to mitigate the risks posed by malware attacks included in protected content. There is also a need for using a CDR process for protected content, while attaining the benefits and goals of protecting the content.