Malware-based attacks pose significant risks to computer systems. Malware includes any malicious content, code, scripts, active content, or software designed or intended to damage, disable, or take control of a computer or computer system. Examples of malware include computer viruses, worms, trojan horses, ransomware, spyware, and shellcode. Malware may be received into a computer system in various ways, commonly through electronic communications such as email and downloads from websites. Computer systems are known to implement various protective tools at end-user computer devices, gateways, or access points to the computer system for screening or detecting malicious content before the malicious content is allowed to infect the computer system. Conventional tools commonly rely on the ability to identify or recognize a particular malicious threat, or characteristics known to be associated with malicious content or activity.
For example, common attempts to identify malicious content include screening incoming documents at a host computer or server based on a comparison with known malicious signatures. Such signature-based malware detection techniques, however, cannot identify malicious content for which a malicious signature has not yet been identified. Accordingly, it is generally not possible to identify new malicious content, or subtle variations of existing malicious content, using signature-based detection methods. Furthermore, in many cases, malicious content is embedded in otherwise legitimate content, documents, or files having proper structure and characteristics, and the malicious content may also be disguised so that it appears innocuous. Thus, even upon inspection of a document according to known malware scanning techniques, it may be difficult to identify malicious content.
Other conventional tools for identifying malicious content implement behavior-based techniques or heuristics to identify characteristics of known malicious content or other suspicious activity or behavior. One such technique implements a “sandbox” (e.g., a type of secured, monitored, or virtual operating system environment), which can be used to virtually execute untested or untrusted programs, files, or code without risking harm to the host machine or operating system. That is, conventional sandbox techniques may execute or detonate a file while monitoring the damage or operations that occur after detonation. Operations that may be monitored include writing to disk, initiating network activity, spawning new processes, and other potentially suspicious operations. These techniques, however, also suffer from an inability to identify new, yet-to-be-identified exploits, e.g., so-called zero-day exploits. Some sophisticated malware has also been developed to evade such sandbox techniques by halting or skipping execution if it detects that it is running in a virtual execution or monitored environment. Furthermore, attackers continually evolve their code to include delayed or staged attacks that may not be detected from evaluation of a single file, for example, or that may lie in wait for a future unknown process to complete an attack. Thus, in some situations it may be too computationally intensive or impracticable to identify some malware exploits using conventional sandbox techniques.
Other tools that help overcome limitations of conventional malware detection techniques have recently been implemented, including those based on a concept of content disarm and reconstruction (CDR), or content sanitization. CDR generally refers to techniques for analyzing or deconstructing content, removing aspects of the content that pose risks, and reconstructing the content so that it is at least partly usable by an end user. Other techniques exist for changing a format of the content, for example, in an attempt to destroy any malicious content that may be part of the received content. Such CDR techniques aim to remove or disarm any malicious content that may be included in content and do not necessarily require prior detection of malicious content in the received content.
A similar concept for protecting computer systems from malware entering a computer system from web browsing activities includes web browser isolation environments that transform web content before providing the web content to an end-user computer device. Some web browser isolation environments generate a visual representation of the web content that is sent to a requesting end-user as opposed to live, potentially malicious web content that would otherwise be received by a browser. Like CDR solutions, web browser isolation environments aim to prevent any malicious web content from being received into the computer system or accessed by an end user and do not necessarily require prior detection of any malicious web content being accessed by a browser.
Thus, CDR and web browser isolation techniques may provide an advantage for protecting computer systems from yet-to-be-identified attacks. However, because these techniques never detect the presence of malicious content, it is difficult to determine how effective a CDR or web browser isolation solution has been in preventing a potential malware attack.
Current systems do not provide capabilities for assuring or verifying the effectiveness of a CDR process or other content transformation process performed on received content. Nor do current systems provide capabilities for determining whether a CDR or other solution (e.g., a web browser isolation solution) has been effective in preventing a potential attack on the computer system.
Thus, there is a need in computer systems for techniques that mitigate the risks posed by malware attacks, whose effectiveness can be verified, or for which successful prevention of potential attacks can be determined.