Malicious code embedded in Web resources is the predominant means through which computers are infected. If a resource that a user downloads from the Web contains malicious code, that code is delivered to the client's computer, where it attempts to infect it. There are many ways to hide the malicious nature of the code, including obfuscating it, morphing it regularly to change how it appears to scanners, and burying it several layers deep through a chain of files invoking other files. With Web resources numbering in the billions, it is infeasible to clean up, blacklist, or even locate most of the malicious items on the Web, especially in light of the efforts hackers make to hide the malicious intent of those files.
It is estimated that less than 20% of existing malware on the Web is hosted on resources created specifically for the purpose of hosting malicious code. The vast majority of Web pages that host malicious code are legitimate pages that have themselves been hacked. The purpose of this hacking is to plant malware on the page so that it can in turn attempt to infect, or steal data from, anyone who views the resource. Hackers who plant the malware may also be selective about which sites they try to infect. Infecting high-traffic sites casts a wide net for unsuspecting users, but sometimes a particular high-value target or segment of the population is being pursued; in that case the hacker may try to infect sites frequented only by users in a particular industry or locale. Whatever the target, Web resources are a persistent source of malicious code from which users need to be protected. This invention addresses that need: protecting users from attack by malicious code contained in the Web pages and other files they download from the Web.
Current methods to protect users from Web-based malicious code generally fall into three types. The first method for blocking malicious code that accompanies Web resources is signature analysis, which uses known signatures of malicious code to detect malware and block it from executing in the browser. The limitation of this method is that it detects and disables only malware already known to exist. If the malware is being used for the first time, or has not yet been discovered and cataloged by a security vendor, it will not be detected. Signature analysis also often fails when malware is disguised or morphed so that it is no longer identifiable by its previously known signature. If the signature for the malicious code is not on the blacklist, the analysis tool will not detect that it is malicious. For this reason, a significant amount of malware is invisible to signature analysis tools, and the higher the value of the target, the more effort an attacker will put into ensuring that the malicious code goes undetected by them.
Blacklisting of Web addresses, or URLs, is a method used to block access to specific sites, categories of sites, or entire segments of the Web to help safeguard users from Web-based malware. Internet proxies are used by many organizations to block users from accessing entire segments of the Internet that may have a higher likelihood of being infected. Blacklisting at the URL level has the same failing as blacklisting at the code level: Internet-borne malware is dynamic, and blacklists cannot keep up with the large number of infected sites or the rapidity with which sites are infected and subsequently cleaned. This method suffers from both high false-positive and high false-negative rates, resulting in limited gains in security and increased user dissatisfaction.
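A minimal sketch of URL blacklisting, as a proxy might perform it, is shown below; the hostnames and the blocked segment are invented for illustration only.

```javascript
// Proxy-style URL blacklisting: block exact hosts and whole segments
// of the Web (here, an entire top-level domain). All names are
// hypothetical examples.
const blockedHosts = new Set(["infected.example.com"]);
const blockedSegments = new Set(["zip"]); // e.g., a TLD deemed high risk

function isBlockedUrl(url) {
  const { hostname } = new URL(url);
  if (blockedHosts.has(hostname)) return true;
  const tld = hostname.split(".").pop();
  return blockedSegments.has(tld);
}
```

A freshly infected legitimate site is absent from the list (a false negative), while a cleaned site may linger on it (a false positive), which is the failing noted above.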
Another method for keeping the user safe when accessing the Web is to disable all executable code in the browser. The term executable code is used here to denote the computer code accompanying a Web page that can perform functions beyond defining the look of the page (i.e., Hypertext Markup Language (HTML) and Cascading Style Sheets (CSS) are not considered executable code). While disabling all executable code is an effective strategy for stopping malicious code from functioning, it has a significant downside in that it degrades the functionality of the pages the user views. Most Web pages rely on executable code to keep the content of the site fresh and to interact with the user. The lack of functionality associated with completely blocking executable code will often cause users to turn the feature off, thus opening themselves up to attack.
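This approach can be sketched as a crude filter that strips script elements and inline event handlers from a page before display; a real implementation would need to be far more thorough, and the regular expressions below are illustrative only.

```javascript
// Naive removal of executable code from a page: strip <script>
// elements and inline event-handler attributes. Shown only to
// illustrate why page functionality is lost along with the threat.
function stripExecutable(html) {
  return html
    .replace(/<script\b[\s\S]*?<\/script>/gi, "")
    .replace(/\son\w+\s*=\s*(".*?"|'.*?')/gi, "");
}
```

Any menu, form validation, or dynamic content on the page stops working once its code is removed, which is why users tend to disable such blocking.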
Web browsers process pages based on a complex interaction between display-oriented code (HTML and CSS) and executable code (e.g., JavaScript and Java). This interaction is mediated by a structure the browser creates called the Document Object Model (DOM). Browsers build the DOM when processing the code to display a page; it is a hierarchical listing that defines the structure of the page and the contents of each item in that structure. For executable code to make changes to the page, it must make changes to the DOM, and the browser then updates the page based on the revised DOM. Executable code that accompanies a Web page serves many functions, including validating form inputs, changing the page in response to user mouse movements (e.g., showing or hiding pictures, text, or menu items), and interacting with an external site through Asynchronous JavaScript and XML (AJAX) to bring new content to the page without refreshing the whole page. Most of these functions are accomplished by the executable code changing the DOM. This invention capitalizes on the use of the DOM as an intermediary between executable code, which is potentially dangerous, and display-layer code, which is generally benign.
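The role of the DOM as intermediary can be sketched with a simplified tree of plain objects standing in for a real browser DOM: a mouse-movement handler edits a node, and the browser then repaints from the revised tree. The node names below are hypothetical.

```javascript
// Simplified stand-in for a DOM tree: executable code changes the
// tree, and the display layer re-renders from it.
const dom = {
  tag: "body",
  children: [{ tag: "div", id: "menu", hidden: true, children: [] }],
};

// Recursively find a node by id and set its hidden flag, the way a
// mouse-over handler might reveal a hidden menu.
function setHidden(node, id, hidden) {
  if (node.id === id) node.hidden = hidden;
  (node.children || []).forEach((child) => setHidden(child, id, hidden));
}

setHidden(dom, "menu", false); // the browser would now repaint the menu
```

The executable code never touches the rendered pixels directly; it only edits the DOM, which is what makes the DOM a natural interception point.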
Advances in browser functionality since 2012 provide features that may be instrumental in simplifying the implementation of the present invention. These features include Web Real-Time Communication (WebRTC) and the DOM Mutation Observer. WebRTC is a technology that allows peer-to-peer interaction directly between two browsers. This would allow direct interaction between the browser on the client machine and the browser on a rendering machine that receives the Web page in parallel with the client, and the peer-to-peer connection should significantly improve the connection speed between the two browsers. If the security of this connection can be assured, it could greatly benefit the implementation of this technology. Mutation Observer is a newer browser capability that allows easy tracking of changes (mutations) to the DOM. Software that tracks DOM mutations can summarize the results so that the data generated by each page change is greatly reduced. In this invention, where changes to the DOM on one processor are also transferred to another processor, the capability provided by Mutation Observers will be instrumental in making the invention practical.
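By way of example, mutation records such as those a MutationObserver delivers can be reduced to a compact summary before transfer to the second processor. In a browser the records would come from MutationObserver callbacks; here they are plain objects whose shape is simplified for illustration.

```javascript
// Reduce a batch of DOM mutation records to a compact delta suitable
// for sending to a second machine. The record objects mimic, in
// simplified form, what a MutationObserver callback receives.
function summarizeMutations(mutations) {
  const summary = { added: 0, removed: 0, attributeChanges: [] };
  for (const m of mutations) {
    if (m.type === "childList") {
      summary.added += m.addedNodes.length;
      summary.removed += m.removedNodes.length;
    } else if (m.type === "attributes") {
      summary.attributeChanges.push(m.attributeName);
    }
  }
  return summary;
}
```

Transmitting only such deltas, rather than the whole page after every change, is what keeps the data volume between the two processors manageable.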