Recent years have seen an increased risk from malicious code (often referred to as “malware”, of which a “virus” is one type) being delivered to user computers from websites. More specifically, one of the favourite attack methods of hackers over the last few years has been to inject malicious code into the web page code of legitimate websites. Unsuspecting users then have a virus or other malware downloaded to their personal computer simply by visiting an infected web page. Sometimes the malware is downloaded without any user interaction; in other cases, the user is prompted to click a button to download what appears to be a legitimate file and then receives the malware. Such an attack vector is particularly insidious, because using web pages for propagation allows malware to travel as ordinary HTTP traffic on port 80. On practically any system used for web browsing, HTTP port 80 must remain open, since closing it would prevent the user from browsing the Internet. As a consequence, the prior art anti-malware measure of blocking particular ports is not effective against such attacks, as the HTTP port cannot practically be blocked without removing web browsing functionality.
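The limitation of port blocking described above can be illustrated with a minimal sketch. The `Message` type, the `port_filter_allows` function and the set of blocked ports are hypothetical names chosen for illustration only; they do not describe any particular firewall product.

```python
from dataclasses import dataclass

# Hypothetical example set of ports an administrator might block.
BLOCKED_PORTS = {23, 135, 445}


@dataclass(frozen=True)
class Message:
    service_port: int  # port of the service the traffic belongs to
    body: bytes        # content; never inspected by a port-based filter


def port_filter_allows(message: Message) -> bool:
    """Decide using the port number alone, as a port-blocking measure does."""
    return message.service_port not in BLOCKED_PORTS


clean_page = Message(80, b"<html><body>Hello</body></html>")
infected_page = Message(80, b"<html><script>/* injected code */</script></html>")
other_traffic = Message(445, b"non-web traffic")

print(port_filter_allows(clean_page))     # True
print(port_filter_allows(infected_page))  # True: same port, so indistinguishable
print(port_filter_allows(other_traffic))  # False
```

Because the decision depends only on the port number, a clean page and a page carrying injected code are treated identically, and adding port 80 to the blocked set would stop both.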
The problem of injected code in web pages is not limited to the inability to block ports. In addition, the injected code uses “armour” techniques to disguise itself from any anti-virus or other host intrusion detection system that may be running on the target system. For example, “polymorphic worm” type malware that is injected into a web page is one of the most prevalent types of attack on the Internet because it is able to evade current host intrusion detection systems. The reasons for this are discussed below.
Firstly, the malware is injected into a web page that may be received from a legitimate server, and hence the web page code may be implicitly trusted by the target system user. Moreover, the malware is typically encrypted so that the malicious content is obfuscated and will not be easily detected using a traditional anti-virus scanning engine. More seriously, however, because the malware is contained within the code of a web page, once it is received at the target victim system it will run in the memory of the web browser, which is considered a legitimate application by the host intrusion detection system (HIDS) running on the target system. In this regard, when a user tries to run an application or a file, a typical HIDS first scans the file for any malicious content. In addition, once the file is loaded into memory, the HIDS will usually also scan the memory content for that file for any malicious code. However, a typical HIDS will perform such an operation only once for a particular file and memory range, and will then consider the application and the memory where the application is running as benign. Thus, if malware manages to inject itself into the memory space of a benign program (such as the web browser), it will evade detection because the HIDS has already scanned that memory and considered the application benign. As a result, the injected malware will run with the same operating system privileges as the web browser, and hence when run may be able to infect the target victim system.
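The scan-once behaviour described above can be modelled with a short sketch. The `ScanOnceMonitor` class and the byte pattern are hypothetical and greatly simplified; a real HIDS is far more elaborate, and this is only a model of the caching of a "benign" verdict.

```python
class ScanOnceMonitor:
    """Toy model of a monitor that scans each application's memory only once."""

    def __init__(self, signatures):
        self.signatures = list(signatures)
        self.trusted = set()  # applications already scanned and judged benign

    def check(self, app_name: str, memory: bytearray) -> bool:
        """Return True if the application is considered benign."""
        if app_name in self.trusted:
            return True  # verdict reused; memory is not scanned again
        if any(sig in memory for sig in self.signatures):
            return False
        self.trusted.add(app_name)
        return True


PATTERN = b"KNOWN-BAD-PATTERN"  # stand-in for a signature, not real malware

monitor = ScanOnceMonitor([PATTERN])
browser_memory = bytearray(b"web browser code and data")

first = monitor.check("browser", browser_memory)   # scanned and found clean
browser_memory += b" " + PATTERN                   # content arrives later
second = monitor.check("browser", browser_memory)  # earlier verdict reused
fresh = ScanOnceMonitor([PATTERN]).check("browser", browser_memory)

print(first, second, fresh)  # True True False
```

The second check passes only because the earlier verdict is reused; a monitor with no cached verdict flags the same memory contents.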
In summary, injected malicious code in web pages is able to infect a target system by using encryption to hide itself from traditional anti-virus scanning, and by exploiting the trust already given to the web browser following a previous check of the web browser memory space by an existing anti-virus or other HIDS. Moreover, even once the malicious code has been run, it is not usually possible for a HIDS to detect the code in memory, due to the automatic garbage collection functions of typical web browsers.
More specifically, when a web browser runs a script in memory, then once the script finishes running all the memory locations used by the script are released for reallocation. Web browsers do this in order to limit their memory usage. They usually use an automated garbage collector process that reclaims the memory space so that it can be reallocated to the browser after a script has finished running. As a result of this mechanism, once malware comprising injected web page code has run in memory and infected the victim machine, the web browser garbage collector will usually remove the malware code from memory. This makes it hard to scan the memory to detect the malicious code, and adds an extra layer of evasion armouring to the malicious code.
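The effect of garbage collection on a later memory scan can be sketched as follows. `ToyBrowserHeap` is a hypothetical, highly simplified model and does not reflect how any actual browser engine manages memory; it only shows that a scan performed after reclamation finds nothing, even though the script's effects persist.

```python
class ToyBrowserHeap:
    """Toy model: a script's code is resident only while the script runs."""

    def __init__(self):
        self.regions = {}  # script id -> bytes currently held in memory

    def run_script(self, script_id: str, code: bytes, effects: list) -> None:
        self.regions[script_id] = code              # loaded for execution
        effects.append(f"{script_id} executed")     # side effect outlives the code
        self.collect(script_id)                     # reclaimed once finished

    def collect(self, script_id: str) -> None:
        self.regions.pop(script_id, None)

    def scan(self, pattern: bytes) -> bool:
        """Return True if the pattern is present in any resident region."""
        return any(pattern in region for region in self.regions.values())


heap = ToyBrowserHeap()
effects = []
heap.run_script("script-1", b"...EXAMPLE-MARKER...", effects)

print(effects)                        # ['script-1 executed']
print(heap.scan(b"EXAMPLE-MARKER"))   # False: the region has been reclaimed
```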
US 2010/0235913 describes methods and systems for determining whether a collection of data not expected to include executable code is suspected of containing malicious executable code. Such collections of data are generally described as being data files or documents such as word-processing documents, music files, picture files, etc. The type of malware targeted by US 2010/0235913 includes polymorphic programs comprising an encrypted payload and a plain text decryption engine. The methods rely on identifying short portions of data which might correspond to executable instructions (e.g. as part of a decryption engine) and attempting to identify these as forming (a part of) an executable program. US 2010/0235913 does not, however, address the problem of detecting malicious code within a file or collection of data which is expected to include executable instructions, such as a web page containing JavaScript.
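The polymorphic structure referred to above, namely an encoded payload accompanied by a plain text decoding routine, can be illustrated with a harmless toy. This is not the method of US 2010/0235913; the single-byte XOR encoding, the marker string and all function names are assumptions made purely for illustration, with XOR standing in for whatever encryption real malware might use.

```python
# Stand-in for a byte pattern that a signature scanner looks for.
SIGNATURE = b"EXAMPLE-MARKER"


def xor_bytes(data: bytes, key: int) -> bytes:
    """Encode or decode data with a single-byte XOR key (its own inverse)."""
    return bytes(b ^ key for b in data)


def signature_scan(blob: bytes) -> bool:
    """Return True if the known pattern appears verbatim in the blob."""
    return SIGNATURE in blob


payload = b"harmless text containing EXAMPLE-MARKER"
encoded_a = xor_bytes(payload, 0x5A)  # one encoded variant
encoded_b = xor_bytes(payload, 0xC3)  # same payload, different key

print(signature_scan(payload))                  # True
print(signature_scan(encoded_a))                # False
print(signature_scan(encoded_b))                # False
print(encoded_a == encoded_b)                   # False: variants differ
print(xor_bytes(encoded_a, 0x5A) == payload)    # True: decoding restores it
```

Each key yields a different byte sequence for the same payload, so a fixed signature matches none of the encoded variants, while the decoding routine itself remains in plain text.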