This invention generally relates to detecting malicious content. More particularly, the invention relates to improving detection of malicious content by using a content scanner.
A typical web page viewed in a browser is no longer a simple static “hypertext page.” Instead, the web page engages the dynamic functionality of the browser, which allows for interactive and animated websites. For example, typical web pages employ the dynamic nature of JavaScript/Visual Basic Script (VBScript), the out-of-band data and code updates provided by Asynchronous JavaScript And XML (AJAX) and JavaScript Object Notation (JSON)-like functionality, eXtensible Markup Language (XML) to HyperText Markup Language (HTML) data binding, and modern features of Cascading Style Sheets (CSS). Rising generations of rich internet applications running in Adobe Flash, Adobe Integrated Runtime (AIR), MS Silverlight, Novell Moonlight, Adobe Portable Document Format (PDF) and Sun JavaFX sandboxes dramatically extend the power of the well-known and seasoned Java Applet and ActiveX technologies by providing web developers with a rich set of application programming interfaces (APIs). This dynamic functionality of the browser opens numerous possibilities for malicious code to exploit the browser and other applications at runtime.
Network security content scanners are inherently limited in their ability to find malicious code because they are designed to search for static heuristic patterns inside the code. For example, content scanners search for fingerprints of exploitation methods. Such fingerprints tend to generate a high level of false positives while still leaving a non-negligible level of false negatives, because exploit code for the same vulnerability can be encoded in an endless variety of ways. Another problem with content scanners is that the dynamic nature of scripting languages and the use of obfuscated code make it difficult to determine whether an embedded script will exploit the browser or another application at runtime. Yet another problem with content scanners is that they analyze the parts of a web document (e.g., HTML with embedded JavaScript or VBScript) independently. As a result, a content scanner cannot detect combined exploits, in which the malicious behavior emerges only when the parts interact at runtime.
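To illustrate these limitations, the following Python sketch models a naive static-signature scanner. The signature string, the helpers `static_scan` and `scan_document`, and the sample document parts are all hypothetical and chosen only for illustration; real scanners use far richer pattern sets. The sketch covers two false-negative cases: a payload that is base64-encoded and decoded only at runtime, and a payload split across two document parts that is assembled only when the browser executes the script.

```python
import base64

# Hypothetical fingerprint for illustration only; not a real exploit signature.
SIGNATURES = ["unescape('%u0c0c%u0c0c')"]


def static_scan(part: str) -> bool:
    """Return True if any signature appears verbatim in one document part."""
    return any(sig in part for sig in SIGNATURES)


def scan_document(parts: list[str]) -> bool:
    """Scan each document part on its own, with no knowledge of runtime behavior."""
    return any(static_scan(part) for part in parts)


payload = "var s = unescape('%u0c0c%u0c0c');"

# Case 1: the payload appears in plain text inside a script element.
plain = ["<script>" + payload + "</script>"]

# Case 2: the same payload, base64-encoded and decoded by the browser via atob().
encoded_payload = base64.b64encode(payload.encode()).decode()
encoded = ["<script>eval(atob('" + encoded_payload + "'));</script>"]

# Case 3: the call is split between an HTML attribute and a script that joins
# the two halves at runtime, so neither part contains the full signature.
split = [
    "<input id=\"a\" value=\"unescape('%u0c\">",
    "<script>eval(document.getElementById('a').value + \"0c%u0c0c')\");</script>",
]

if __name__ == "__main__":
    print("plain:  ", scan_document(plain))    # True: detected
    print("encoded:", scan_document(encoded))  # False: missed
    print("split:  ", scan_document(split))    # False: missed
    # Only the string assembled at runtime would match the signature.
    print("runtime:", static_scan("unescape('%u0c" + "0c%u0c0c')"))  # True
```

Only the plain variant is flagged. The encoded and split variants would run the same script in a browser, yet neither matches the signature when each part is scanned statically and in isolation.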