Many Web pages today contain active content that enhances the experience of a user who downloads and displays a Web page at a client computing device using client software, typically a browser. In general, active content is a program or code transparently embedded in a downloaded document (e.g., hypertext markup language (HTML) defining a Web page). The code executes automatically on the client computing device when the Web page is downloaded, causing some action to occur. Most Web pages provide active content by including JavaScript™ scripts, Java™ applets, Visual Basic® scripts, or ActiveX® controls in the HTML.
When embedded within application-level information, active content poses a security threat to the client computing device. For example, Java™ or JavaScript™ code placed within collaborative application data, such as mail messages, chat messages, and shared documents, can exploit vulnerabilities in the client software executing the code. These vulnerabilities include cross-site scripting (XSS) holes and gaps in the Java™ security model, which may assume that the host delivering the data vouches for it. By exploiting such vulnerabilities, an attacker can perform unauthorized operations, such as causing execution of malicious code, taking control of the user's session, and stealing information from the user's computing device.
To guard against such attacks, some applications disable scripting languages altogether. Although this protects the user, it also disables desirable functionality. Other applications remove all active content from dynamic HTML (DHTML) content. Although this technique avoids execution of malicious code, it also withholds harmless, potentially useful code. Another technique blocks the download of any document in which harmful active content is detected. However, this technique unnecessarily keeps the user from receiving the non-dangerous active content in documents found to contain potentially harmful code. Thus, there is a need for a system and method capable of detecting and removing harmful active content from a document without preventing the user from receiving the document and executing non-dangerous active content in that document.