Consumers and businesses increasingly use the Internet to download web content from a variety of sources. The downloaded web content is often posted by those consumers and businesses on websites, including personal, commercial, academic, and government websites. In an effort to reach large numbers of end users, computer programmers with malicious motivations often covertly embed malicious steganographic code in various forms of web content, such as HyperText Markup Language (“HTML”), JavaScript, and media files, including image, audio, video, and animation files. The malicious code embedded in the web content may contain command and control instructions used to direct software robots (“bots”) operating on the Internet. Accordingly, a computer hosting web content containing malicious code may unknowingly become part of a botnet and may drive bots in the botnet to perform malicious activities on the Internet, such as spamming third parties.
Malicious code may be covertly incorporated into web content in a number of ways. For example, malicious code may be embedded within unused portions of a code document, such as the comment sections or whitespace portions of an HTML file. Malicious code may also be encoded within software by replacing variables in the software code with characters that encode the malicious code. Malicious code may also be hidden within other types of files, such as media files, by integrating the code into portions of the file itself in such a way that the files appear to be unaffected. Malicious steganographic code may be incorporated into web content in such a manner that it is extremely difficult for security software to detect. However, the malicious code may be readily detected and used by a bot that is programmed to identify it.
Malicious steganographic code in various types of web content may allow the web content to function normally while enabling the malicious code to be read and used by bots crawling the Internet. Traditional security software may be unable to detect malicious code that is hidden within infected web content. What is needed, therefore, is a way for a user to process web content, particularly web content from untrusted sources, such that any malicious steganographic content embedded within the web content is disabled before the web content is used by an end user.
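By way of illustration only, the following minimal Python sketch shows one way the comment and whitespace hiding places described above could be neutralized in an HTML document. The function name `neutralize_html`, the regular expressions, and the sample input are hypothetical and are not taken from any particular implementation. The sketch assumes that removing comments and normalizing spaces and tabs does not change how the page renders, which may not hold for whitespace-sensitive content such as preformatted text, and it does not address code hidden in variable names or media files.

```python
import re

# Hypothetical patterns for two hiding places: HTML comments and
# "unused" whitespace (trailing spaces/tabs and long space/tab runs).
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
TRAILING_WS_RE = re.compile(r"[ \t]+$", re.MULTILINE)
WS_RUN_RE = re.compile(r"[ \t]{2,}")


def neutralize_html(html: str) -> str:
    """Remove comments and normalize whitespace so that data encoded
    in those regions is destroyed while visible markup is kept."""
    without_comments = COMMENT_RE.sub("", html)
    without_trailing = TRAILING_WS_RE.sub("", without_comments)
    # Collapse remaining runs of spaces/tabs to a single space.
    return WS_RUN_RE.sub(" ", without_trailing)


# Sample page with a comment payload and a trailing-whitespace pattern.
sample = "<p>Hello</p> \t \t\n<!-- cmd: send-spam -->\n<p>World</p>\t\t \n"
cleaned = neutralize_html(sample)
print(repr(cleaned))
```

A regular-expression pass of this kind is only a rough approximation; a practical system would more likely parse the document and re-serialize it, since pattern matching on raw HTML can miss malformed or nested constructs.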