Malicious code is software that is designed to damage a computer system or its data or to prevent the computer system from being used in its normal manner. Also termed “malware,” malicious code includes viruses, Trojan horses, worms, and malicious active content. A virus is a particularly pernicious kind of malicious code, capable of attaching itself to disks or other files and replicating itself repeatedly, typically without user knowledge or permission. Some viruses display symptoms, and some viruses damage files and computer systems, but neither symptoms nor damage is essential in the definition of a virus. A non-damaging virus is still a virus, yet even non-damaging viruses are considered malicious if they consume valuable computer resources without permission.
Some viruses propagate by attaching themselves to files so that executing an infected file also causes the virus to execute. The virus then hooks into the operating system to infect other computer files as they are opened, modified, or created. Before the Internet became popular, viruses were most commonly spread by sharing floppy disks that had been infected or that contained infected files. The recent, explosive growth of the Internet has increased the opportunities for spreading malicious code quickly throughout the world, for example, through infected files attached to electronic mail messages. When the email recipient executes an infected email attachment, the virus is propagated to yet another computer system.
To combat viruses and other kinds of malicious code, vendors have begun to offer anti-virus software that scans incoming files and other content for embedded viruses, Trojan horses, malicious document macros, and worms. The incoming content that is scanned typically includes attachments to an email message, the body of the email message itself, and scripts downloaded via HTTP. Such anti-virus software typically employs a proprietary catalog of viral signatures, which are often simple strings of bytes that are expected to be found in every instance of a particular virus. Usually, different viruses have different signatures, and anti-virus scanners use signatures to locate specific viruses.
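The signature matching described above can be illustrated with a minimal sketch. It assumes the simplest case, in which a signature is a fixed byte string searched for anywhere in the content. The catalog entries, the virus names, and the `scan` function are hypothetical and are not taken from any actual anti-virus product, whose catalogs are proprietary.

```python
# Hypothetical signature catalog. The names and byte strings are invented
# for illustration only and do not correspond to real viruses.
SIGNATURES = {
    "Example.A": b"\xde\xad\xbe\xef",
    "Example.B": b"EVIL_MACRO",
}


def scan(content: bytes, signatures=SIGNATURES):
    """Return the sorted names of every signature found in the content."""
    return sorted(name for name, sig in signatures.items() if sig in content)


# Content containing one hypothetical signature, and content containing none.
infected = b"header" + b"\xde\xad\xbe\xef" + b"trailer"
clean = b"an ordinary message body"

print(scan(infected))
print(scan(clean))
```

A scanner built this way can detect only the viruses whose signatures appear in its own catalog, so two scanners with different catalogs will generally detect different sets of viruses.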
There is a large variety of viruses and other kinds of malicious code thriving on the Internet, but no single anti-virus scanner has 100% coverage of the known viruses. Each anti-virus scanner has its own set of viruses that it can detect, and many anti-virus scanners can detect viruses that are unknown to other anti-virus scanners on the market. Therefore, incomplete coverage of known viruses is a problem with individual anti-virus scanners.
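The coverage problem can be pictured with sets. In the following sketch, the scanner names, the virus names, and the coverage sets are all hypothetical; they are chosen only so that each scanner misses something that another scanner detects.

```python
# Hypothetical set of known viruses and hypothetical per-scanner coverage.
KNOWN_VIRUSES = {"virus_a", "virus_b", "virus_c", "virus_d"}

COVERAGE = {
    "scanner_1": {"virus_a", "virus_b"},
    "scanner_2": {"virus_b", "virus_c"},
    "scanner_3": {"virus_c", "virus_d"},
}


def missed_by(scanner: str) -> set:
    """Known viruses that the given scanner cannot detect."""
    return KNOWN_VIRUSES - COVERAGE[scanner]


def unique_to(scanner: str) -> set:
    """Viruses detected by the given scanner and by no other scanner."""
    others = set().union(
        *(found for name, found in COVERAGE.items() if name != scanner)
    )
    return COVERAGE[scanner] - others


for name in sorted(COVERAGE):
    print(name, "misses", sorted(missed_by(name)),
          "uniquely detects", sorted(unique_to(name)))
```

Under these assumed sets, every scanner has a non-empty set of missed viruses, which is the incomplete coverage described above.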
Accordingly, attempts have been made to improve virus coverage by employing a variety of different anti-virus scanners. One example is the VIRUS CONTROL CENTRE™, which is currently offered by MessageLabs™ and is described at the http://www.messagelabs.com web site. The VIRUS CONTROL CENTRE™ product comprises a cluster of control towers that are populated with a plurality of scanning mail servers, a switch, and a load distributor. All incoming email is redirected to a control tower for initial processing and scanning. After being delivered to a control tower, the email is directed to a particular scanning mail server, which executes three different types of commercial anti-virus scanners on the email. If the email is “clean,” then the email is permitted to continue to its ultimate destination. Otherwise, the email is quarantined for 30 days and then destroyed.
This approach, however, suffers from several disadvantages, particularly in terms of latency. Latency is the delay imposed by scanning for viruses. For example, if each anti-virus scanner on the scanning mail server takes 400 ms to process an average email, then the latency imposed by the three anti-virus scanners is 1.2 seconds.
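The arithmetic in this example can be written out as a short sketch. It assumes, as the example implies, that the scanners run one after another on the same email, so that their processing times add. The function name is hypothetical, and the 400 ms figure is the illustrative value used above.

```python
def sequential_latency_ms(per_scanner_ms):
    """Total delay when each scanner runs only after the previous one ends."""
    return sum(per_scanner_ms)


# Three scanners, each taking 400 ms on an average email.
latency_ms = sequential_latency_ms([400, 400, 400])
print(latency_ms / 1000, "seconds")
```

Under this assumption the delay grows linearly with the number of scanners, so each additional scanner adds its full processing time to the total.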
Although a 1.2-second latency may appear to be small at first blush, it is unacceptably large for interactive traffic such as surfing the World Wide Web. Email is not the only vector for transmitting malicious code; viruses can also be downloaded in web pages sent by the hypertext transfer protocol (HTTP) or in files sent by the file transfer protocol (FTP). If a user had to wait 1.2 seconds to see each new web page, the user would quickly become frustrated and seek less secure ways of accessing the Internet. On the other hand, a latency of about 0.5 seconds is still acceptable to most users.
Therefore, there is a need for a malicious code detection system and methodology with the good anti-viral coverage of multiple anti-virus scanners but characterized by the low latency commensurate with that of a single anti-virus scanner.