Computer viruses, or simply “viruses,” continue to plague unsuspecting users worldwide with malicious and often destructive results. Computer viruses propagate through infected files or objects and are often disguised as application programs or are embedded in library functions, macro scripts, electronic mail (email) attachments, applets, and even within hypertext links. Typically, a user unwittingly downloads and executes the infected file, thereby triggering the virus.
By definition, a computer virus is executable program code that is self-replicating and almost universally unsanctioned. More precisely, computer viruses include any form of self-replicating computer code which can be stored, disseminated, and directly or indirectly executed. The earliest computer viruses infected boot sectors and files. Over time, computer viruses evolved into numerous forms and types, including cavity, cluster, companion, direct action, encrypting, multipartite, mutating, polymorphic, overwriting, self-garbling, and stealth viruses, as described in “McAfee.com: Virus Glossary of Terms,” Networks Associates Technology, Inc., Santa Clara, Calif. (2000), the disclosure of which is incorporated by reference.
In particular, macro viruses have become increasingly prevalent, due in part to the ease with which these viruses can be written. Macro viruses are written in widely available macro programming languages and can be attached to document templates or electronic mail. These viruses can be easily triggered by merely opening the template or attachment, as graphically illustrated by the recent “Love Bug” and “Anna Kournikova” macro virus attacks in May 2000 and February 2001, respectively. The “Love Bug” virus was extremely devastating, saturating email systems worldwide and causing an estimated tens of millions of dollars' worth of damage.
Today, there are over 53,000 known computer viruses, and new viruses are being discovered daily. The process of identifying and cataloging new viruses is manual and labor intensive. Anti-virus detection companies employ full-time staffs of professionals whose only job is to analyze suspect files and objects for the presence of viruses. On average, training an anti-virus specialist can take six months or longer. These professionals are hard pressed to keep up with the constant challenge of discovering and devising solutions to new viruses.
In the prior art, few automated tools for identifying new viruses exist. On the front line, the processes employed by anti-virus experts to discover new viruses are ad hoc and primarily reactive, rather than proactive. Typically, suspect files or objects are sent to the virus detection centers by concerned users who have often already suffered some adverse side effect from a possible virus. In times past, virus detection centers had more time in which to identify and analyze viruses, and to implement patches and anti-viral measures that could be disseminated before widespread infection occurred. Today, however, viruses often travel by email and other forms of electronic communication and can infect entire networks at an alarming rate. As a result, the present manual processes for detecting new viruses are woefully slow and generally incapable of responding in a timely fashion.
Similarly, existing anti-virus software fails to provide an adequate solution for protecting against and defeating new viruses. These types of software are designed to pattern scan for, and search out, only those viruses already positively identified by anti-virus software vendors. Invidious writers of computer viruses constantly strive to create new forms of viruses that easily evade existing anti-virus measures.
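By way of illustration only, the limitation of pattern scanning can be sketched as follows. This is a minimal sketch that assumes a scanner matching exact byte patterns; the signature names, the patterns, and the `scan` function are hypothetical and do not represent any vendor's actual implementation.

```python
from typing import Dict, List

# Hypothetical signature database: virus name -> exact byte pattern.
# Names and patterns are invented for illustration only.
SIGNATURES: Dict[str, bytes] = {
    "Example.MacroA": b"Sub AutoOpen_Payload",
    "Example.MacroB": b"CopyTo NormalTemplate",
}


def scan(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the sorted names of all known signatures found in data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)


# A file containing a previously cataloged pattern is detected.
known_sample = b"header Sub AutoOpen_Payload trailer"

# A trivially altered variant (one byte changed) is not detected,
# because its pattern has not yet been cataloged.
variant_sample = b"header Sub AutoOpen_Pay1oad trailer"

known_hits = scan(known_sample)
variant_hits = scan(variant_sample)
```

In this sketch, the scanner reports the cataloged pattern in the first sample but nothing in the second, although the two samples differ by a single byte.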
Therefore, there is a need for an approach to automatically identifying new forms of computer viruses and, in particular, macro computer viruses. Preferably, such an approach would be capable of identifying candidate virus families when presented with a suspect string or a particular virus family when presented with a suspect file or object. Moreover, such an approach would be capable of identifying a macro virus within a range of given search parameters.