The rise of the Internet and networking technologies has resulted in the widespread transfer of code, data and files between computers. This material is not always what it seems. For example, code that is accessed on a remote machine and downloaded to a computer system can contain hostile algorithms that can potentially destroy or corrupt code, crash the system, or worse. Computer viruses also spread by infecting other programs. For example, Visual Basic for Applications, or VBA, used in Microsoft's Office suite of products, provides a portal for virus entry through malicious use of VBA. Viruses, worms and other malicious programs and code can attack VBA-compliant programs through the VBA portal. Moreover, Word or other VBA programs can, through infection by a certain type of malicious code, create a VBA virus: the malicious code may not itself be a virus, but it creates a virus that attacks other VBA and non-VBA programs on the user's machine as well. An early macro virus, W97M/Wazzu.A, operated by first infecting Word's default template, normal.dot, and then spreading to each subsequent document.
Of course, malicious code is not limited to VBA-compliant programs and may take many forms and infect many levels of the system's operation. Hostile, malicious and/or proscribed code, data and files (“code” as used hereinafter generally includes “data” and “file” or “files”) can infect a single computer system or an entire network and so pose a security risk to the computer system or network. The user and/or administrator (generally referred to hereinafter as “user”) may wish to intercept, examine and/or control such code. The user might also wish to intercept, examine and/or control other code, such as code that the user does not know to be hostile but wishes to intercept nonetheless, for example, potentially sexually or racially harassing email, junk email, etc. This latter type of code is known hereinafter as “predetermined code.”
Hostile, malicious, predetermined and/or proscribed code (generally referred to hereinafter as “proscribed code”) can contaminate the system in a number of ways. Proscribed code, for example, may provide instructions to be carried out by software on the system, such as by the operating system, applications, etc. Viruses generally operate in this fashion. Proscribed code may also infect transmissions from the system, such as a macro virus that infects the default Word template and thus spreads by infecting documents created under the template and subsequently disseminated to other users.
Proscribed code may be present as a contiguous character string within otherwise authorized program code. As the program code is being executed by the system, the proscribed code will be executed as well. Proscribed code may be inserted in the beginning of program code, so the application or system file executing the infected program encounters the proscribed code almost immediately after beginning execution of the infected program code. Alternatively, proscribed code may be placed somewhere within the program code, and will be executed when an application or system file is pointed towards the proscribed code. This latter technique is often used by macro viruses, which may be buried in the macro section of a Word document, for example. Macros, of course, are written in VBA code in a Word 97 document and interpreted by the Word application when it opens a Word document. Because macros are essentially small programs, they are subject to infection by virus code, and Word, as it interprets the document, will interpret and run any macro code it finds, including any virus code.
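By way of illustration only, locating proscribed code that is present as a contiguous character string can be sketched as a simple byte search. The function name `find_signature`, the sample buffer and the four-byte marker `EVIL` are hypothetical stand-ins chosen for this sketch; they do not represent any actual virus signature or any particular scanner's interface.

```python
from typing import List


def find_signature(data: bytes, signature: bytes) -> List[int]:
    """Return every offset at which `signature` occurs in `data`."""
    if not signature:
        return []
    offsets = []
    start = data.find(signature)
    while start != -1:
        offsets.append(start)
        # Resume one byte later so overlapping occurrences are also found.
        start = data.find(signature, start + 1)
    return offsets


# Hypothetical program image: authorized code with a marker embedded in it.
program = b"HEADER" + b"legit-code" + b"EVIL" + b"more-legit-code"
print(find_signature(program, b"EVIL"))
```

A search of this kind reports where the string sits but says nothing about whether, or when, the surrounding program would actually execute it.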
The placement of proscribed code within otherwise non-proscribed code is not difficult. For example, a program such as Word permits modification of the macros section of a document, permits a user to add macros, etc. What may be difficult, however, is reading macros through a non-Word program. Reading macros is difficult because Word (as well as many other programs) structures documents as complicated files in a manner that may be difficult to understand. So, for example, although macros may often be located in a certain section of a Word document, where the macro section begins and ends may not be entirely clear. Moreover, a macro and/or the macro section may be spread out in non-contiguous blocks throughout a Word document, so that the beginning or end of the section is not clear. Attempting to understand this structure is extremely difficult.
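The difficulty posed by non-contiguous blocks can be illustrated with a toy model. The sketch below assumes, purely for illustration, a file made of fixed-size blocks plus a table recording which block follows which; this layout, the function `read_chain` and the marker `EVIL` are hypothetical and are not a description of the actual Word file format.

```python
from typing import Dict, List


def read_chain(blocks: List[bytes], next_block: Dict[int, int], start: int) -> bytes:
    """Reassemble a section by following a chain of block indices."""
    out = bytearray()
    seen = set()
    index = start
    while index is not None:
        if index in seen:
            raise ValueError("block chain contains a cycle")
        seen.add(index)
        out += blocks[index]
        # A block with no entry in the table ends the chain.
        index = next_block.get(index)
    return bytes(out)


# Hypothetical file: the macro section occupies block 3, then block 1.
blocks = [b"text", b"IL()", b"more", b"x=EV"]
next_block = {3: 1}

macro_section = read_chain(blocks, next_block, start=3)
print(macro_section)
```

In this toy file no single block contains the marker, so a reader that does not know how the blocks are chained would not see it; it appears only once the section is reassembled in order.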
Presently, antivirus programs that attempt to protect systems or networks from proscribed code may protect the system or network from the effects of the code, but may not remove the proscribed code from the system or network. Antivirus programs may not remove proscribed code because the antivirus program may only, upon detecting proscribed code, modify the pointers or other addresses to the code, rather than attempting to remove the code. Removal may be too difficult for the antivirus program, either because of the difficulties associated with attempting to understand the file structure or because the antivirus program makes no distinction between different file structures. Thus, although an antivirus program may alter the address of proscribed code in a file, thus making it difficult to run the code, the antivirus program may still permit the spread of proscribed code by failing to remove the proscribed code from a file or program.
Prior art mechanisms also may, by failing to remove viruses, maintain ghost code, which is virus code left behind after the disinfection mechanism merely alters a location pointer. These ghosts may then be detected by other antivirus mechanisms, thus slowing the process and possibly confusing the user.
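The distinction between altering a pointer and removing the code can be sketched with a toy model. The `ToyFile` class, its `macro_offset` field, the two disinfection functions and the marker `EVIL` are all assumptions made for this illustration; they do not describe any real file format or antivirus product.

```python
from dataclasses import dataclass
from typing import Optional

SIGNATURE = b"EVIL"  # hypothetical marker standing in for virus code


@dataclass
class ToyFile:
    body: bytes
    macro_offset: Optional[int]  # pointer an application follows to run code


def disable_by_pointer(f: ToyFile) -> ToyFile:
    """Clear the pointer only; the bytes of the code stay in the body."""
    return ToyFile(body=f.body, macro_offset=None)


def strip_code(f: ToyFile) -> ToyFile:
    """Clear the pointer and also cut the first occurrence of the marker."""
    idx = f.body.find(SIGNATURE)
    if idx == -1:
        return ToyFile(body=f.body, macro_offset=None)
    cleaned = f.body[:idx] + f.body[idx + len(SIGNATURE):]
    return ToyFile(body=cleaned, macro_offset=None)


def scanner_flags(f: ToyFile) -> bool:
    """A second scanner that looks only for the marker bytes."""
    return SIGNATURE in f.body


infected = ToyFile(body=b"text" + SIGNATURE + b"more", macro_offset=4)
print(scanner_flags(disable_by_pointer(infected)))
print(scanner_flags(strip_code(infected)))
```

In this model the pointer-only approach leaves a ghost that the second scanner still reports, whereas the stripped file no longer contains the marker at all.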
Moreover, virus detection mechanisms typically detect viruses by reading through the document's code in a brute force type of detection. That is, the virus detection mechanisms of the prior art make no distinction between the sections of code, a distinction which might simplify and accelerate the scanning process. For example, a Word 97 document typically contains sections which may be divided into a Header Block, Document Property Blocks and Text Blocks. Macros, which are stored in Document Property Blocks, cannot be located in the Header Block or Text Blocks. Thus a macro virus would not be stored in a Text Block. Nevertheless, prior art virus detection mechanisms typically will scan Text Blocks for viruses, as they do not differentiate between document types or sections, thus lengthening the scanning process.
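The cost difference between brute force scanning and a scan limited to the sections that can hold macros can be sketched as follows. The `Block` type, the kind labels `"header"`, `"property"` and `"text"`, and the marker `EVIL` are hypothetical simplifications of the Header Block, Document Property Blocks and Text Blocks mentioned above, not an actual parser for Word 97 documents.

```python
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    kind: str   # "header", "property" or "text" (labels assumed for this sketch)
    data: bytes


def brute_force_scan(blocks: List[Block], signature: bytes) -> Tuple[bool, int]:
    """Search every block; return (found, number of bytes examined)."""
    found, examined = False, 0
    for block in blocks:
        examined += len(block.data)
        found = found or signature in block.data
    return found, examined


def section_aware_scan(blocks: List[Block], signature: bytes) -> Tuple[bool, int]:
    """Search only the blocks that can hold macros."""
    found, examined = False, 0
    for block in blocks:
        if block.kind != "property":
            continue  # macros cannot be stored in header or text blocks
        examined += len(block.data)
        found = found or signature in block.data
    return found, examined


document = [
    Block("header", b"HDR0"),
    Block("property", b"macro:EVIL"),
    Block("text", b"x" * 100),
]
print(brute_force_scan(document, b"EVIL"))
print(section_aware_scan(document, b"EVIL"))
```

Both scans find the marker in this toy document, but the section-aware scan examines only the 10 bytes of the property block instead of all 114 bytes.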
Further complications may arise from cross-platform transmissions of code. For example, a Word document, created in a Windows environment, may be transmitted through or stored in a UNIX® environment. The Word document, because it is created in a Windows environment for a Windows application, may not be capable of being reviewed in UNIX® by an antivirus program. Thus infected documents can be disseminated through numerous platforms.
Accordingly, it is an object of the present invention to provide methods and apparatus for proscribed code detection.
It is a further object to simply and efficiently detect proscribed code.
It is a further object to simply and efficiently detect proscribed code and strip proscribed code from a non-proscribed file.
It is a further object to simply and efficiently detect macro viruses and strip macro viruses from a non-proscribed file or document.
It is a further object to detect proscribed code in a network or enterprise environment where cross-platform transmission of proscribed code may occur.