The present invention relates to the screening of computer data for viruses and more particularly to the screening of computer data for macro viruses.
Computer data viruses represent a potentially serious liability to all computer users and especially to those who regularly transfer data between computers. Computer viruses were first identified in the 1980s, and up until the mid-1990s consisted of a piece of executable code which attached itself to a bona fide computer program. At that time, a virus typically inserted a JUMP instruction into the start of the program which, when the program was executed, caused a jump to occur to the “active” part of the virus. In many cases, the viruses were inert and activation of a virus merely resulted in its being spread to other bona fide programs. In other cases however, activation of a virus could cause malfunctioning of the computer running the program including, in extreme cases, the crashing of the computer and the loss of data.
Computer software intended to detect (and in some cases disinfect) infected programs has in general relied as a first step upon identifying those data files which contain executable code, e.g. .exe, .com, .bat. Once identified, these files are searched (or parsed) for certain signatures which are associated with known viruses. The producers of anti-virus software maintain up-to-date records of such signatures which may be, for example, checksums.
WO95/12162 describes a virus protection system in which executable data files about to be executed are passed from user computers of a computer network to a central server for virus checking. Checking involves parsing the files for signatures of known viruses as well as for signatures of files known to be clean (or uninfected).
In 1995, a new virus strain was identified which infected, in particular, files of the Microsoft Office™ system. Given the dominant position of Microsoft Office™ in the computer market, the discovery of these viruses has caused much consternation.
Microsoft Office™ makes considerable use of so-called “macros” which are generally small executable programs written in a simple high level language. Macros may be created, for example, to provide customised menu bars or “intelligent” document templates or may be embedded in some other file format. For example, macros may be embedded in template files (.dot) or even in Microsoft Word™ files (.doc).
As the new strains of virus discovered in 1995 infect macro files, they are generally referred to as “macro viruses”. It will be appreciated that the possibility for macro viruses to be spread is great given the frequency with which Microsoft Office™ files are copied between two computers either by way of floppy disk or via some other form of electronic data transfer, e.g. the Internet. Indeed, viruses such as “WM/Concept” are known to have spread widely and rapidly at a global level.
Producers of anti-virus software have approached the macro virus problem by maintaining and continuously updating records of macro viruses known to exist in the “wild”. As with more conventional viruses, a signature (commonly a checksum) is determined for each macro virus and these signatures are disseminated to end users of anti-virus software. The software generally scans data being written to or read from a computer's hard disk drive for the presence of macros having a checksum corresponding to one of the identified viruses.
There are a number of problems with these more or less conventional approaches. Firstly, the number of macro viruses is exploding, with around 3000 identified by mid-1998. There is inevitably a time lag between a virus being released and its being identified, by which time many computers may have been infected. Secondly, end users may be slow in updating their systems with the latest virus signatures. Again, this leaves a window of opportunity for systems to be infected.
WO 98/14872 describes an anti-virus system which uses a database of known virus signatures as described above, but which additionally seeks to detect unknown viruses based upon expected virus properties. However, given the ingenuity of virus producers, such a system is unlikely to be completely effective against unusual and exotic viruses.
It is an object of the present invention to overcome or at least mitigate the above noted disadvantages of existing anti-virus software.
This and other objects are met by screening computer data to identify macros which do not correspond to known certified and acceptable macros.
According to a first aspect of the present invention there is provided a method of screening a software file for viral infection, the method comprising:
defining a database of signatures indicative of macros previously certified as being virus free;
scanning said file to determine whether or not the file contains a macro; and
if the file contains a macro, determining whether or not the macro has a signature corresponding to one of the signatures contained in said database.
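The three steps above can be sketched as follows. This is a minimal illustration rather than the claimed implementation: it assumes the macros have already been extracted from the file as byte strings (extraction is format-specific and is not shown), and it uses a SHA-256 digest as the signature, whereas the text says only that signatures may be, for example, checksums. The names `macro_signature` and `screen_file` are hypothetical.

```python
import hashlib
from typing import Iterable, List, Set


def macro_signature(macro_body: bytes) -> str:
    """Return a signature for a macro (assumption: SHA-256 hex digest)."""
    return hashlib.sha256(macro_body).hexdigest()


def screen_file(macros: Iterable[bytes], certified: Set[str]) -> List[bytes]:
    """Return the macros whose signatures are NOT in the certified database.

    An empty result means the file contains no macros, or only macros
    previously certified as being virus free.
    """
    return [m for m in macros if macro_signature(m) not in certified]


# Hypothetical example data: one certified macro and one unknown macro.
known_clean = b"Sub AutoFormat()\nEnd Sub"
unknown = b"Sub AutoOpen()\n' unrecognised code\nEnd Sub"
certified_db = {macro_signature(known_clean)}

# Only the unknown macro is reported, even though no virus signature
# for it exists anywhere in the system.
suspects = screen_file([known_clean, unknown], certified_db)
```

The point of the sketch is the direction of the test: a macro is flagged because it is absent from the list of certified macros, not because it is present in a list of known viruses.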
It will be appreciated that embodiments of the present invention have the advantage that they may be used to effectively block the transfer and/or processing of files which contain a previously unidentified (either to the local user or to the software producer) macro virus. It is therefore less critical (or even unnecessary) for the software to be updated to take account of newly detected viruses.
Preferably, said step of defining a database of signatures indicative of macros previously certified as being virus free comprises scanning a set of end user applications which are known to be virus free to identify macros therein, determining a signature for each of the identified macros, and compiling the determined signatures into the database. More preferably, the step of defining the database comprises the further step of updating the database with additional macro signatures. This updating may be done via an electronic link between a computer hosting the database (where the scanning of the file is performed) and a remote central computer. Alternatively, the database may be updated by way of data stored on an electronic storage medium such as a floppy disk. The database may also include signatures corresponding to widely used proprietary macros, e.g. those used by large organisations.
Preferably, the method comprises defining a second database comprising signatures indicative of macro viruses, and scanning said file to determine whether or not the file contains a signature corresponding to one of the signatures contained in the second database. This second database may be created at a central site and disseminated to end users by floppy disk or direct electronic data transfer.
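By way of illustration only, compiling and later updating the database of certified signatures might look like the sketch below. It assumes each virus-free application is represented as a list of already-extracted macro bodies, and it again substitutes a SHA-256 digest for the unspecified signature; `build_certified_database` and `update_database` are hypothetical names, and the transport of updates (electronic link or storage medium) is not modelled.

```python
import hashlib
from typing import Dict, Iterable, List, Set


def macro_signature(macro_body: bytes) -> str:
    """Return a signature for a macro (assumption: SHA-256 hex digest)."""
    return hashlib.sha256(macro_body).hexdigest()


def build_certified_database(clean_applications: Dict[str, List[bytes]]) -> Set[str]:
    """Compile signatures of every macro found in applications known to be virus free."""
    database: Set[str] = set()
    for macros in clean_applications.values():
        for macro in macros:
            database.add(macro_signature(macro))
    return database


def update_database(database: Set[str], additional: Iterable[str]) -> int:
    """Merge additional signatures into the database; return how many were new."""
    before = len(database)
    database.update(additional)
    return len(database) - before


# Hypothetical example: two clean applications that share one macro.
clean_apps = {
    "word_processor": [b"macro A", b"macro B"],
    "spreadsheet": [b"macro B", b"macro C"],
}
db = build_certified_database(clean_apps)

# An update that carries one signature already held and one new signature.
added = update_database(db, [macro_signature(b"macro C"), macro_signature(b"macro D")])
```

A set is used here so that a macro shared by several applications contributes a single entry and repeated updates are harmless.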
Preferably, the method comprises creating a set of signatures corresponding to a set of user specific macros, certified by the user as being virus free. These signatures may be added to the first mentioned database, or may be included in a separate database. In either case, the method comprises scanning a macro identified in a file to determine whether or not the macro has a signature corresponding to a signature of a user certified macro. The user in this case may be an end user, but preferably is a network manager. In the latter case, database updates made by the network manager are communicated to the network end user computers where the virus screening is performed.
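A hedged sketch of the network manager variant follows. It assumes the user-certified signatures are kept in a separate database on each end user computer and that an update is simply a collection of signatures; how the update travels over the network is not shown. The class `EndUserScreener` and its methods are hypothetical, and SHA-256 stands in for the unspecified signature.

```python
import hashlib
from typing import Iterable, Set


def macro_signature(macro_body: bytes) -> str:
    """Return a signature for a macro (assumption: SHA-256 hex digest)."""
    return hashlib.sha256(macro_body).hexdigest()


class EndUserScreener:
    """Screening state held on one end user computer (hypothetical)."""

    def __init__(self, certified: Iterable[str]) -> None:
        self.certified: Set[str] = set(certified)  # first-mentioned database
        self.user_certified: Set[str] = set()      # separate user-certified database

    def apply_manager_update(self, signatures: Iterable[str]) -> None:
        """Record signatures that the network manager has certified as virus free."""
        self.user_certified.update(signatures)

    def is_certified(self, macro_body: bytes) -> bool:
        """True if the macro matches either database of certified signatures."""
        sig = macro_signature(macro_body)
        return sig in self.certified or sig in self.user_certified


vendor_macro = b"Sub VendorTemplate()\nEnd Sub"
in_house_macro = b"Sub CompanyLetterhead()\nEnd Sub"

# Two end user computers, each starting with the same certified database.
workstations = [EndUserScreener([macro_signature(vendor_macro)]) for _ in range(2)]
before = [w.is_certified(in_house_macro) for w in workstations]

# The network manager certifies the in-house macro and the update is
# communicated to every end user computer, where screening is performed.
manager_update = [macro_signature(in_house_macro)]
for w in workstations:
    w.apply_manager_update(manager_update)
after = [w.is_certified(in_house_macro) for w in workstations]
```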
According to a second aspect of the present invention there is provided a method of screening a software file for viral infection, the method comprising:
defining a first database of known macro virus signatures, a second database of known and certified commercial macro signatures, and a third database of known and certified local macro signatures;
scanning said file to determine whether or not the file contains a macro; and, if the file contains a macro
determining a signature for the macro and screening that signature against the signatures contained in said databases; and
alerting a user in the event that the macro has a signature corresponding to a signature contained in said first database and/or in the event that the macro has a signature which does not correspond to a signature contained in either of the second and third databases.
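The alert condition of this second aspect can be sketched as below. The sketch assumes the macro has already been extracted from the file and that the three databases are held as sets of signature strings; the function name `screen_macro`, the alert texts, and the use of SHA-256 as the signature are all assumptions made for illustration.

```python
import hashlib
from typing import Optional, Set


def macro_signature(macro_body: bytes) -> str:
    """Return a signature for a macro (assumption: SHA-256 hex digest)."""
    return hashlib.sha256(macro_body).hexdigest()


def screen_macro(
    macro_body: bytes,
    virus_db: Set[str],
    commercial_db: Set[str],
    local_db: Set[str],
) -> Optional[str]:
    """Return an alert message for the user, or None if no alert is needed."""
    sig = macro_signature(macro_body)
    if sig in virus_db:
        # Matches the first database: a known macro virus.
        return "known macro virus"
    if sig not in commercial_db and sig not in local_db:
        # Matches neither the second nor the third database: not certified.
        return "uncertified macro"
    return None


# Hypothetical example data for the three databases.
virus = b"Sub AutoOpen()\n' replicating payload\nEnd Sub"
commercial = b"Sub VendorToolbar()\nEnd Sub"
local = b"Sub CompanyLetterhead()\nEnd Sub"
stranger = b"Sub NeverSeenBefore()\nEnd Sub"

virus_db = {macro_signature(virus)}
commercial_db = {macro_signature(commercial)}
local_db = {macro_signature(local)}

results = [
    screen_macro(m, virus_db, commercial_db, local_db)
    for m in (virus, commercial, local, stranger)
]
```

Checking the virus database first means a macro that somehow appears in both the virus database and a certified database still raises an alert, which is consistent with the "and/or" wording of the alerting step.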
According to a third aspect of the present invention there is provided apparatus for screening a software file for viral infection, the apparatus comprising:
a memory storing a set of signatures indicative of macros previously certified as being virus free; and
a data processor arranged to scan said file to determine whether or not the file contains a macro and, if the file does contain a macro, to determine whether or not the macro has a signature corresponding to one of the signatures contained in said set.
According to a fourth aspect of the present invention there is provided a computer memory encoded with executable instructions representing a computer program for causing a computer system to:
maintain a database of signatures indicative of macros previously certified as being virus free;
scan data files to determine whether or not the files contain a macro; and
if a file contains a macro, determine whether or not the macro has a signature corresponding to one of the signatures contained in said database.
Preferably, the computer program provides for the updating of said database with additional macro signatures.
Preferably, the computer program causes a second database to be maintained which comprises signatures indicative of macro viruses, and further causes the files to be scanned to determine whether or not they contain a signature corresponding to one of the signatures contained in the second database. More preferably, the computer program causes a third database to be maintained which comprises signatures indicative of macros defined locally, e.g. at the level of a local network to which the programmed computer is connected. The computer program causes this third database to be scanned for a match between signatures of a file macro not already matched in the first and second databases, and signatures contained in the third database.