The present invention relates to a method of, and system for, heuristically detecting viruses in executable code by analysing the frequency distribution of the machine code instructions it contains.
A common form of computer virus infection is one in which the virus's executable code is attached to, or embedded in, a program or other computer file containing executable code which appears, on the face of it, to be benign. One well-established method of virus propagation is for the virus, once activated on a host machine such as a user's PC, to attach itself to one or more programs found on the host in such a way that each such program, when run, executes the virus's code. This gives the virus the opportunity to propagate again and/or to carry out whatever other malicious behaviour (such as destroying files) has been programmed into it. This method of propagation does, of course, provide an opportunity to detect the virus, for example by associating a checksum with each program file and detecting when that checksum changes. That is only one of the many strategies which have been devised to detect viruses.
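The checksum strategy mentioned above can be sketched as follows. This is a minimal illustration, not part of the claimed method: it assumes SHA-256 as the checksum (the text does not name one), and the function names `file_checksum`, `record_baseline` and `modified_files` are hypothetical. A baseline is recorded while the program files are known to be clean, and any file whose digest later differs is reported as modified.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Return the SHA-256 digest of a file's contents as a hex string.

    SHA-256 is an assumed choice; any checksum would illustrate the idea.
    """
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large executables need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def record_baseline(paths: list[Path]) -> dict[Path, str]:
    """Associate a checksum with each program file while it is known to be clean."""
    return {p: file_checksum(p) for p in paths}


def modified_files(baseline: dict[Path, str]) -> list[Path]:
    """Return the files whose current checksum no longer matches the baseline."""
    return [p for p, digest in baseline.items() if file_checksum(p) != digest]
```

A changed checksum shows only that a file has been modified, not that it is infected; a legitimate update to a program alters its checksum in the same way.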
Another well-known method of detecting viruses, implemented in many of the available anti-virus software packages, involves scanning program and other files for characteristic sequences of bytes (known as signatures) which indicate the likely presence of a virus. One practical problem with signature-based detection is that, when a new virus is first detected, it takes some skill and a significant amount of time to establish a suitable signature for it. The signature must not produce too many false positives and must not misidentify the virus, for example as an existing one with a more benign payload. The signature information then has to be disseminated to the sites which use the anti-virus package in question before it can be used there to detect the newly identified virus. In recent years, many notable virus outbreaks have involved viruses which propagate over the internet, and because it takes time for publishers of anti-virus software to react when an outbreak occurs, such a virus can spread widely before a signature for it is available.
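Signature scanning in its simplest form can be sketched as a search for each known byte sequence within a file's contents. The sketch below assumes exact, fixed byte signatures held in a dictionary; the signature names and byte values are hypothetical placeholders, and real scanners use richer signature formats and faster multi-pattern search.

```python
from pathlib import Path

# Hypothetical signature database: name -> characteristic byte sequence.
# These byte values are placeholders, not signatures of real viruses.
SIGNATURES: dict[str, bytes] = {
    "Hypothetical.A": bytes.fromhex("deadbeef"),
    "Hypothetical.B": b"\xcd\x21\xcd\x21",
}


def scan_bytes(data: bytes, signatures: dict[str, bytes]) -> list[str]:
    """Return the names of all signatures that occur anywhere in ``data``."""
    return [name for name, sig in signatures.items() if sig in data]


def scan_file(path: Path, signatures: dict[str, bytes] = SIGNATURES) -> list[str]:
    """Read a file and report which known signatures it contains."""
    return scan_bytes(path.read_bytes(), signatures)
```

Because such a scanner recognises only byte sequences already in its database, a newly released virus goes undetected until a signature for it has been written and distributed, which is the delay described above.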
Some internet service providers (ISPs) offer anti-virus scanning of internet traffic passing through their internet nodes as a value-added service.
The present invention relates to a method of virus detection which is intended to be useful for ISPs performing anti-virus scanning, e.g. of executables such as program files attached to emails, though it is by no means limited to that application and may be used in any anti-virus package.
According to the present invention there is provided a method of scanning a computer file for virus infections comprising:
a) identifying program code within the file;
b) identifying the compiler used to create the program code;
c) determining the frequency distribution of selected machine code instructions or sequences of such instructions; and
d) flagging the file as possibly infected with a virus, or not, on the basis of comparison of the determined frequency distribution with a frequency distribution of machine code instructions or sequences thereof expected for that compiler.
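Steps c) and d) can be illustrated with a short sketch. It makes several assumptions that the text leaves open: step a) is taken to have already produced a list of instruction mnemonics (disassembly is not shown), step b) is taken to have already named the compiler, the compiler names and expected frequencies in `EXPECTED_PROFILES` are invented placeholders, and a chi-square-style statistic with an arbitrary threshold of 0.1 stands in for the comparison, which the text does not specify.

```python
from collections import Counter

# Hypothetical expected relative frequencies of selected instructions for two
# made-up compilers. Real profiles would be measured from clean binaries.
EXPECTED_PROFILES: dict[str, dict[str, float]] = {
    "compiler_x": {"mov": 0.40, "push": 0.15, "call": 0.15, "pop": 0.15, "ret": 0.15},
    "compiler_y": {"mov": 0.30, "lea": 0.25, "call": 0.20, "ret": 0.25},
}


def frequency_distribution(mnemonics: list[str], selected: set[str]) -> dict[str, float]:
    """Step c): relative frequency of each selected instruction in the code."""
    counts = Counter(m for m in mnemonics if m in selected)
    total = sum(counts.values())
    if total == 0:
        return {m: 0.0 for m in selected}
    return {m: counts[m] / total for m in selected}


def chi_square_distance(observed: dict[str, float], expected: dict[str, float]) -> float:
    """Chi-square-style divergence between observed and expected proportions."""
    return sum((observed.get(m, 0.0) - e) ** 2 / e for m, e in expected.items() if e > 0)


def flag_file(mnemonics: list[str], compiler: str, threshold: float = 0.1) -> bool:
    """Step d): True if the instruction mix departs too far from the compiler's profile.

    The threshold is an assumed, untuned value.
    """
    expected = EXPECTED_PROFILES[compiler]
    observed = frequency_distribution(mnemonics, set(expected))
    return chi_square_distance(observed, expected) > threshold
```

Instructions outside the selected set are ignored when counting. The same functions could be applied to sequences of instructions, for example adjacent pairs taken from `zip(mnemonics, mnemonics[1:])`, given a profile keyed by such pairs and with the type hints widened accordingly.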
The invention also provides a system for scanning a computer file for virus infections comprising:
a) means for identifying program code within the file;
b) means for identifying the compiler used to create the program code;
c) means for determining the frequency distribution of selected machine code instructions or sequences of such instructions; and
d) means for flagging the file as possibly infected with a virus, or not, on the basis of comparison of the determined frequency distribution with a frequency distribution of machine code instructions or sequences thereof expected for that compiler.