Malware infection of computers and computer systems is a growing problem. Recently there have been many high-profile examples where computer malware has spread rapidly around the world, causing many millions of pounds' worth of damage in terms of lost data and lost working time.
Malware is often spread using a computer virus. Early viruses were spread by the copying of infected electronic files onto floppy disks, and the transfer of the electronic file from the disk onto a previously uninfected computer. When the user tries to open the infected electronic file, the malware is triggered and the computer infected. More recently, viruses have been spread via the Internet, for example using e-mail. In the future it can be expected that viruses will be spread by the wireless transmission of data, for example by communications between mobile communication devices using a cellular telephone network.
Various anti-virus applications are available on the market. These tend to work by maintaining a database of fingerprints for known viruses and malware. With a “real time” scanning application, when a user tries to perform an operation on a file, e.g. open, save, or copy, the request is redirected to the anti-virus application. If the application has no existing record of the electronic file, the electronic file is scanned for known virus or malware fingerprints. If a virus or malware is identified in a file, the anti-virus application can take appropriate action, such as reporting this to the user, notifying an administrator, or disinfecting or blocking the virus or malware. The anti-virus application may then add the identity of the infected file to a register of infected files.
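The real-time scanning flow described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a fingerprint is a SHA-256 hash of the whole file content and that "blocking" simply means refusing the operation. The names `KNOWN_FINGERPRINTS`, `scanned_clean`, `infected_register` and `on_file_operation` are hypothetical and are not taken from any particular anti-virus product.

```python
import hashlib

# Hypothetical fingerprint database: SHA-256 of known-bad content -> detection name.
KNOWN_FINGERPRINTS = {
    hashlib.sha256(b"malicious payload").hexdigest(): "Example.Malware.A",
}

scanned_clean = set()    # identities of files already scanned and found clean
infected_register = {}   # identity of an infected file -> (path, detection name)


def on_file_operation(path: str, content: bytes) -> bool:
    """Handle a redirected open/save/copy request.

    Returns True if the operation may proceed and False if it is blocked.
    """
    identity = hashlib.sha256(content).hexdigest()

    # An existing record of the file means no new scan is needed.
    if identity in infected_register:
        return False
    if identity in scanned_clean:
        return True

    # No existing record: scan against the known fingerprints.
    detection = KNOWN_FINGERPRINTS.get(identity)
    if detection is not None:
        infected_register[identity] = (path, detection)
        return False

    scanned_clean.add(identity)
    return True
```

In this sketch a second request for the same content is answered from the existing record without rescanning, which mirrors the "no existing record" condition in the description.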
The database for the anti-virus application may be maintained locally at the computer system, or may be located remotely from a client computer system, for example at a server. The server may also be used to perform a determination of whether the electronic file is malware. In this case, a client device that finds a suspicious electronic file sends signature information and other metadata relating to the electronic file to the server. The server uses this information to detect malware by comparing the signature and other metadata of the suspicious electronic file with fingerprints listed in a fingerprint database. Once the server has identified the suspicious electronic file (either as malware or not) it reports back to the client.
Fingerprints are patterns that are used to identify malware or clean files. Fingerprints are often based on some kind of signature calculated in the client. Signatures can be simple full or partial file hashes, or may be generated using more complex static file analysis or dynamic behavioural analysis. Static smart signatures are determined by statically calculating different hashes over various parts of the suspicious electronic file, or using some other static file properties. Dynamic behavioural analysis may be, for example, analysing how the malware affects a computer system environment, running the malware in a virtual environment or monitoring the malware during runtime. Dynamic signatures are calculated based on the behaviour of the suspicious electronic file, for example by using the results of running the suspicious electronic file in a small virtual environment and hashing the results from the execution path analysis.
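As an illustration of static signatures, the following Python sketch calculates a full file hash together with partial hashes over the first and last parts of the file. The choice of SHA-256, the 4096-byte chunk size and the function name `static_signatures` are assumptions made for this example only; real products may hash different regions or use other static properties.

```python
import hashlib


def static_signatures(data: bytes, chunk_size: int = 4096) -> dict:
    """Calculate simple static signatures over various parts of a file."""
    return {
        # Full file hash: changes if any single byte of the file changes.
        "full": hashlib.sha256(data).hexdigest(),
        # Partial hashes over the first and last chunk of the file.
        "head": hashlib.sha256(data[:chunk_size]).hexdigest(),
        "tail": hashlib.sha256(data[-chunk_size:]).hexdigest(),
        # Another static file property that needs no hashing at all.
        "size": len(data),
    }
```

Two files that differ only in their final byte produce different "full" and "tail" values but the same "head" value, which shows why a set of partial hashes can be more informative than a single full file hash.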
Metadata sent from the client to the server need not be just signature data. For example, file usage information, a download URL associated with the suspicious electronic file, file name and location, identities of any associated files or dynamic link libraries, registry changes, etc. could also be sent. Fingerprints can be created not just using the signature data but also based on other metadata. For example, an electronic file could be identified as malware or clean purely based on the download URL.
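A server-side lookup that combines signature data with other metadata might be sketched as follows. The hash values, the host name `malware.example`, the query field names and the function `server_verdict` are all hypothetical, and the order in which the checks are applied is an arbitrary choice for the example rather than a described requirement.

```python
from urllib.parse import urlsplit

# Hypothetical server-side fingerprint data.
MALWARE_HASHES = {"deadbeef"}
CLEAN_HASHES = {"cafef00d"}
MALICIOUS_HOSTS = {"malware.example"}


def server_verdict(query: dict) -> str:
    """Return "malware", "clean" or "unknown" for a client query."""
    signature = query.get("signature")
    if signature in MALWARE_HASHES:
        return "malware"

    # A fingerprint based purely on metadata: the host of the download URL.
    host = urlsplit(query.get("download_url", "")).hostname
    if host in MALICIOUS_HOSTS:
        return "malware"

    if signature in CLEAN_HASHES:
        return "clean"
    return "unknown"
```

For example, a query carrying a previously unseen signature and the download URL `http://malware.example/setup.exe` yields "malware" in this sketch, even though no signature fingerprint matches.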
There are several advantages of using a client/server model, rather than storing the anti-virus application and the database locally at a computer device. These include the following:
   1) There is no need to download the full set of malware fingerprints to each client device; only the relevant ones are needed.
   2) When new malware is detected, it is “published” immediately at the server and available to all client devices. There is no need for each client device to wait for the next scheduled database update. This ensures that each client device is protected against new malware as soon as possible after it is identified.
   3) Data obtained from an anti-virus server can be used to obtain knowledge of the global malware situation, as the server sees queried signatures and can use those, for example, for case prioritization. Furthermore, it can be used to perform a statistical analysis to give a “reputation verdict”, which can be used to determine whether or not to allow execution of a suspicious file.
Creators of malware use many different ways to avoid detection when writing malware. An obvious way to avoid detection is to change the malware in such a way that a detection fingerprint that is stored in a fingerprint database no longer matches the malware. Typically, it is easier to change static file attributes such as a full file hash. Behaviour-based signature detection is more difficult for a malware writer to circumvent.
Polymorphic malware is malware that can be packed in different ways in order to generate packed electronic files that are binary executable files and include an unpacker. The packed electronic files are different, so have different hash values. This is typically done using an encryption technique. In extreme cases, each copy of the packed electronic file is different. This makes it very difficult for an anti-virus application to identify the packed binary executable file as malware from its static signatures alone.
Anti-virus applications address this problem by calculating signatures used for creating fingerprints from the unpacked form of the malware. This requires the anti-virus application to have reliable unpacking algorithms, as an anti-virus program is only able to detect server side polymorphic malware using signature detection if it knows how to unpack the original form of the malware file and it has a detection fingerprint for the unpacked malware.
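The effect of packing on static signatures, and the reason signatures are calculated from the unpacked form, can be shown with a toy Python sketch. A repeating XOR with a random 16-byte key stands in for the encryption technique here; this is an assumption made for illustration and does not represent any real packer. The functions `pack` and `unpack` are hypothetical, and the payload is an inert byte string.

```python
import hashlib
import os

KEY_LEN = 16


def pack(payload: bytes) -> bytes:
    """Toy packer: prepend a random key and XOR the payload with it."""
    key = os.urandom(KEY_LEN)
    body = bytes(b ^ key[i % KEY_LEN] for i, b in enumerate(payload))
    return key + body


def unpack(packed: bytes) -> bytes:
    """Toy unpacker: recover the original payload from a packed copy."""
    key, body = packed[:KEY_LEN], packed[KEY_LEN:]
    return bytes(b ^ key[i % KEY_LEN] for i, b in enumerate(body))


payload = b"inert example payload"
copy_a, copy_b = pack(payload), pack(payload)

# Each packed copy has a different full file hash...
hash_a = hashlib.sha256(copy_a).hexdigest()
hash_b = hashlib.sha256(copy_b).hexdigest()

# ...but the hash of the unpacked form is the same for every copy.
unpacked_a = hashlib.sha256(unpack(copy_a)).hexdigest()
unpacked_b = hashlib.sha256(unpack(copy_b)).hexdigest()
```

In this sketch `hash_a` and `hash_b` differ because each copy uses a fresh random key, while `unpacked_a` and `unpacked_b` are equal, so a single fingerprint of the unpacked form matches every packed copy, provided the unpacking step is known.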
As every copy of the malware is unique (or at least very rare), polymorphism is an effective stealth mechanism to attempt to avoid detection. It is difficult for an anti-virus vendor to identify new, still undetected polymorphic malware, or variants of existing ones, and add detections for those to the database. Using the client-server model helps with the problem, as the anti-virus server sees queried signatures and other related metadata and can maintain a list of the most popular unknown files with all the metadata for further analysis and possible identification as malware. However, as packed versions of polymorphic malware are unique (or at least uncommon), they will not be directly visible in a “most-popular unknown files” list. This makes them more difficult to identify and prioritize for further analysis.
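The visibility problem can be illustrated with a short Python sketch of a "most-popular unknown files" list built from queried signatures. The function name `most_popular_unknown` and the sample signature strings are hypothetical; a real server would also keep the associated metadata alongside each count.

```python
from collections import Counter


def most_popular_unknown(queried_signatures, known, top_n=10):
    """Rank signatures that match no known fingerprint by query count."""
    counts = Counter(s for s in queried_signatures if s not in known)
    return counts.most_common(top_n)


# Five queries for one unknown static file, plus five packed copies of a
# single polymorphic sample, each of which has its own unique signature.
queries = ["unknown-app"] * 5 + ["packed-copy-%d" % i for i in range(5)]
ranking = most_popular_unknown(queries, known=set(), top_n=3)
```

Here the static file reaches the top of the list with a count of five, whereas each packed copy is counted only once, even though the polymorphic sample was queried just as often in total.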