Malware is short for malicious software and refers to any software designed to infiltrate or damage a computer system without the owner's informed consent. Malware can include viruses, worms, trojan horses, rootkits, adware, spyware and any other malicious and unwanted software. Any computing device, such as a desktop personal computer (PC), laptop, personal digital assistant (PDA) or mobile phone, can be at risk from malware.
When a device is infected by malware, the user will often notice unwanted behaviour and degraded system performance, as the infection can create unwanted processor activity, memory usage and network traffic. This can also cause stability issues leading to application or system-wide crashes. The user of an infected device may incorrectly assume that poor performance is a result of software flaws or hardware problems and take inappropriate remedial action, when the actual cause is a malware infection of which they are unaware.
Detecting malware is challenging because malware authors design their software to be difficult to detect, often employing technology that deliberately hides the presence of the malware on a system. For example, the malware application may not show up in the operating system tables that list currently running processes.
Computer devices make use of anti-virus software to detect and possibly remove malware. Anti-virus software can make use of various methods to detect malware including scanning, integrity checking and heuristic analysis. Of these methods, malware scanning involves the anti-virus software examining files for a virus fingerprint or “signature” that is characteristic of an individual malware program. Typically, this requires that the anti-virus software has a database containing the signatures. When the provider of the anti-virus software identifies a new malware threat, the threat is analysed and its signature is extracted. The malware is then “known” and its signature can be supplied as updates to the anti-virus software database. However, scanning files for malware can consume significant processing resources potentially resulting in a reduction in the performance of a computing device.
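The signature scanning described above can be illustrated with a minimal sketch. It assumes that a signature is simply a byte sequence searched for within the file contents; real anti-virus products use richer signature formats and matching engines. The database `SIGNATURES`, the signature names and the function `scan_bytes` are hypothetical and are introduced here for illustration only.

```python
from typing import Dict, List

# Hypothetical signature database: malware name -> characteristic byte pattern.
# These entries are invented for illustration and match no real malware.
SIGNATURES: Dict[str, bytes] = {
    "Example.TestA": b"\xde\xad\xbe\xef\x13\x37",
    "Example.TestB": b"EVIL_PAYLOAD_MARKER",
}


def scan_bytes(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the sorted names of all signatures whose pattern occurs in data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)
```

Because every file must be searched for every signature, the cost of this approach grows with both the number of files and the size of the signature database, which is the processing burden noted above.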
In order to reduce this processing burden, some anti-virus solutions provide for one or more databases of trusted files that are highly unlikely to be a source of malware. These trusted files are those files published or authored by trusted sources. For example, those files that make up a piece of software distributed by a reputable software provider could be considered to be trustworthy such that, provided such files have not been modified since their publication/release, these files need not be scanned for malware.
The provider of the anti-virus software identifies files that can be considered trustworthy and applies a one-way hash function to each file to convert it to a fixed-length string known as a hash value (also known as a digest). For a description of one-way hash functions see Chapter 2 of Applied Cryptography by Bruce Schneier, 1997. The hash value provides a fingerprint of the file that is highly unlikely to be duplicated by another input; that is, the probability of a ‘collision’ is extremely small. In addition, owing to the one-way nature of a hash function, it is extremely difficult or almost impossible to calculate an input that produces a given hash value, even though the hash function used to generate the hash value is publicly available. The list of the hash values of these trusted files is secured against unauthorised modification (e.g. by digitally signing the trusted file list) and provided to a user's device.
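As an illustration of computing such a hash value, the following sketch reads a file in chunks and returns its digest. The choice of SHA-256 is an assumption made for the example; the text does not name a particular hash function. The function name `file_digest` and the chunk size are likewise hypothetical.

```python
import hashlib


def file_digest(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 digest of the file at path as a hexadecimal string."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

The digest has the same length regardless of the size of the file, so a list of trusted files can be stored and compared as a set of short fixed-length strings.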
Prior to scanning a given file to determine if the file could possibly be or contain malware (for example when prompted by the user, when due to perform a scheduled scan, or when initiated in response to a request to run the file or in response to the receipt of the file), the anti-virus software will determine if the file is in the trusted file database. The anti-virus software applies the same one-way hash function to the file to be checked and then compares the resulting hash value with the trusted file database provided by the supplier of the anti-virus software. If a match is found in the database, there is an extremely high probability that this file can be trusted, i.e. it is from a trusted source and has not been modified since its first publication, and therefore it need not be scanned for malware.
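The check described above can be sketched as follows, assuming SHA-256 as the hash function and an in-memory set of hexadecimal digests as the trusted file database; neither is specified in the text. The names `sha256_hex` and `check_file`, and the `scanner` callable standing in for the malware scan, are hypothetical.

```python
import hashlib
from typing import Callable, Set


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def check_file(data: bytes,
               trusted_hashes: Set[str],
               scanner: Callable[[bytes], bool]) -> str:
    """Classify file contents, skipping the scan when the hash is trusted.

    scanner is any callable returning True if it considers data to be malware.
    """
    if sha256_hex(data) in trusted_hashes:
        # Hash matches a trusted file: unmodified since publication, no scan needed.
        return "trusted"
    # Unknown or modified file: fall back to the full malware scan.
    return "malware" if scanner(data) else "clean"
```

Any modification to a trusted file changes its hash value, so a tampered copy no longer matches the database and is scanned in the normal way.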
When an anti-virus application is first installed on a device, it must build a trusted file database, as the anti-virus supplier does not necessarily know which files are in use, or are likely to be used, on each user device. Given that there are thousands of files published by a variety of trusted sources, these trusted file databases are large and can consume a significant amount of memory. More importantly, it can take many hours for an anti-virus application to build a trusted file database by scanning the files stored on the device. This problem can be exacerbated when the anti-virus application is installed on a device that has been in use for some time; the device may include a large number of data files, such as user documents, photographs, cache files, temporary files and other content, that must be scanned but are irrelevant for the purpose of populating the trusted file database. Populating the trusted file database can therefore take more time than is saved in subsequent scans.