Traditional virus detection applications employ simple pattern matching algorithms to detect viruses. In general, a pattern matching algorithm searches a target file for a set of known virus patterns, each of which represents a known virus. Each virus pattern is represented by one or more digital “virus signatures,” each of which stores a block of binary (0/1) data. If all signatures of a known virus pattern are found in the target file, the detection application concludes that the file contains the corresponding known virus.
FIG. 1 illustrates an example of a simple pattern matching algorithm. The pattern matching algorithm may be implemented in a virus detection application running on a computer system. When a file to be scanned for viruses (“scan target file”) is received, simple pattern matching algorithm 100 attempts to match the contents of the file against the contents of virus pattern file 110. Virus pattern file 110 contains a set of recognizable virus patterns supplied by an Information Security Service Provider (“ISSP”). Each virus pattern contains one or more digital virus signatures, each with an associated offset and size (in bytes). Generally, the virus pattern file is stored within the memory of the computer system, is cached when a scan is performed, and is periodically updated, typically through a network, as the ISSP discovers new viruses. Pattern matching algorithm 100 retrieves each virus pattern from the virus pattern file and compares it against scan target file 130 for a match. The matching consists of comparing each digital signature of the virus pattern with a run of consecutive bytes in scan target file 130; the size and offset of the signature indicate the number of consecutive bytes and their starting position.
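The matching loop just described can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of algorithm 100: it assumes every signature carries an absolute offset from the beginning of the scan target file, and the names `Signature`, `VirusPattern`, `signature_found`, and `simple_scan` are hypothetical. The in-memory list passed to `simple_scan` stands in for virus pattern file 110.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    offset: int  # ASSUMPTION: absolute byte position in the scan target file
    size: int    # number of consecutive bytes to compare
    data: bytes  # the digital signature itself

    def __post_init__(self) -> None:
        if len(self.data) != self.size:
            raise ValueError("size must equal len(data)")


@dataclass(frozen=True)
class VirusPattern:
    name: str                         # hypothetical label for the known virus
    signatures: tuple[Signature, ...]


def signature_found(target: bytes, sig: Signature) -> bool:
    """Compare `size` bytes of the target, starting at `offset`, with the signature."""
    end = sig.offset + sig.size
    if end > len(target):
        return False  # target is too short to contain this signature
    return target[sig.offset:end] == sig.data


def simple_scan(target: bytes, pattern_file: list[VirusPattern]) -> list[str]:
    """Try every pattern in turn; report a virus only if ALL its signatures match."""
    detected = []
    for pattern in pattern_file:
        if all(signature_found(target, sig) for sig in pattern.signatures):
            detected.append(pattern.name)
    return detected


if __name__ == "__main__":
    pattern = VirusPattern("Example.A", (Signature(4, 3, b"BAD"), Signature(10, 2, b"\xde\xad")))
    target = b"\x00" * 4 + b"BAD" + b"\x00" * 3 + b"\xde\xad"
    print(simple_scan(target, [pattern]))  # ['Example.A']
```

Because `simple_scan` visits every pattern for every file, its running time grows linearly with the number of patterns, which is the scaling concern raised in the following paragraphs.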
In this example, virus pattern 120 has multiple signatures, one of which stores a value of 1017 in offset parameter 121 and a value of 16 in size parameter 122. When pattern matching algorithm 100 attempts to match virus pattern 120 with scan target file 130, it traverses all signatures of virus pattern 120 to check whether each of them exists in scan target file 130. For the signature with an offset of 1017, for example, the algorithm skips 1017 bytes from the beginning of scan target file 130 and then compares the next 16 consecutive bytes with digital signature 123. If these bytes match, that signature of pattern 120 has been found. If all signatures of pattern 120 are found in scan target file 130, the algorithm concludes that the virus corresponding to virus pattern 120 exists within scan target file 130. Alternatively, the offset may indicate a distance relative to another signature in the virus pattern file rather than a position measured from the beginning of the scan target file.
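The worked example, together with the relative-offset alternative, can be illustrated with the Python sketch below. The source does not specify which signature a relative offset refers to, so this sketch ASSUMES it is measured from the end of the previously checked signature of the same pattern; the `relative` flag and the function name `pattern_found` are hypothetical. The demonstration places a 16-byte signature 1017 bytes into a buffer, mirroring offset parameter 121 and size parameter 122.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    offset: int             # absolute offset, or distance from the previous signature
    size: int               # number of consecutive bytes to compare
    data: bytes             # the digital signature
    relative: bool = False  # hypothetical flag: True if `offset` is relative


def pattern_found(target: bytes, signatures: list[Signature]) -> bool:
    """Return True only if every signature of one virus pattern is present.

    ASSUMPTION: a relative offset is measured from the end of the previously
    checked signature. Because prev_end starts at 0, a relative first
    signature behaves like an absolute one.
    """
    prev_end = 0
    for sig in signatures:
        start = prev_end + sig.offset if sig.relative else sig.offset
        end = start + sig.size
        if start < 0 or end > len(target) or target[start:end] != sig.data:
            return False  # a single missing signature rules out the pattern
        prev_end = end
    return True


if __name__ == "__main__":
    sig_data = bytes(range(16))        # stand-in for digital signature 123
    tail = b"\xca\xfe\xba\xbe"         # hypothetical second signature
    target = bytearray(2048)
    target[1017:1033] = sig_data       # 16 bytes starting 1017 bytes into the file
    target[1038:1042] = tail           # 5 bytes after the first signature ends
    pattern_120 = [
        Signature(offset=1017, size=16, data=sig_data),
        Signature(offset=5, size=4, data=tail, relative=True),
    ]
    print(pattern_found(bytes(target), pattern_120))  # True
```

Note that an exact-offset comparison of this kind misses a copy of the same signature shifted by even one byte, so each such variant needs its own pattern entry under simple pattern matching.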
Although traditional simple pattern matching algorithms work well in many virus detection applications, there are continuing efforts to improve their performance. Such improvement is especially desirable when the number of virus patterns increases dramatically, whether because of newly discovered viruses or mutations of existing viruses. A mutation may appear as a small variation in the signature or offset of a virus pattern. For example, two viruses may have the same digital signature but slightly different offsets, or have similar signatures that differ in only a few bytes.
An increase in the number of virus patterns proportionately increases the time and resources required to perform virus detection, since the simple pattern matching algorithm must iterate through a larger number of virus patterns. Generally, it is desirable for the virus detection application to minimize CPU time when checking the scan target file; if the delay experienced by the user is too long, the user may become impatient and stop using the virus detection application. Furthermore, an increase in the number of virus patterns proportionately increases the size of the virus pattern file. A larger virus pattern file increases the memory consumption of the virus detection application, since the virus pattern file must be read while the pattern matching algorithm runs. This may be undesirable to the user if the computer system has limited resources.
An increase in the number of virus patterns also increases the network resources required to update the virus pattern file. For example, the ISSP may periodically send the virus detection application, through a network, an updated version of the virus pattern file that includes newly discovered virus patterns. Depending on the size of the virus pattern file, these updates may create a network bottleneck.
Therefore, it would be beneficial to provide a pattern matching algorithm that consumes less memory and detects viruses more rapidly, thereby addressing the problems of traditional virus detection solutions, such as slow scanning speed, large pattern files, heavy burden on computing resources (CPU, RAM, etc.), and heavy pattern-update traffic over networks.