A computer virus can be defined as a sequence of commands or instructions that interfere with a user's operation of, or cause damage to, his or her computer system. Computer viruses may damage a computer system directly, such as by deleting files or formatting a disk, or indirectly, such as by altering the system's protective measures and thus making the computer vulnerable to probing or other attacks.
Computer viruses therefore present a significant threat to the integrity and reliability of computer systems and will continue to present such a threat due to the trend toward interconnection of computers. The increase in computer-to-computer communications, via the internet for example, has caused a commensurate increase in the spread of viruses because infected files are spread more easily and rapidly than ever before.
Virus detection is thus an essential element in the effective maintenance of computer systems. In order to detect a computer virus, a virus detection program is generally employed in conjunction with a series of virus "profiles" or "signatures" which represent characteristics or patterns of known viruses. One type of virus detection routine monitors a program suspected of being infected by a virus. The program's behavior is compared to a profile of operating characteristics of a known virus and, if a match is found, the program is assumed to contain a virus.
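By way of a minimal, non-limiting sketch, the profile-matching routine described above might be expressed as follows. The profile names, the action names, and the rule that every action in a profile must be observed are assumptions made for illustration only and are not taken from any particular virus detection program.

```python
# Hypothetical behavior profiles: each known virus is described by the set of
# actions it is known to perform. Names and actions are illustrative only.
KNOWN_PROFILES = {
    "example-virus-a": {"overwrite_startup_script", "enable_file_sharing"},
    "example-virus-b": {"format_disk"},
}


def match_profiles(observed_actions, profiles=KNOWN_PROFILES):
    """Return the sorted names of profiles whose actions were all observed."""
    observed = set(observed_actions)
    return sorted(
        name for name, actions in profiles.items() if actions <= observed
    )


# Actions recorded while monitoring two suspect programs (made-up data).
suspect = ["open_chat_window", "enable_file_sharing", "overwrite_startup_script"]
benign = ["open_chat_window", "save_log"]

print(match_profiles(suspect))
print(match_profiles(benign))
```

In this sketch a program is assumed to contain a virus only when its observed behavior covers an entire profile; a real routine could apply a looser or stricter matching rule.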
While virus creators once focused on binary executable computer files (e.g., those with .EXE or .COM file extensions), they have broadened their horizons to target, for example, macros (such as those executed by word processing or spreadsheet programs) and even text-based files (e.g., word processing files, ASCII text files, etc.). Although many text files are unsuitable for performing malicious actions on behalf of a virus creator, others, such as batch and script files, contain instructions that are executed in conjunction with binary executable programs.
By way of illustration, mIRC is an internet relay chat program that allows multiple computer users, using computers remote from each other, to "converse" via the internet. A communication channel, or "chat room," is established by a user wishing to discuss a topic. Within a chat room, a user at one computer types messages that are received and displayed on the screen of the other users in the same chat room. Users can come and go from conversations, establish private communication channels, etc.
Upon its invocation, and during its execution, mIRC automatically invokes a number of script files to perform various functions. For example, EVENTS.INI contains instructions that mIRC applies in response to certain messages or events (e.g., a particular user joins the conversation, a conversant uses a specified word or phrase, etc.). Another script file, COMMANDS.INI, lists shortcut commands a user may employ. If, for example, the user frequently sends a particular message or response, he or she may create a short command (similar to a macro) which, when entered, is translated by mIRC into the longer message or response.
When one known version of mIRC is started by a user, a script file named SCRIPT.INI is executed. One command that may be included in SCRIPT.INI places the user's computer into a file transfer mode. This mode, which can be turned on and off, allows remote users in the same chat room to search the storage units (e.g., disk drives) attached to a user's computer system and to retrieve files residing on those storage units. This mode can be beneficial in the sharing of information between users, but, if it is included in the user's SCRIPT.INI without the user's knowledge, the contents of his or her computer system become vulnerable to pilferage.
Another command that may be executed in SCRIPT.INI causes the user's SCRIPT.INI file to be automatically transmitted to the computer system of each person who joins the user's chat room. Upon receipt of the file, the remote user's existing SCRIPT.INI file may be overwritten with the received version. If the transferred SCRIPT.INI file also enables file transfer mode (as described above), the remote user's computer system will, unknown to the user, become vulnerable the next time the script file is run.
These two "features" of mIRC are, in combination, sometimes termed the "mIRC virus." The virus propagates like a worm (i.e., it copies the entire file as opposed to simply inserting viral code into an uninfected file) and exposes a user's computer system to probing and file theft.
Text files such as the script files used by mIRC contain various character and formatting codes that merely alter the appearance of the file and/or its output but have no effect upon the execution of script or batch commands within the file. For example, the individual words of a command within SCRIPT.INI may be separated by one space character, two spaces, a dozen spaces, a line feed, a tab character, etc. These separator characters are generally known as "whitespace" because they are invisible characters that merely serve to separate visible, printable characters.
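The point may be illustrated with a short sketch. The command text below is a hypothetical placeholder and is not actual mIRC script syntax. The sketch shows only that differently spaced copies of one command differ as raw text yet contain the same sequence of printable words.

```python
# Four spellings of one hypothetical command, differing only in whitespace.
variants = [
    "command arg1 arg2",       # single spaces
    "command  arg1    arg2",   # runs of several spaces
    "command\targ1\targ2",     # tab characters
    "command\narg1 arg2",      # a line feed between words
]

# str.split() with no argument splits on any run of whitespace characters.
tokenized = [variant.split() for variant in variants]

# As raw text the four variants are all distinct ...
print(len(set(variants)))
# ... but each one yields the same list of printable words.
print(all(tokens == tokenized[0] for tokens in tokenized))
```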
When a text file is edited, its whitespace is often reformatted or rearranged in order to yield a particular textual appearance. The resulting text file may contain the identical sequence of printable characters as a known virus, but have as little as one difference in the whitespace dividing the characters of that sequence. Further, multiple text files infected with the same virus do not always manifest the virus in identical forms. For example, one text file may have been edited subsequent to its infection, thus altering the appearance of the resident virus (including whitespace within the virus). Although the virus in the edited file is still capable of performing its intended task, its textual appearance differs from its appearance in a second, unmodified, infected text file. As a result, when both infected text files are searched for a specific pattern or sequence of commands representing the virus in its unmodified form, the edited file will not necessarily be identified as infected. In other words, a virus whose textual form has been modified may escape a virus detection program that searches for the unmodified signature, and the user will unknowingly continue to use an infected file.
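The failure described above can be reproduced with a minimal sketch, assuming a detector that performs a plain substring search for the unmodified signature. The signature text and file contents are hypothetical placeholders and are not taken from any actual virus or detection program.

```python
# Hypothetical signature: the viral commands exactly as first observed.
SIGNATURE = "on join send script"


def exact_scan(text, signature=SIGNATURE):
    """Report an infection only if the signature appears byte for byte."""
    return signature in text


# Two files carrying the same commands; the second was edited after infection,
# leaving one extra space between two of the words.
unmodified_file = "header line\non join send script\nfooter line\n"
edited_file = "header line\non join  send script\nfooter line\n"

print(exact_scan(unmodified_file))  # the unmodified copy is found
print(exact_scan(edited_file))      # the edited copy is missed

# The two files nevertheless contain the same sequence of printable words.
print(unmodified_file.split() == edited_file.split())
```

A single added space is enough to defeat the exact search in this sketch, even though the printable content of the two files is identical.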
With viruses that cause indirect damage, such as the mIRC virus, the user's computer may be exposed to probing attacks for an extended period of time before the user becomes aware of and purges the virus. Because the user is unlikely to notice any direct, obvious damage caused by the virus (e.g., deleted files, formatted disks), there is nothing to alert the user to the infection.
As a related problem, some virus detection programs falsely report the presence of a virus in a text file that merely describes or refers to a known virus. For example, a text or word processing file may contain at least one textual extract from viruses known to infect executable computer files, such as messages or other viral indicators that have been known to appear on the display of an infected computer system. The extracts may be included in the text file for informational purposes, such as to educate users as to known virus symptoms. When a virus detection program searches computer files for viruses by using indicia such as these extracts, the program may erroneously report that the text or word processing file contains a virus.
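A minimal sketch of such a false report follows, assuming a detector that flags any file containing a message known to be displayed by a virus. The message text, the function name, and the file contents are hypothetical and serve only to illustrate the problem.

```python
# Hypothetical indicia: messages known to be displayed by viruses that infect
# executable files. The text is invented for illustration.
KNOWN_MESSAGES = ["Your files now belong to me"]


def naive_scan(text, messages=KNOWN_MESSAGES):
    """Return every known message that occurs anywhere in the text."""
    return [message for message in messages if message in text]


# An informational document that quotes the message to educate users.
advisory = (
    "Known symptoms: an infected computer may display the message "
    '"Your files now belong to me" shortly after startup.'
)

# An unrelated document containing none of the indicia.
memo = "Meeting moved to Thursday."

print(naive_scan(advisory))  # non-empty: the harmless advisory is flagged
print(naive_scan(memo))      # empty: nothing is reported
```

The advisory contains no executable instructions at all, yet the scan in this sketch cannot distinguish a quotation of the message from the virus itself.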
There is, therefore, a need in the art for a method of detecting a text-based virus in a text file regardless of how the whitespace within the virus and the file is formatted. There is also a need for a method of reducing the frequency with which virus detection programs falsely identify text-based files as being infected.