The present invention relates generally to data transfer systems and computer virus detection.
Computer viruses, including but not limited to self-replicating viruses, are a major concern to computer system operators. Although the mechanisms and destructive capabilities vary, typically computer viruses take the form of an executable program which attaches itself to or otherwise alters another executable program or data file contained on the storage medium of the computer. Computer viruses enter the computer system either by being loaded from a removable medium, such as a magnetic diskette, or by being downloaded over a communication system via a modem or network architecture. Often a computer virus will enter a computer system unbeknown to the operator. In many cases the presence of the virus may not be detected for months, during which time the file containing the virus or other files corrupted by the virus may be shared with other computer systems, infecting them also.
The computer virus problem is particularly acute in networked systems, where the opportunity for transmitting the virus from computer to computer is greatly increased. To combat the problem, others have employed virus scanning programs which read the files stored on a storage medium, looking for known virus signatures. These signatures comprise sequences of data which have been found to be present in a known virus. As an example, IBM has published a Virus Scanning Program, Version 1.1, which includes a compilation of known virus signatures. These known virus signatures are contained in a file entitled "SIGFILE.LST" associated with the Virus Scanning Program. Further details of the IBM Virus Scanning Program can be obtained by contacting the International Business Machines Corporation.
One major shortcoming of these virus scanning programs is that the virus may have already corrupted the data storage medium before the scanning program is used. Since the conventional virus scanning program tests files which are already stored on the computer system's storage media, such programs simply alert the user that the computer has a virus. These programs do not automatically prevent the virus from being stored on the medium in the first place; hence they cannot totally prevent the virus from attacking or spreading.
The present invention solves this problem by performing in-transit detection of computer viruses using a finite state machine technique which allows multiple virus signatures to be tested for simultaneously. Because the invention is able to test for viruses "on the fly," it is useful in data communications systems and in file copying systems to inhibit the virus from entering the computer in the first place. In a data communications system, for example, an incoming serialized bitstream transmission can be tested for a plurality of different known virus signatures. If any one or more of the signatures are detected, the file into which the incoming bitstream would have been stored is closed or aborted so the virus does not take up residence on the storage medium. For added safety, any portion of the file already written can be overwritten with 1's or 0's to ensure that none of the virus remains.
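The abort-and-overwrite behavior described above can be illustrated with a minimal sketch. It is not the implementation of the invention: the function name `receive_to_file`, the chunked input, and the signature bytes are hypothetical, and a simple sliding-window substring check stands in for the finite state machine.

```python
import os

def receive_to_file(chunks, path, signatures):
    """Write incoming chunks to `path`; abort and scrub if a signature appears.

    Assumption: `chunks` is an iterable of bytes objects and `signatures`
    is a non-empty list of non-empty bytes patterns (hypothetical inputs).
    """
    # Keep enough trailing bytes to catch a signature split across chunks.
    keep = max(len(s) for s in signatures) - 1
    tail = b""
    written = 0
    clean = True
    with open(path, "wb") as f:
        for chunk in chunks:
            window = tail + chunk
            if any(sig in window for sig in signatures):
                clean = False
                # Overwrite the portion already written with zero bytes.
                f.seek(0)
                f.write(b"\x00" * written)
                f.flush()
                os.fsync(f.fileno())
                break
            f.write(chunk)
            written += len(chunk)
            tail = window[-keep:] if keep else b""
    if not clean:
        os.remove(path)  # the aborted file is not retained
    return clean
```

Whether such an overwrite reaches the same physical locations depends on the file system and the storage device, so the scrub step here is illustrative only.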
The invention may also be used to guard against the spread of a virus during a file copying operation, such as the operation of copying a file from a removable diskette to the hard disk storage medium of a computer. By inhibiting the storage of a virus-containing file onto the hard disk, the invention protects the computer system and prevents the virus from spreading to other computer systems which communicate with that system.
The invention is therefore able to prevent the spread of computer viruses, rather than merely detecting that the spread has already taken place. Accordingly, the invention is applicable to a data transfer system for receiving a transmission of digital data for storage in a computer storage medium. The invention provides a method of identifying and inhibiting the storage of data containing at least one predefined sequence, which could be, for example, a computer virus signature. The method comprises the step of causing digital data resident on a source storage medium to be transmitted to a computer system having a destination storage medium.
The method further comprises receiving and processing the transmission to determine if at least one predefined sequence is present in the transmission. In response to this processing, the digital data of the transmission is caused to be stored on the destination storage medium, if none of the predefined sequences are present. In response to the processing, the digital data of the transmission is inhibited from being stored on the destination storage medium if at least one predefined sequence is present. As will be further explained, the predefined sequence can be a computer virus signature.
In the presently preferred embodiment the processing step is performed using a finite state machine which is capable of simultaneously testing whether one or more predefined sequences are present in the transmission. The state table of the finite state machine in the preferred embodiment is loaded into the computer as a prebuilt table.
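One way to picture a finite state machine that tests for several sequences at once is the following sketch. It assumes an Aho-Corasick-style automaton, which is a technique chosen here for illustration and not necessarily the table construction used by the preferred embodiment. The names `build_table` and `Scanner` and the example signatures are hypothetical.

```python
from collections import deque

def build_table(signatures):
    """Prebuild goto/fail/output tables for a multi-pattern matcher."""
    goto = [{}]     # goto[state][byte] -> next state
    fail = [0]      # state to fall back to on a mismatch
    out = [set()]   # signatures recognised on reaching each state
    for sig in signatures:
        state = 0
        for b in sig:
            if b not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        out[state].add(sig)
    # Breadth-first pass fills in the fallback links.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for b, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and b not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(b, 0)
            out[t] |= out[fail[t]]
    return goto, fail, out

class Scanner:
    """Feeds a stream through the table, keeping state between chunks."""
    def __init__(self, table):
        self.goto, self.fail, self.out = table
        self.state = 0

    def feed(self, chunk):
        found = set()
        for b in chunk:
            while self.state and b not in self.goto[self.state]:
                self.state = self.fail[self.state]
            self.state = self.goto[self.state].get(b, 0)
            found |= self.out[self.state]
        return found

# Hypothetical placeholder signatures, not real virus data.
table = build_table([b"EVIL", b"VIRUS", b"IRK"])
scanner = Scanner(table)
first = scanner.feed(b"xxEV")   # signature not yet complete
second = scanner.feed(b"IL")    # completes b"EVIL" across the chunk boundary
```

Because the current state persists between calls to `feed`, each incoming byte is examined once, and a sequence that straddles two chunks is still recognised.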
The inhibiting step may be performed using the file handling mechanism of the destination storage medium. Using this approach, any stored data is selectively marked to be retained or to be discarded by overwriting. If desired, the inhibiting step can include the actual overwriting of at least a portion of the data marked to be discarded. The inhibiting step may also be performed by buffering the digital data prior to storage on the destination storage medium and by discarding the buffered data if at least one predefined sequence is present.
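The buffering variant of the inhibiting step can be sketched as follows. This is an illustrative outline under stated assumptions: `buffered_store` is a hypothetical name, and `contains_signature` is a hypothetical caller-supplied check standing in for the finite state machine.

```python
import io

def buffered_store(chunks, path, contains_signature):
    """Hold the transmission in memory and store it only if it is clean.

    Assumption: `contains_signature` is a callable taking the complete
    buffered bytes and returning True when a predefined sequence is present.
    """
    buf = io.BytesIO()
    for chunk in chunks:
        buf.write(chunk)
    data = buf.getvalue()
    if contains_signature(data):
        return False  # buffered data is discarded; nothing reaches the disk
    with open(path, "wb") as f:
        f.write(data)
    return True
```

With this approach nothing is written to the destination medium until the whole transmission has been checked, so no overwriting step is needed; the cost is memory proportional to the size of the transmission.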
For a more complete understanding of the invention, its objects and advantages, reference may be had to the following specification and to the accompanying drawings.