A computer in the present specification is a machine containing a processor and memory, in which the processor is able to execute instructions selected from a given set of instructions. A series of instructions for execution by the processor is called a “program” or “code.” When stored in the memory of the computer, a program is referred to as “passive code.” When loaded into the processor for execution, it is called a “process.” Data is information that may be handled or managed in any way by a computer program; data may also be stored in the memory of the computer. A network comprises a plurality of computers connected together.
We call “malicious” or “hostile” any code designed or modified to intentionally corrupt or steal data or programs from the computer system or network on which it runs. Protecting against hostile code is a challenging problem, since there is no way to programmatically distinguish positive and negative program actions, other than knowing whether they are ultimately good for the user or not. For example, a program may delete a file because the user has explicitly asked it to, but a malicious program could also delete a file against the user's will. In other words, there is no proper technical definition of “malicious” or “hostile” code; these terms are defined according to the behavior expected from a computer by its legitimate user.
Although it is possible to authenticate authorized users with passwords, trusted users themselves may endanger the security of the system and network by unknowingly running programs that contain malicious instructions such as “viruses,” “Trojan horses,” “malicious macros,” “malicious scripts,” “worms,” “spying programs” and “backdoors.” A computer virus is a program that replicates by attaching itself to other programs. A Trojan horse is a program that generally does not do what the user expects it to do, but instead performs malicious actions such as data destruction and system corruption. Macros and scripts are programs written in high-level languages, which can be interpreted and executed by applications such as word processors, in order to automate frequent tasks. Because many macro and script languages require very little or no user interaction, malicious macros and scripts are often used to introduce viruses or Trojan horses into the system without the user's approval. A worm is a program that, like a virus, spreads itself. Unlike viruses, however, worms do not infect other host programs and instead send themselves to other users via networking means such as electronic mail. Spying programs are a subtype of Trojan horses, secretly installed on a victim computer in order to send confidential data and passwords from that computer to the person who installed them. A backdoor is a secret functionality added to a program in order to allow its authors to crack or misuse it, or more generally to exploit the functionality for their own interest.
All of the above programs can compromise computer systems and a company's confidentiality by corrupting data, propagating from one file to another, or sending confidential data to unauthorized persons, against the user's will.
Over the years, different techniques have been created to protect computer systems against malicious programs:
Signature scanners detect viruses by using a pre-defined list of “known viruses.” They scan each file for the signatures of each virus listed in their known-virus database. Each time a new virus is found anywhere in the world, it is added to that database. However, more and more new viruses are created every day, and the known-virus list needs to be constantly updated in order to be effective. Regularly updating an anti-virus product is a heavy task for both the single user and the network administrator, and it leaves a significant security gap between updates.
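The signature-scanning principle, and the gap it leaves between updates, can be illustrated with a minimal sketch. The signature names, byte patterns and the function `scan_bytes` are hypothetical and chosen only for illustration; real scanners use far more elaborate databases and matching techniques.

```python
# Hypothetical known-virus database: name -> byte pattern (invented values).
KNOWN_SIGNATURES = {
    "Example.A": b"\xde\xad\xbe\xef",
    "Example.B": b"EVIL_PAYLOAD",
}


def scan_bytes(data, signatures=KNOWN_SIGNATURES):
    """Return the sorted names of all known signatures found in data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)


clean = b"hello world"
infected = b"header" + b"EVIL_PAYLOAD" + b"trailer"
# A new virus whose signature has not yet been added to the database.
new_virus = b"header" + b"NEW_PAYLOAD" + b"trailer"

print(scan_bytes(clean))      # no known signature present
print(scan_bytes(infected))   # matches a database entry
print(scan_bytes(new_virus))  # undetected until the database is updated
```

In this sketch the new virus goes unnoticed until an entry for it is added to `KNOWN_SIGNATURES`, which corresponds to the security gap between updates described above.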
Another detection method, commonly called Heuristic Scanning, consists of scanning programs for suspicious instructions that are typical of malicious programs, and specifically of viruses, without needing an exact signature of each virus in order to detect it in files. However, malicious program writers can avoid or hide those typical instructions by writing their code differently and/or encrypting it, and thus malicious code and viruses rapidly evade detection by Heuristic Scanners.
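The following sketch illustrates, under simplifying assumptions, how a heuristic scanner might score a program and how trivial encryption hides the instructions it looks for. The pattern names, weights, threshold and XOR encoding are all hypothetical; they stand in for the real instruction sequences and obfuscation methods, which the present text does not specify.

```python
# Hypothetical "suspicious instruction" patterns with invented weights.
SUSPICIOUS_PATTERNS = {
    b"open_executable_for_write": 3,
    b"append_self_to_file": 4,
    b"format_disk": 5,
}
THRESHOLD = 5  # assumed cut-off above which code is flagged


def heuristic_score(code):
    """Sum the weights of all suspicious patterns present in the code."""
    return sum(w for pattern, w in SUSPICIOUS_PATTERNS.items() if pattern in code)


def looks_malicious(code):
    return heuristic_score(code) >= THRESHOLD


def xor_encode(data, key):
    """Trivial XOR 'encryption'; applying it twice restores the data."""
    return bytes(b ^ key for b in data)


plain_virus = b"open_executable_for_write; append_self_to_file"
encrypted_virus = xor_encode(plain_virus, 0x5A)

print(looks_malicious(plain_virus))      # typical instructions are visible
print(looks_malicious(encrypted_virus))  # same code, but the patterns are hidden
```

The encrypted form contains exactly the same logic once decoded at run time, yet none of the patterns the scanner searches for appear in the stored bytes.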
U.S. Pat. No. 5,408,642 and U.S. Pat. No. 5,349,655, issued to Mann, and U.S. Pat. No. 5,613,002, issued to Kephart et al., all disclose methods for recovering a computer program infected with a virus. The disclosed methods include generating a fingerprint of data prior to infection by a virus and storing the fingerprint. A second fingerprint of the data is then generated and compared to the prior strings of data, to determine whether the data has been corrupted by a virus and to restore the data to its initial state. These techniques do not prevent viruses from infecting, nor do they protect against other types of malicious programs.
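The general fingerprint-and-compare idea can be sketched as follows. This is not the method of the cited patents: the use of a SHA-256 hash as the fingerprint, the in-memory backup copy used for restoration, and the class name `FingerprintStore` are assumptions made purely for illustration.

```python
import hashlib


def fingerprint(data):
    """Assumed fingerprint: a SHA-256 digest of the data."""
    return hashlib.sha256(data).hexdigest()


class FingerprintStore:
    """Hypothetical store keeping a fingerprint and a backup copy per item."""

    def __init__(self):
        self._records = {}  # name -> (fingerprint, backup copy)

    def record(self, name, data):
        self._records[name] = (fingerprint(data), bytes(data))

    def is_modified(self, name, current):
        stored, _ = self._records[name]
        return fingerprint(current) != stored

    def restore(self, name):
        return self._records[name][1]


store = FingerprintStore()
original = b"program code"
store.record("app", original)

infected = original + b" + virus body"  # the infection has already happened
print(store.is_modified("app", infected))   # corruption is detected afterwards
print(store.restore("app") == original)     # data can be brought back
```

As the sketch shows, the comparison only reveals a modification after it has occurred; nothing in it stops the infection from taking place.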
U.S. Pat. No. 6,073,239, issued to the present inventor, discloses a method in which file I/O activity is filtered. Whenever a program attempts to infect or inject code into another program file, the attempt is denied. The method, however, is only designed to work against executable-file viruses. It does not address other types of viruses, such as macro-viruses, nor other types of malicious programs such as worms, Trojan horses, backdoors or spying software, because these malicious programs neither inject code into nor modify other programs, but directly trigger malicious actions such as data corruption.
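A highly simplified model of such a filter helps show why it stops executable-file infection but not other malicious actions. The extension list and the function `filter_io` are hypothetical and do not reproduce the method of the cited patent; the filter here simply denies writes to files that look like program files.

```python
# Assumed set of extensions treated as executable program files.
EXECUTABLE_EXTENSIONS = (".exe", ".com", ".dll")


def filter_io(operation, path):
    """Return True if the file operation is allowed, False if denied.

    Only writes into executable files are denied; every other
    operation passes through unchanged.
    """
    if operation == "write" and path.lower().endswith(EXECUTABLE_EXTENSIONS):
        return False
    return True


print(filter_io("write", "tool.exe"))     # code injection attempt is denied
print(filter_io("write", "report.doc"))   # macro virus writing a document passes
print(filter_io("delete", "accounts.db")) # direct data destruction passes
```

A Trojan horse that deletes data, or a macro virus that writes into a document, never touches an executable file and therefore passes through such a filter.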
U.S. Pat. No. 5,421,006, issued to Jablon et al., discloses a method for assessing integrity of computer system software at the time of system initialization. Startup processes are verified before being allowed to execute. The method, however, does not prevent the protected processes from being corrupted in the first place, nor does it deal with data and programs other than those related to the system startup.
Other security methods consist of certifying programs that are authorized to run and blocking all other, unauthorized programs. Unfortunately, these techniques are not always suited to open systems where users receive and exchange many files.
One common security system consists of establishing access control lists (e.g., ACLs and DACLs) that define restrictions and rights as to which users are allowed or not allowed to access certain resources, based on those users' rights. For example, system administrators are typically allowed to modify any file, while ordinary users can neither read nor modify some confidential or critical files. Such a security system is usually integrated in modern operating systems to ensure data security and confidentiality on a per-user basis. However, it is important to understand that this security scheme was designed to address the issue of user trust, not the issue of code trust. Users who run malicious programs within their systems will unknowingly compromise the integrity of every resource and file they are allowed to access, with no further protection. For instance, suppose user X is granted full access to the shared files A, B and C. If this user runs a program infected with a virus, the virus will be able to read, infect or even destroy the files A, B and C. This is because access control lists are designed so that programs and tasks run in the security contexts of the users who started them. Thus, even though the user did not mean to harm files A, B and C, the program he ran did harm these files against the user's will, yet in accordance with the user's rights. This is the heart of the malicious code problem. If a user runs hostile code, the code will be able to corrupt and steal any data within the system or network to which its user has access. And if a system administrator runs hostile code, the entire system and network are immediately compromised. In addition to these security problems, access control lists are statically defined for each file and resource. In environments where files are shared and exchanged every day, this does not provide enough security against malicious code, since users usually do not take the time to assign the right security attributes to each new file they create or receive. Such systems are disclosed in EP-A-0 472 487, or in the article of Moffett J. et al., entitled Specifying Discretionary Access Control Policy for Distributed Systems, Computer Communications vol. 13 no. 9 pp. 571-580.
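The example of user X and files A, B and C can be sketched in a few lines. The table layout, the `Process` class and the function `is_allowed` are hypothetical simplifications of a per-user access control list; the point illustrated is only that the access decision consults the identity of the user, never the identity or trustworthiness of the program.

```python
# Hypothetical access control list: resource -> {user: set of rights}.
ACL = {
    "A": {"X": {"read", "write"}},
    "B": {"X": {"read", "write"}},
    "C": {"X": {"read", "write"}},
    "secret": {"admin": {"read", "write"}},
}


class Process:
    """A running program, carrying the security context of its user."""

    def __init__(self, program_name, user):
        self.program_name = program_name
        self.user = user


def is_allowed(process, resource, right):
    # Only process.user is consulted; process.program_name is ignored.
    return right in ACL.get(resource, {}).get(process.user, set())


editor = Process("text_editor", "X")    # program the user trusts
virus = Process("infected_game", "X")   # hostile code started by the same user

for name in ("A", "B", "C"):
    print(name, is_allowed(editor, name, "write"), is_allowed(virus, name, "write"))
print(is_allowed(virus, "secret", "read"))  # limited only by user X's rights
```

Both processes obtain identical answers for every resource, because the check has no notion of which code is asking.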
“Sandboxing” techniques allow testing suspicious programs by safely running them in a secure “sandbox” environment without letting the tested program harm the system or its files. Malicious programs, however, may not perform the offensive or expected actions immediately during the test period, either because they detect that they are being tested, or because they are designed to perform their offensive actions randomly or on certain dates, for example. Hence, although a program seems to behave correctly during the test period, it can harm the system once it has passed the test and has been allowed to run for real. Also, legitimate programs may not behave correctly or may not function at all within a sandbox, as they may need to access files and resources within the system for normal reasons.
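The date-triggered case can be illustrated with a small model. The trigger date, the program `suspicious_program` and the checker `sandbox_test` are hypothetical; the program is represented as a function that merely reports the actions it would perform, so nothing is actually deleted.

```python
import datetime

TRIGGER_DATE = datetime.date(2030, 1, 1)  # hypothetical activation date


def suspicious_program(today, files):
    """Return the actions the program would perform on the given date."""
    if today >= TRIGGER_DATE:
        return [("delete", f) for f in files]
    return [("read", f) for f in files]


def sandbox_test(program, test_date):
    """Run the program on dummy files and pass it if no deletion is observed."""
    actions = program(test_date, ["dummy.txt"])
    return all(kind != "delete" for kind, _ in actions)


# During the test period the program looks harmless and passes.
print(sandbox_test(suspicious_program, datetime.date(2029, 6, 1)))
# Once released, it performs its offensive action after the trigger date.
print(suspicious_program(datetime.date(2030, 1, 2), ["a.doc"]))
```

The sandbox observes only the behavior exhibited during the test, so a program whose offensive action depends on a later date passes the test unchanged.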
U.S. Pat. No. 5,398,196, to Chambers, discloses a method for emulating a program's execution in a virtual process, while searching for viral behavior. The disadvantages of this method are the same as above.