Digital devices and communication networks are now almost pervasive in industrialized nations. Personal computers (PCs) sit on almost every desktop and in almost every home, being relied upon daily to store, process and transfer all kinds of personal and business information. The explosive growth in PC use has been complemented by growth in large digital communication networks such as metronets and the Internet. This combination of computing devices and communication networks has resulted in levels of access to information, data and electronic services that were little more than a dream a decade ago.
However, attacks by computer viruses, worm programs, and other hostile software (‘malware’) have become very serious problems for computer systems connected to large communication networks such as the Internet. Malware is a general term referring to any kind of software entity (directly executable, executable by an interpreter, or non-executable) whose purpose is to harm or to obtain unauthorized access to a computer system, typically with no human intervention in its operation.
Such attacks are also referred to as “exploits”. An exploit is a software entity which makes use of a system vulnerability in order to perform some action not intended by the system's designers. A list of the kinds of vulnerabilities commonly exploited can be found in “How to eliminate the ten most critical Internet security threats: The experts' consensus”, available at the SANS Resources web site (June 2001). This document provides a list of the security weaknesses which are involved in the largest number of security breaches.
Such automated or “canned” attacks are arguably a threat to the productive use of computers and computer systems in the modern world. Attacks by human hackers actively attempting to penetrate systems themselves are a far smaller threat, because human hackers cannot be massively replicated and distributed, or passed on to hostile but less sophisticated attackers. On the other hand, software entities such as computer viruses, worm programs, e-mails with hostile attachments, attack scripts, and denial-of-service attacks, including massive distributed “spamming”, can be generated by unskilled attackers using software developed by experts. More importantly, such automated attacks are often designed to propagate themselves through a network causing massive and widespread damage, rather than focussing on a single target. Thus, automated attacks have an entirely different threat model with quite different security parameters than non-automated attacks.
Defences against such automated attacks have been attempted in many ways, including the following:
friend/foe identification, for example, requiring users to identify themselves with a login name and secret password to gain access to a system;
sand-box approaches in which imported software runs in a limited sub-environment. See, for example, the open-source Janus sand-box protection system from the University of California at Berkeley;
virus-detection software which may either scan software as it is being downloaded, or scan it prior to execution. See, for example, Norton AntiVirus™;
firewall software facilities which attempt to limit communication into a computer or local network in order to prevent, slow down, or render less hazardous the arrival of hostile software entities;
behaviour profiles, which compare a user's activities to statistical summaries of that user's normal activity, compiled over time. For example, suppose a user normally has almost no outgoing file transfers from her/his computer over the network. If a sudden flurry of outgoing file transfers occurs, it could be that an intruder has penetrated the system and is stealing information. The intrusion-detection system notes that the behaviour is atypical, and may then shut down the outgoing transfers, block access to the network, inform the user, keep a record of the occurrence, or any combination of such things. There are several major problems with behaviour profiles, including the following: any profile obtained over a reasonably short period of time is unlikely to capture all legitimate behaviours, and activities which are perfectly legitimate, but infrequent, will often be interpreted as security violations;
rule-based access controls based on military security systems. See, for example, information on SELinux (Security-Enhanced Linux), online. SELinux is a research prototype from NSA of a Linux operating system which applies access control rules to enhance system security, released for experimental purposes, primarily under the GNU General Public License; and
more comprehensive strategies such as that of the STATNeutralizer™. STATNeutralizer is a site protection system combining rule-based access control, intrusion detection using statistical profiles, and recognition of malware ancestry by pattern-matching on their code. In other words, the STATNeutralizer attempts to identify malware and prevent its execution, to profile expected behaviour, and then (once unexpected behaviour is detected) to limit the damage by shutting down part or all of the system.
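The behaviour-profile approach described above, and its tendency to flag legitimate but infrequent activity, can be illustrated with a minimal sketch. The function names, the sample history, and the three-standard-deviation threshold are all assumptions made for illustration; they are not taken from any of the products named above.

```python
from statistics import mean, stdev

def build_profile(daily_transfer_counts):
    """Summarise a user's normal activity as a mean and standard deviation."""
    return mean(daily_transfer_counts), stdev(daily_transfer_counts)

def is_atypical(profile, observed, threshold=3.0):
    """Flag an observation more than `threshold` deviations above the mean."""
    mu, sigma = profile
    # Guard against a zero spread so a perfectly steady history still works.
    spread = sigma if sigma > 0 else 1.0
    return (observed - mu) / spread > threshold

# Hypothetical history: outgoing file transfers per day over ten days.
history = [0, 1, 0, 0, 2, 1, 0, 0, 1, 0]
profile = build_profile(history)

print(is_atypical(profile, 1))    # an ordinary day
print(is_atypical(profile, 40))   # a sudden flurry of outgoing transfers
```

The profile cannot tell an intruder's 40 transfers from a legitimate but rare event, such as the user sending a quarterly archive: both lie equally far from the short history and both would be flagged.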
Despite such attempts, good defences remain labour-intensive, and outside the easy reach of home computers and other low-cost system installations.
Part of the problem with these attempts is that they are unable to address new attack strategies and tools. Virus detection tools, for example, must be updated regularly to be effective against new viruses. Even with regular updates it is impossible for a virus detection strategy to offer flawless protection because no amount of updating will protect a system from unknown future viruses.
There are proposals for new diversity-based approaches which, rather than trying to keep up with changes in malware, diversify the attacked systems to make the creation of effective malware more difficult. The two main approaches are:
varying systems over time as described by Frederick B. Cohen in “Operating system protection through program evolution”, Computers and Security, 12 (6), October 1993; and
varying instances over systems in space as described by Stephanie Forrest, Anil Somayaji, and David H. Ackley, in “Building diverse computer systems”, Proceedings of the 6th Workshop on Hot Topics in Operating Systems, pages 67-72, Los Alamitos, Calif., 1997, IEEE Computer Society Press.
The premise is that widely deployed software is easy to attack because all of the instances of that software are exactly alike. Since exploits are, almost always, entirely “canned” (i.e., they are performed entirely by software entities created in advance by a knowledgeable attacker, rather than requiring ongoing human participation during the execution of the exploit), the exploit must depend on a priori understanding of how the attacked system works: human intelligence cannot be applied during execution of such an exploit when a surprise is encountered. If the a priori expectations of the exploit's creator can be rendered erroneous by diversifying instances of the system, the exploit fails.
To implement Cohen's proposal, the system to be protected must be augmented with software which modifies the system on an ongoing basis (i.e., diversity occurs over the passage of time: yesterday's program differs from today's). Thus, at some level, the Cohen system must rely on self-modifying code, which is widely regarded as unreliable and unpredictable.
Forrest et al. consider diversity in which changes are not successive, but start with the same root software which is then modified in a random fashion. As a result, diversity according to Forrest et al. might be termed spatial diversity: different system creation instances use differing random input, so that different installations, distributed in space, contain diverse systems.
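As a toy sketch of spatial diversity, and of the premise that a canned exploit fails when its a priori expectations are wrong, the following builds many instances from one root layout using different random seeds. The slot names, the padding scheme, and the exploit model are assumptions chosen for illustration; they are not the mechanism specified by Forrest et al.

```python
import random

# Hypothetical root layout shared by every installation (names are illustrative).
ROOT_SLOTS = ["saved_registers", "local_buffer", "return_address"]
SLOT_SIZE = 8

def build_instance(seed=None):
    """Lay out the root slots; a seed adds random padding before each slot."""
    rng = random.Random(seed) if seed is not None else None
    offsets, offset = {}, 0
    for slot in ROOT_SLOTS:
        if rng is not None:
            offset += rng.randrange(4) * SLOT_SIZE  # 0 to 3 slots of padding
        offsets[slot] = offset
        offset += SLOT_SIZE
    return offsets

def canned_exploit(instance, assumed_offset=16):
    """Succeed only if the target lies where the attacker assumed in advance."""
    return instance["return_address"] == assumed_offset

root = build_instance()                        # undiversified: offsets 0, 8, 16
fleet = [build_instance(seed) for seed in range(100)]
failures = sum(not canned_exploit(inst) for inst in fleet)

print(canned_exploit(root))                    # the exploit was written for this layout
print(f"exploit fails on {failures} of {len(fleet)} diversified instances")
```

Because the exploit cannot adapt during execution, any instance whose layout departs from the assumed one defeats it, while the same seed always reproduces the same instance.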
However, whether the diversity is through time as proposed by Cohen, or through space as suggested by Forrest et al., the kinds of diversity which have been proposed are less than substantial. While superficial changes might be effective against some malware, more substantial changes would be effective against a broader spectrum of malware.
Examples of the superficial changes which these proposals effect include the following: both Cohen and Forrest et al. suggest re-orderings of instructions within basic blocks (BBs) of code. A basic block is a maximal straight-line code sequence entered only at its beginning and exited only at its end. Note that this re-ordering has no impact on the data-flow graph of the BB—the change is entirely superficial. Malware identifying attack points by pattern matching could bypass such a defence.
The execution of a software program may be described in terms of its data-flow and control-flow. Data-flow refers to the ‘ordinary computation’ of a program (addition, subtraction, multiplication, division, Boolean computations, masking operations, and the like), that is, the scalar data-flow of a program. The control-flow of a program refers to the control-transfers in the program, that is, the decision points and branch instructions that govern which lines of code in the program are to be executed.
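The observation that re-ordering instructions within a basic block leaves its data-flow graph unchanged can be checked with a small sketch. The three-address instruction format and the helper names below are assumptions made for illustration, and each destination is assumed to be assigned only once within the block.

```python
def data_flow_graph(block):
    """Return the set of (producer, consumer) edges of a basic block.

    Each instruction is (dest, op, operands); every dest is assigned once,
    so a value name identifies the instruction that produces it.
    """
    defined = {dest for dest, _, _ in block}
    return {
        (src, dest)
        for dest, _, operands in block
        for src in operands
        if src in defined
    }

def is_valid_order(block):
    """Check that every value produced in the block is defined before use."""
    defined = {dest for dest, _, _ in block}
    seen = set()
    for dest, _, operands in block:
        if any(src in defined and src not in seen for src in operands):
            return False
        seen.add(dest)
    return True

original = [
    ("t1", "add", ("a", "b")),
    ("t2", "mul", ("c", "d")),
    ("t3", "sub", ("t1", "t2")),
]
# Swap the two independent instructions; the dependent one stays last.
reordered = [original[1], original[0], original[2]]

print(is_valid_order(reordered))
print(data_flow_graph(original) == data_flow_graph(reordered))
```

Any legal re-ordering must respect the producer-to-consumer edges, so the graph itself is exactly what the re-ordering cannot change.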
Forrest et al. suggest re-ordering the parameters of routines. This is a slightly deeper change. However, the impact on the object code will typically only be to change the register numbers or local offsets of particular pieces of data. Again, the new data-flow graph after such a change will be isomorphic to the original one: the change is again quite superficial. Malware using pattern-matching identification of routines can bypass this defence;
Forrest et al. also propose performing compiler optimizations for parallel processing (where the target platform is not a parallel machine, since otherwise this would not constitute a change; it would simply be normal compilation procedure). This permits re-ordering of code among BBs instead of within BBs, which is a somewhat deeper change.
However, this has little effect on the data-flow patterns (expression graphs) used to compute particular values, and only changes the sites in the code where the operations in the expression graphs occur. The change remains superficial, though the pattern matching required of malware to bypass this defence is more complex.
Since these kinds of transformations are well understood in the art of compiler optimization, correcting for such transformations is by no means an insurmountable problem for sufficiently sophisticated malware—and there is every expectation that the sophistication of malware will continue to increase, as the history of such attacks over the last few years very clearly indicates;
Forrest et al. propose renaming entry points in APIs (application programming interfaces). This will entirely frustrate attacks based on linking to such APIs using only name information, but will have no effect whatever on any attack which identifies such entry points by their initial code patterns instead of by name. Again, the superficial nature of the change makes it ineffective against (in this case, only moderately) more sophisticated malware; and
Forrest et al. propose randomly modifying the sizes of routine stack frames. Making this change may foil an exploit using a particular buffer-overflow weakness of a particular Unix™ utility, but only if the exploit relies on exact knowledge of stack frame layout. As many exploits do not rely on such knowledge, this solution does not have universal application.
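The weakness of renaming API entry points, noted in the list above, can be sketched as follows: a lookup by name fails after renaming, while a lookup by initial code pattern still succeeds. The export tables, the entry-point names, and the routine bytes are hypothetical placeholders, not real machine code or a real API.

```python
def find_by_name(exports, name):
    """Locate an entry point using only its exported name."""
    return exports.get(name)

def find_by_prologue(exports, signature):
    """Locate entry points whose code begins with a known byte pattern."""
    return [code for code in exports.values() if code.startswith(signature)]

# Hypothetical routine body; the bytes are placeholders for illustration.
ROUTINE = bytes.fromhex("55 89 e5 83 ec 10 90 c3")

original_api = {"open_file": ROUTINE}   # name known to the attacker in advance
renamed_api = {"zq17_x": ROUTINE}       # same code under a diversified name

print(find_by_name(renamed_api, "open_file"))        # name-based attack finds nothing
print(find_by_prologue(renamed_api, ROUTINE[:4]))    # pattern-based attack still matches
```

Renaming changes only the key under which the code is filed; the code itself, and therefore any pattern derived from it, is untouched.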
There is therefore a need for a method and system which provides resistance to automated attacks. This method and system should have minimal impact on the reliability and operability of existing software and computer systems, and consume as few additional resources as possible.