Despite the deployment of various kinds of defenses, worm programs, viruses, and other kinds of remote attacks over the Internet continue to cause damage, disrupt traffic, and otherwise hobble computer- and communications-based productivity.
An exploit is a form of attack, often embodied in a software entity such as a virus or worm program, which makes use of a system vulnerability in order to perform some action not intended by the system's designers. A cracker is a system-savvy person who employs that knowledge in hostile ways to the detriment of the system's intended use. Malware is a general term referring to any kind of software entity (directly executable, executable by an interpreter, or non-executable) whose purpose is to harm or to obtain unauthorized access to a computer system, typically with no human intervention in its operation. Malware includes, but is certainly not limited to, computer viruses and worm programs.
‘Remote-control’ attacks performed over the Internet, whether carried out by viruses, worm programs, and other ‘canned’ exploits of system flaws, or by a cracker using such flaws in a remote attack, have become very serious problems for computer systems connected to the Internet. There are many kinds of defenses. These range from identifying malware or other remote attacks and preventing their completion; to profiling expected behaviour and then, once unexpected behaviour is detected, limiting the damage by shutting down part or all of the system; to ‘sand-box’ approaches in which imported software runs in a limited sub-environment; to rule-based approaches based on military security systems.
Despite the use of firewalls, virus-detection software, and all of these other defensive measures, such remote attacks continue to cause great damage.
A high proportion of the high-risk vulnerabilities reported concern buffer overflows. Buffer overflow vulnerabilities result from a failure to limit input lengths, so that input can overwrite memory used for other purposes. For example, the ICAT ranking of top ten vulnerabilities, as of Jan. 18, 2002, includes three buffer-overflow problems, of which two concern Microsoft's Internet Information Services (IIS) (TM) used for web-hosting, and one concerns the mapper from Simple Network Management Protocol (SNMP) to Distributed Management Interface (DMI) for Solaris (TM), widely used for servers. These vulnerabilities concern areas where substantial volumes of commercial Internet traffic could be compromised.
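By way of illustration only, the mechanism of a buffer overflow can be sketched in C++ as follows. The sketch does not overflow a real buffer. It assumes a hypothetical 16-byte region (Region) whose first 8 bytes stand for an input buffer and whose last 8 bytes stand for memory used for another purpose. The names Region, copy_unlimited, and copy_limited are introduced here for illustration and do not come from any reported vulnerability.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical model: bytes [0, 8) are the input buffer, bytes [8, 16)
// are neighbouring memory used for some other purpose.
constexpr std::size_t kBufSize = 8;
constexpr std::size_t kRegionSize = 16;
constexpr unsigned char kFill = 0xAA;  // marker value for the neighbouring memory

struct Region {
    std::array<unsigned char, kRegionSize> bytes{};

    Region() { std::fill(bytes.begin() + kBufSize, bytes.end(), kFill); }

    // True if the memory after the buffer still holds its marker value.
    bool neighbour_intact() const {
        return std::all_of(bytes.begin() + kBufSize, bytes.end(),
                           [](unsigned char b) { return b == kFill; });
    }
};

// Models an input routine that ignores the buffer size. The copy is clamped
// to the model region only so that the sketch itself stays well-defined.
void copy_unlimited(Region& r, const char* input, std::size_t len) {
    assert(input != nullptr);
    const std::size_t n = std::min(len, kRegionSize);
    std::memcpy(r.bytes.data(), input, n);
}

// Models an input routine that limits the input length to the buffer size.
void copy_limited(Region& r, const char* input, std::size_t len) {
    assert(input != nullptr);
    const std::size_t n = std::min(len, kBufSize);
    std::memcpy(r.bytes.data(), input, n);
}
```

Under these assumptions, passing a 12-byte input to copy_unlimited changes the neighbouring bytes, whereas copy_limited truncates the input and leaves them intact.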
In the open-source world, of the top ten security alerts listed at the Debian home page (www.debian.org), as of Jan. 18, 2002, three also concerned buffer overflow vulnerabilities. Debian is a project providing an open-source GNU/Linux distribution.
Not only are buffer overflows a widespread and persistent problem, they also represent an extremely serious risk, because an attack can use such a vulnerability to run arbitrary code with the privileges of the attacked application. Since many applications run with very high levels of privilege (using such facilities as the setuid or setgid file-permission flags of Unix (TM) or GNU/Linux), an exploit's code can often do practically arbitrary damage, and can obtain any information available on the attacked system.
The kinds of defenses which have been deployed previously against buffer overflow attacks are conditioned by the sources of such vulnerabilities, which are:
   1. system data-input facilities which do not specify a limit on length;
   2. systems implementation languages such as C and C++, which do not mandate array bounds checking (nor require that array bounds be declared), or which use character strings or other data terminated by a sentinel value (e.g., for C or C++, the null character), rather than by a separate count; and
   3. absence of hardware protection against overwriting of code, or against execution of data (as in Intel's IA-32, where the memory-protection hardware does not distinguish readable data from executable code); or failure of a system or application to employ such hardware protections when available.
There are compilers for C and C++ which enforce array-bounds checks. Such checks eliminate many of the array vulnerabilities, but not all: C and C++ do not require arrays to have declared bounds, and idiomatic C or C++ code often uses a varying pointer rather than an array and an index. A fully dynamic approach provides almost complete coverage if the library has been completely converted to this form (which, depending on the library, might be a large project in itself), but even then, there are idiomatic C usages (such as zero-length, unsized, or unit-length arrays at the ends of allocated structures to allow for flexible arrays) which are hard to cover fully in this way. Moreover, the performance penalty for full bounds checking is high enough to rule out general deployment, except as a debugging technique: such checking adds between 75% and 200% overhead in code size and execution time. Although C++ permits the definition of arrays with checked bounds, the ISO-standard C++ library contains a vector<> template for which the default indexing (operator[]) implementation does not check the bounds. Thus, unless programmers use the ISO-standard library with full cognizance of this hazard, it will virtually guarantee an ongoing supply of buffer-overflow flaws to exploit.
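The hazard in the vector<> template can be illustrated with a minimal sketch. The standard library does provide a checked accessor, the at() member function, which throws std::out_of_range for an invalid index, but the programmer must choose it explicitly. The helper names checked_read and read_unchecked below are hypothetical and serve only this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Checked access: at() validates the index and throws std::out_of_range
// when it is not less than size(). Returns false instead of propagating.
bool checked_read(const std::vector<int>& v, std::size_t i, int& out) {
    try {
        out = v.at(i);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}

// Unchecked access: operator[] performs no bounds check, and an
// out-of-range index is undefined behaviour. The assert is a debug-only
// guard added for this sketch; it vanishes when NDEBUG is defined.
int read_unchecked(const std::vector<int>& v, std::size_t i) {
    assert(i < v.size());
    return v[i];
}
```

For a three-element vector, checked_read reports failure for index 3, whereas calling v[3] directly would compile without complaint and read past the end of the storage.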
Other kinds of protections include the use of canaries: special data values placed at strategic points in memory and checked for integrity. The difficulty is that it cannot be guaranteed that any use of canaries light enough to permit good performance will also capture attacks against which they have not been tested. That is, canaries have a coverage problem: only parts of the data are protected. Canaries, by their very nature, provide ‘spot’ checks.
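The coverage problem can be sketched as follows, again using a simulated memory layout rather than a real stack. The layout (an 8-byte buffer, a one-byte canary, and one protected byte), the canary value, and the names Frame, linear_write, and indexed_write are all assumptions made for this illustration; actual canary implementations differ in detail.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: bytes [0, 8) buffer, byte 8 canary, byte 9 protected data.
constexpr std::size_t kCanaryAt = 8;
constexpr std::size_t kProtectedAt = 9;
constexpr std::size_t kSize = 10;
constexpr std::uint8_t kCanary = 0x5C;  // arbitrary illustrative value

struct Frame {
    std::array<std::uint8_t, kSize> mem{};

    Frame() { mem[kCanaryAt] = kCanary; }

    // The 'spot' check: only the canary byte is examined.
    bool canary_intact() const { return mem[kCanaryAt] == kCanary; }
};

// A contiguous overrun starting at the buffer: it must cross the canary
// to reach the protected byte, so the spot check catches it.
void linear_write(Frame& f, std::size_t n, std::uint8_t value) {
    for (std::size_t i = 0; i < n && i < kSize; ++i) {
        f.mem[i] = value;
    }
}

// A write at a single attacker-chosen offset (for example, through a
// corrupted index): it can reach the protected byte without touching
// the canary, so the spot check sees nothing.
void indexed_write(Frame& f, std::size_t offset, std::uint8_t value) {
    assert(offset < kSize);  // model precondition for this sketch
    f.mem[offset] = value;
}
```

In this model both writes corrupt the protected byte, but only the contiguous overrun is detected by the canary check.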
The bottom line is that dynamic checking in compilers and memory management facilities can only provide partial coverage, or nearly full coverage with such high performance penalties that the protection is relegated to the status of a debugging tool. Moreover, when such defenses are deployed, their vulnerabilities (any points which they do not cover) may become known all too quickly.
Cohen, and separately Forrest, Somayaji, and Ackley, have proposed an approach which might be called defense by diversity, the premise being that widely deployed versions of software are easy to attack because all of their instances are exactly alike. Many attacks are ‘canned’. That is, they are performed entirely by software entities created in advance by an experienced attacker, rather than requiring ongoing human participation in the attack during the execution of an exploit. The exception is unwitting participation (Trojan horse malware), as in the “I LOVE YOU” e-mail worm exploiting Microsoft Outlook (TM), where opening an attachment executes the exploit. Since many attacks are canned, or are otherwise performed remotely and with limited access (at least until the attack has succeeded), the attack depends on a priori knowledge of how the attacked system works: no extensive investigation can be performed during an attack when a surprise is encountered, because either the malware cannot be made sufficiently intelligent, or the cracker does not yet have sufficient access privileges to allow such an investigation. If the a priori expectations of the attack's creator or performer can be falsified by diversifying instances of the system, the attack will fail.
Cohen proposes a version of the diversity defense in which application software is modified periodically (i.e., in which the diversity is created through time: yesterday's program differs from today's). To implement Cohen's proposal, the system to be protected must be augmented with software which modifies the system on an ongoing basis.
Forrest et al. consider diversity in which changes are not successive, but start with the same root software which is then modified in a random fashion (e.g., during compilation). As a result, diversity according to Forrest et al. might be termed spatial diversity: different system creation instances use differing random input, so that different installations, distributed in space, contain diverse systems.
Whether the diversity is through time as proposed by Cohen, or through space as suggested by Forrest et al., the kinds of diversity which they have proposed are not ideally directed to the specific problems of defending against remote attacks in general and buffer overflows in particular.
For example, Forrest et al. propose incrementing stack frame size for routines at compile time, in random increments of 16 bytes, from no increments up to four increments. They found that this did indeed defend against a particular exploit. Suppose, however, that such a defense were used against overwriting the routine's return address, and thus branching to code provided by the exploit. A cracker could simply ensure that her exploit's code entry point is written at each of the five possible offsets of the return address, thereby rendering the proposed defense useless. As with canaries, this approach has a coverage problem.
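The weakness described above can be made concrete with a small simulation. The sketch assumes a hypothetical frame consisting of a 32-byte buffer, followed by zero to four 16-byte padding increments, followed by a 4-byte return-address slot. The buffer size, address width, filler byte, and all function names are illustrative assumptions and are not taken from Forrest et al.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kBufSize = 32;        // hypothetical buffer size
constexpr std::size_t kStep = 16;           // padding increment in bytes
constexpr std::size_t kMaxIncrements = 4;   // zero to four increments
constexpr std::size_t kAddrSize = sizeof(std::uint32_t);
constexpr std::size_t kStackSize = kBufSize + kStep * kMaxIncrements + kAddrSize;

// Offset of the return-address slot for a given number of padding increments.
std::size_t return_slot(std::size_t increments) {
    return kBufSize + kStep * increments;
}

// The attacker's input: filler bytes with the exploit's entry point
// written at every one of the five possible return-address offsets.
std::vector<std::uint8_t> build_payload(std::uint32_t entry) {
    std::vector<std::uint8_t> payload(kStackSize, 0x90);
    for (std::size_t k = 0; k <= kMaxIncrements; ++k) {
        std::memcpy(payload.data() + return_slot(k), &entry, kAddrSize);
    }
    return payload;
}

// Simulates overflowing the buffer of a frame built with the given padding
// and returns the value left in that frame's return-address slot.
std::uint32_t return_address_after_overflow(std::size_t increments,
                                            const std::vector<std::uint8_t>& payload) {
    assert(increments <= kMaxIncrements);
    std::array<std::uint8_t, kStackSize> stack{};
    const std::size_t n = std::min(payload.size(), stack.size());
    std::memcpy(stack.data(), payload.data(), n);
    std::uint32_t addr = 0;
    std::memcpy(&addr, stack.data() + return_slot(increments), kAddrSize);
    return addr;
}
```

Under these assumptions the same payload leaves the attacker's entry point in the return-address slot for every one of the five padding choices, so the random increment does not change the outcome of the attack.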