Recently, there has been a rise in zero-day attacks (i.e., computer threats that exploit undisclosed or unpatched vulnerabilities in computer applications). Zero-day attacks can be considered extremely dangerous because they exploit computer security holes for which no solution is currently available. Usually, zero-day attacks are released before, or on the same day that, a particular vulnerability is disclosed to the public. Unfortunately, current practice in vulnerability analysis and protection generation is generally manual.
Automatic signature generation for zero-day attacks has attracted much attention in recent times. Recent attempts to thwart zero-day attacks have included generating attack signatures for a single attack variant by searching network traffic for long invariant substrings to use as signatures. Finding multiple invariant substrings in network traffic relies, for example, on the observation that multiple invariant substrings must often be present in all variants of a worm payload for the worm to function properly. These substrings typically correspond to protocol framing, control data such as return addresses, and poorly obfuscated code. Such an approach, however, suffers from significant false positives and false negatives, because legitimate traffic often contains multiple invariant substrings and because polymorphic attacks can hijack control data without using an invariant substring. Moreover, the foregoing approach generates signatures from network traffic alone. The fundamental drawback of such mechanisms, therefore, is that carefully crafted attack traffic can mislead them into generating incorrect signatures.
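By way of illustration only, the multiple-invariant-substring idea can be sketched as follows. This is a minimal toy under stated assumptions, not the algorithm of any particular prior system: the function names `common_substrings` and `signature_matches`, the sample payloads, and the minimum token length are all hypothetical.

```python
def common_substrings(samples, min_len=3):
    """Return the maximal byte substrings (>= min_len) present in every sample."""
    first = samples[0]
    found = set()
    for i in range(len(first)):
        for j in range(i + min_len, len(first) + 1):
            candidate = first[i:j]
            if all(candidate in other for other in samples[1:]):
                found.add(candidate)
    # Keep only maximal tokens: drop any token contained in a longer one.
    return {c for c in found if not any(c != d and c in d for d in found)}


def signature_matches(signature, payload):
    """A payload matches when it contains every token of the signature."""
    return all(token in payload for token in signature)


# Hypothetical worm variants: same framing and return address, different filler.
variants = [
    b"GET /x.ida?AAAA\xde\xad HTTP/1.0",
    b"GET /x.ida?BQZR\xde\xad HTTP/1.0",
]
signature = common_substrings(variants)

# A variant that hijacks control data through different bytes evades the tokens.
evasive_variant = b"GET /y.ida?CCCC\xbe\xef HTTP/1.0"
print(sorted(signature))
print(signature_matches(signature, evasive_variant))
```

In this toy, the signature is derived from traffic alone, so an attacker who controls the observed samples also controls which tokens end up in the signature.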
Other approaches employed to overcome zero-day attacks have included leveraging information about vulnerable applications to improve both the accuracy and the coverage of signatures. For example, one technique employs protocol specifications to provide more protocol context to attack signatures and attempts to generalize the signature for observed attack instances. The signatures provided are finite state automata (FSAs) inferred from clusters of similar connections or sessions. The edges in a connection-level FSA can be either fields or messages, and the edges in a session-level FSA can be connections. Such an approach generalizes the signatures by replacing certain variable data elements with a wildcard. A shortcoming of such an approach is that it depends on the attack instances observed, which can make the resulting signatures too specific. For example, attack variants that make use of different message sequences cannot be captured by this approach. Also, wildcard-based generalizations typically cannot filter attack variants of buffer overrun vulnerabilities. Additionally, this approach can lead to overgeneralization and false positives.
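The wildcard-based generalization described above can be illustrated with a simplified sketch. It assumes a linear chain of messages rather than a full finite state automaton, and the names `generalize` and `session_matches`, the message types, and the field values are hypothetical.

```python
WILDCARD = None  # marks a field whose value varied across observed sessions


def generalize(sessions):
    """Merge sessions sharing one message sequence; wildcard fields that differ."""
    signature = []
    for index, (message_type, value) in enumerate(sessions[0]):
        observed_values = {session[index][1] for session in sessions}
        pattern = value if len(observed_values) == 1 else WILDCARD
        signature.append((message_type, pattern))
    return signature


def session_matches(signature, session):
    """Match only sessions with the same message sequence as the signature."""
    if len(signature) != len(session):
        return False
    return all(
        sig_type == msg_type and (pattern is WILDCARD or pattern == value)
        for (sig_type, pattern), (msg_type, value) in zip(signature, session)
    )


# Two observed attack sessions (assumed to be clustered as similar).
observed = [
    [("HELLO", "v1"), ("AUTH", "alice"), ("DATA", "<exploit>")],
    [("HELLO", "v1"), ("AUTH", "bob"), ("DATA", "<exploit>")],
]
signature = generalize(observed)

# A variant that reaches the same vulnerable message by a different sequence.
variant = [("HELLO", "v1"), ("DATA", "<exploit>")]
print(signature)
print(session_matches(signature, variant))
```

The sketch shows the dependence on observed instances: the variant carries the same exploit data but is missed because its message sequence was never observed.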
Further mechanisms employed to overcome zero-day attacks have included the use of address-space-randomization-based zero-day detectors and regular-expression-based protocol specifications to generate signatures for buffer overrun vulnerabilities. These signatures, however, generally do not contain any protocol context, but typically only a pattern matching predicate for a particular protocol message. Signatures without protocol context can result in false negatives when different message sequences lead to the same attack, and in false positives when a message is pattern matched at a non-vulnerable protocol state. Further, these mechanisms use the length of the vulnerable input field in the observed attack as the buffer limit for the buffer overrun condition in the signature. This can cause false negatives for attacks whose input is shorter than that of the observed attack instance but still long enough to overrun the buffer.
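The false negative caused by using the observed field length as the buffer limit can be shown with a small sketch. The class name `LengthSignature`, the message layout, and the buffer size of 64 bytes are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class LengthSignature:
    """Per-message predicate with no protocol context (hypothetical layout)."""
    message_type: str
    field: str
    limit: int  # taken from the observed attack, not from the real buffer size

    def matches(self, message: dict) -> bool:
        return (
            message.get("type") == self.message_type
            and len(message.get(self.field, b"")) >= self.limit
        )


TRUE_BUFFER_SIZE = 64  # assumed size of the vulnerable buffer

observed_attack = {"type": "LOGIN", "user": b"A" * 300}
signature = LengthSignature("LOGIN", "user", len(observed_attack["user"]))

# Still overruns the 64-byte buffer, but is shorter than the observed attack.
shorter_attack = {"type": "LOGIN", "user": b"B" * 100}

print(signature.matches(observed_attack))
print(signature.matches(shorter_attack))
print(len(shorter_attack["user"]) > TRUE_BUFFER_SIZE)
```

Because the predicate inspects a single message in isolation, the same sketch would also fire on a matching message received in a protocol state where the vulnerable code is not reachable.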
As will have been noted, the foregoing techniques for overcoming zero-day attacks have typically focused on, and/or have been dependent on, the attack instances observed. Other approaches for producing signatures to counter zero-day attacks have included manipulating packet payloads and observing program reactions to them. Such approaches have involved generating signatures in three steps: (1) constructing probes by randomizing address-like strings; (2) detecting exploits (e.g., code fragments or sequences of commands that take advantage of vulnerabilities in order to cause unintended or unanticipated behavior to occur in computer software and/or hardware) by observing memory exceptions upon probe utilization; and (3) generating signatures by finding in the attack input the bytes that cannot take random values. In step (3), a probe is typically constructed for each byte other than the address string by randomizing its value. Nevertheless, this approach has two limitations. First, the probing scheme randomizes each byte rather than leveraging data format information. Such a strategy generates significantly more probes, particularly when multiple messages are involved in an attack. Additionally, the scheme works more reliably for text-based protocols than for binary ones because of a lack of protocol knowledge for binary data formats. Second, the approach can only detect control flow hijacking attacks. For example, such an approach cannot detect exploits of Windows Metafile (WMF) vulnerabilities.
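Step (3) of the probing scheme described above can be sketched as follows. The oracle `toy_oracle` is a hypothetical stand-in for running the vulnerable program and watching for a memory exception; the function name `invariant_bytes` and the payload are likewise assumptions. For brevity the sketch probes every byte and does not model the exclusion of the address string.

```python
import random


def toy_oracle(payload: bytes) -> bool:
    """Hypothetical stand-in for the program: True if the exploit still triggers."""
    return payload.startswith(b"CMD ") and len(payload) > 8


def invariant_bytes(attack: bytes, oracle, rng: random.Random):
    """Send one probe per byte; a byte is invariant if changing it stops the exploit."""
    invariant = []
    for position in range(len(attack)):
        # Pick a replacement value that differs from the original byte.
        choices = [b for b in range(256) if b != attack[position]]
        probe = bytearray(attack)
        probe[position] = rng.choice(choices)
        if not oracle(bytes(probe)):
            invariant.append(position)
    return invariant


attack = b"CMD " + b"\x41" * 12
positions = invariant_bytes(attack, toy_oracle, random.Random(0))
probes_sent = len(attack)  # one probe per byte, regardless of field structure

print(positions)
print(probes_sent)
```

The probe count grows with the number of bytes rather than the number of fields, which is the cost the text attributes to ignoring data format information.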
A further signature generation approach to counteract zero-day attacks, and more particularly to find attack invariants, has been to flip bits of the original attack data to generate probes. However, such an approach can be prohibitively expensive and, on the whole, impractical.
Yet another strategy for automatic signature generation utilizes program analysis of binary or source code. For example, one modality employs dynamic data flow analysis over the execution on an attack input and generates a signature in the form of symbolic predicates. Attack signatures generated by such dynamic data flow analysis can be inherently specific to the attack input used in the analysis. A further methodology that can be adopted utilizes static program analysis to extract the program logic that processes the attack data and triggers the vulnerability. The extracted logic can be expressed in the form of Turing Machines, symbolic predicates, or regular expressions as vulnerability signatures. Turing Machine-based signatures, however, may not terminate, and regular expressions are not sufficiently expressive for many vulnerabilities.