The Internet has become a medium over which unwanted, unsolicited, and potentially harmful data traffic is transmitted. Because complex computer systems and networks are not always configured securely, and the software installed on them often contains defects and other vulnerabilities, these systems have become targets for intruders seeking to obtain unauthorized access to, or even outright control of, a computer system.
This phenomenon has given rise to an industry providing various tools for “defending” networks, servers and computer workstations against such traffic, while allowing legitimate traffic to pass unhindered. A “firewall” is typically software that is installed in a network node; traffic passing through a firewall is inspected by first intercepting each packet and applying a set of rules to determine whether the packet should pass or be stopped. A firewall may be implemented in a networked computer such as a server or a workstation, as well as in dedicated nodes such as network access nodes and routers.
The functionality of a firewall may range from simple address filtering, in which packets with predetermined source addresses or ranges of addresses are discarded, to more complex processes, which include: discriminating traffic on the basis of the protocol, for example ICMP (Internet Control Message Protocol), UDP (User Datagram Protocol), or TCP (Transmission Control Protocol); filtering based on the source and destination ports of each packet; tracking the connection state to detect protocol violations; and the like. If needed, more sophisticated filtering may be done on the basis of the message content itself, so-called “deep” packet inspection.
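The rule-based filtering described above can be illustrated with a minimal sketch. The `Packet` and `Rule` types, the `filter_packet` function, the example rule set, and the default-drop policy are all assumptions made for illustration; they do not describe any particular firewall product. Rules are evaluated in order, and the first matching rule decides whether the packet passes or is stopped.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified view of an intercepted packet."""
    src: str
    dst: str
    protocol: str                    # "TCP", "UDP" or "ICMP"
    src_port: Optional[int] = None
    dst_port: Optional[int] = None


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a field left at its default matches anything."""
    action: str                      # "pass" or "drop"
    src_net: str = "0.0.0.0/0"       # source address range
    protocol: Optional[str] = None
    dst_port: Optional[int] = None

    def matches(self, pkt: Packet) -> bool:
        if ip_address(pkt.src) not in ip_network(self.src_net):
            return False
        if self.protocol is not None and pkt.protocol != self.protocol:
            return False
        if self.dst_port is not None and pkt.dst_port != self.dst_port:
            return False
        return True


def filter_packet(pkt: Packet, rules: list, default: str = "drop") -> str:
    """Return the action of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.action
    return default


# Illustrative rule set (addresses are documentation ranges).
RULES = [
    Rule("drop", src_net="203.0.113.0/24"),     # simple address filtering
    Rule("pass", protocol="TCP", dst_port=80),  # protocol and port filtering
    Rule("pass", protocol="ICMP"),              # protocol-only filtering
]
```

For instance, `filter_packet(Packet("198.51.100.5", "192.0.2.1", "TCP", 40000, 80), RULES)` returns `"pass"`, while the same packet sent from `203.0.113.7` is dropped by the first rule. Connection-state tracking and deep packet inspection are deliberately left out of this sketch.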
Intruders may attempt to transmit one or more specially crafted network packets designed to exploit a computer system vulnerability. A buffer overflow attack, for example, can create a condition where data is stored beyond the boundaries of a memory buffer, and adjacent memory locations are overwritten. This attack may be attempted using a network packet which is designed to exploit a flaw in the memory allocation strategy in the receiving computer system. The intruder may be able to cause the computer system to behave in an unintended way, or even run malicious code transmitted by the intruder.
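The overflow condition described above can be simulated with a short sketch. Python itself bounds-checks its containers, so the example models receiver memory as a single `bytearray` in which an 8-byte buffer is followed directly by adjacent data. The layout, the sizes, and the names `make_memory`, `unchecked_copy`, and `checked_copy` are assumptions chosen purely for illustration.

```python
BUF_SIZE = 8  # assumed size of the receive buffer


def make_memory() -> bytearray:
    """Model memory: an 8-byte buffer followed by 8 bytes of adjacent data."""
    return bytearray(b"\x00" * BUF_SIZE + b"ADJACENT")


def unchecked_copy(mem: bytearray, data: bytes) -> None:
    """Mimic a copy that trusts the sender and never checks the buffer size."""
    for i, byte in enumerate(data):
        mem[i] = byte  # writes past BUF_SIZE when data is too long


def checked_copy(mem: bytearray, data: bytes) -> None:
    """Refuse any payload that does not fit into the buffer."""
    if len(data) > BUF_SIZE:
        raise ValueError("payload exceeds buffer size")
    mem[:len(data)] = data


mem = make_memory()
unchecked_copy(mem, b"A" * 12)      # 12-byte payload into an 8-byte buffer
adjacent_after = bytes(mem[BUF_SIZE:])  # first 4 adjacent bytes are overwritten
```

With the unchecked copy, the four bytes beyond the buffer are overwritten by attacker-supplied data; the checked copy rejects the same payload with a `ValueError`. In a real system the overwritten region could hold control data, which is what allows an intruder to alter the behavior of the program.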
In prior art firewall or intrusion detection systems, network packets may be inspected for predefined data patterns, with the goal of identifying anomalous network traffic, which may have been crafted by an intruder. This traffic can then be discarded by the firewall, for example, before it is processed by vulnerable computer software, thereby preventing an attack. This approach alone is inadequate, since intruders may be able to design an alternate form of the attack which has the desired effect without containing the data pattern the firewall is looking for. Depending on the protocol, the attacker may also be able to encode the network traffic so that the desired payload is carried in a way which evades firewall detection. Further, the firewall may find occurrences of the predefined data patterns in legitimate network traffic (so-called “false positives”).
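Both weaknesses of a simple pattern search can be shown in a few lines. The following sketch assumes a hypothetical signature (`/etc/passwd`) and hypothetical HTTP requests, and uses URL percent-encoding as one example of an encoding an attacker might apply; it is not taken from any actual intrusion detection system.

```python
from urllib.parse import unquote_to_bytes

SIGNATURE = b"/etc/passwd"  # hypothetical predefined data pattern


def naive_match(payload: bytes) -> bool:
    """Search the raw bytes of the packet for the pattern."""
    return SIGNATURE in payload


def decoded_match(payload: bytes) -> bool:
    """Undo percent-encoding first, then search for the pattern."""
    return SIGNATURE in unquote_to_bytes(payload)


# Hypothetical traffic samples.
plain = b"GET /cgi-bin/view?file=/etc/passwd HTTP/1.1"
encoded = b"GET /cgi-bin/view?file=%2Fetc%2Fpasswd HTTP/1.1"
benign = b"POST /forum HTTP/1.1\r\n\r\nHow do I read /etc/passwd safely?"

results = {
    "plain attack, raw search": naive_match(plain),        # detected
    "encoded attack, raw search": naive_match(encoded),    # evades detection
    "encoded attack, decoded search": decoded_match(encoded),
    "benign message, raw search": naive_match(benign),     # false positive
}
```

The raw search detects the plain request, misses the percent-encoded variant of the same request, and flags the harmless forum post. Decoding before searching closes the evasion in this one case but does nothing about the false positive, which requires knowledge of where in the protocol message the pattern occurs.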
In some protocols, entire classes of attack types cannot be detected by prior art intrusion detection systems that use a simple search for a predefined data pattern. For example, it is common for compression or encoding schemes to be used to reduce the bandwidth required to transmit information in some protocols. In the DNS (Domain Name System), for example, domain names may be compressed using a specialized scheme described in Request for Comments (RFC) 1035 of the Internet Engineering Task Force (IETF), written by P. Mockapetris and entitled “Domain Names - Implementation and Specification”, November 1987. A simple search for a data pattern which may be indicative of malicious network traffic may not succeed when traffic is compressed or encoded. As another example, in HTTP (Hypertext Transfer Protocol), GNU Zip compression, as described in RFC 1952 of the Network Working Group of the IETF, written by Peter Deutsch and entitled “GZIP file format specification version 4.3”, May 1996, may be applied to content before transmission. Other content encodings and compression techniques in HTTP and many other protocols are also possible.
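The two cases mentioned above can be sketched as follows. The example builds a hypothetical DNS message fragment in which a second name reuses an earlier one through an RFC 1035 compression pointer, and a hypothetical HTTP body compressed with gzip. The signature, the message contents, and the `decode_dns_name` helper are assumptions for illustration only; the decoder handles just the label and pointer forms needed here.

```python
import gzip

SIGNATURE = b"www.example.com"  # hypothetical pattern being searched for


def decode_dns_name(msg: bytes, offset: int) -> str:
    """Decode a possibly compressed domain name starting at offset."""
    labels = []
    hops = 0
    while True:
        length = msg[offset]
        if length == 0:                      # zero octet ends the name
            break
        if length & 0xC0 == 0xC0:            # two-octet compression pointer
            hops += 1
            if hops > 16:                    # guard against pointer loops
                raise ValueError("too many compression pointers")
            offset = ((length & 0x3F) << 8) | msg[offset + 1]
            continue
        if length & 0xC0:                    # 01 and 10 prefixes are reserved
            raise ValueError("reserved label type")
        offset += 1
        labels.append(msg[offset:offset + length].decode("ascii"))
        offset += length
    return ".".join(labels)


# Hypothetical DNS fragment: 12-octet header, then two names.
header = bytes(12)
first_name = b"\x07example\x03com\x00"       # "example.com" at offset 12
second_name = b"\x03www\xc0\x0c"             # "www" + pointer to offset 12
dns_msg = header + first_name + second_name
second_offset = len(header) + len(first_name)

dns_raw_hit = SIGNATURE in dns_msg
dns_decoded_hit = SIGNATURE in decode_dns_name(dns_msg, second_offset).encode("ascii")

# Hypothetical HTTP body sent with gzip content encoding.
body = b"<a href='http://www.example.com/'>link</a>" * 4
compressed = gzip.compress(body)

gzip_raw_hit = SIGNATURE in compressed
gzip_decoded_hit = SIGNATURE in gzip.decompress(compressed)
```

In both cases the search over the bytes as transmitted fails, while the search over the decoded name or the decompressed body succeeds. The DNS case fails even without the pointer, because labels are length-prefixed on the wire and the dotted form of the name never appears in the message.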
To formally specify the formats and data structures used in various protocols, a number of languages have been developed, including ASN.1 (Abstract Syntax Notation One) and IDL (Interface Definition Language). These languages are typically used to precisely describe the syntax of various protocol data units (PDUs) in a way that is independent of the software language used to process the PDUs for transmission or reception. Such “definition languages” are suitable for defining protocol interfaces, but they are not “computer languages” such as C, C++, or Java and they do not contain the constructs necessary for writing an actual computer program.
Accordingly, there is a need for an improved method and system for dynamic protocol decoding and analysis which allow the detection and prevention of malicious traffic and which overcome the drawbacks of the prior art.