1. Technical Field
The present invention relates to data analysis and, more particularly, to the detection of anomalous data transmissions.
2. Description of the Related Art
Network computer systems consist of processing sites (e.g., host computers) that exchange data with each other. Computers use various protocols to exchange data. For example, TCP/IP is one network protocol that provides the transport of data between computers that are connected by a network. Each host computer is assigned a unique internet protocol (IP) address, and data is exchanged between a source IP address and a destination IP address, from a source port on the source host to a destination port on the destination host. A port number corresponds to a particular service or application that “listens” for data sent to it on that port from some remote source host. Some ports are standardized and assigned a typical well-known service. For example, web-based servers are typically assigned port 80 for transmission of web requests delivered via TCP/IP packets with control information according to the hypertext transfer protocol (HTTP) commands the web server expects. TCP/IP transfers such data in the form of “network packets” that consist of the identification of IP addresses, port numbers, control information, and payload. The payload is the actual data expected by the service or application. In the case of web traffic, payload can consist, for example, of GET requests for web pages represented by URLs.
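By way of illustration only, the packet fields described above can be modeled as a small record. The following is a minimal sketch, not a description of any particular implementation; the names `Packet` and `requested_path`, and the example addresses, are assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified view of the fields carried by a network packet."""
    src_ip: str      # IP address of the source host
    src_port: int    # port on the source host
    dst_ip: str      # IP address of the destination host
    dst_port: int    # port the destination service listens on
    payload: bytes   # the data expected by the service or application


def requested_path(packet):
    """Return the path named in an HTTP GET request line, or None.

    Only packets addressed to port 80 are treated as web traffic here;
    that is a simplifying assumption for this sketch.
    """
    if packet.dst_port != 80:
        return None
    request_line = packet.payload.split(b"\r\n", 1)[0]
    parts = request_line.split(b" ")
    if len(parts) == 3 and parts[0] == b"GET":
        return parts[1].decode("ascii", errors="replace")
    return None


# Example web request from a client to a web server (documentation addresses).
example = Packet(
    src_ip="192.0.2.10",
    src_port=49152,
    dst_ip="198.51.100.5",
    dst_port=80,
    payload=b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n",
)
```

In this sketch, the payload of `example` is the GET request itself, and `requested_path(example)` extracts the path portion of the requested URL.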
As networks, such as the Internet, become more accessible to users, the amount of data transmitted significantly increases. This presents an opportunity for individuals to cause harm to the computers of unsuspecting users. Worms and viruses, in particular, are well-known causes of security breaches in computer systems. These constitute malicious data sent to a service or application that exploits a vulnerability (such as a buffer overflow providing root access to the worm's executable program) and causes the service or application to be disabled, crash, or provide unauthorized privileges to an attacker. Some common examples include the recent Code Red, Nimda, and Sobig worms and viruses. Conventional systems designed to detect these malicious and intrusive events and defend against them depend upon “signatures” or “thumbprints” that are developed by humans or by semi-automated means from previously known worms or viruses. Currently, systems are protected only after a worm has been detected, and a signature has been developed and distributed to signature-based detectors, such as a virus scanner or a firewall rule.
In order to reduce the potential threat of attacks, a firewall is established to protect computers within a network. Firewalls are computer systems that typically stand at the gateway of a computer network or that reside on a network in front of a critical host or server computer, and which inspect the traffic to and from the network or server, and determine which traffic may proceed, and which traffic will be filtered. Firewalls can also be implemented in the form of software on individual computers. As an example, propagating worms are typically filtered by firewalls that have been preloaded with a “signature rule” that detects the appearance of a specific worm. When the payload of a packet “matches” a known signature string associated with a worm, the firewall blocks the TCP/IP packets that deliver the worm, preventing the server from being attacked by that worm.
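The signature matching described above can be sketched as a substring search of each payload against a preloaded set of signature strings. This is a minimal sketch under stated assumptions: the signature names and byte strings below are invented placeholders, not signatures of any real worm, and `match_signature` and `filter_payload` are hypothetical helper names.

```python
# Hypothetical signature rules: name -> byte string expected in a worm's payload.
# These values are placeholders invented for illustration only.
SIGNATURES = {
    "example-worm-a": b"EXAMPLE-WORM-A-MARKER",
    "example-worm-b": b"\x90\x90\x90\x90EXAMPLE-B",
}


def match_signature(payload, signatures):
    """Return the name of the first signature found in the payload, or None."""
    for name, pattern in signatures.items():
        if pattern in payload:
            return name
    return None


def filter_payload(payload, signatures):
    """Decide whether a packet payload may proceed ("forward") or is filtered ("block")."""
    if match_signature(payload, signatures) is not None:
        return "block"
    return "forward"
```

In this sketch every payload is compared against every signature, so the work per packet grows with the number of signatures loaded, and a payload from a worm that has no signature yet is forwarded unchanged.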
This approach suffers from two fundamental problems. First, the signature strings associated with worms can only be constructed after the worm has been detected. This means the worm was actually not detected on its first appearance, and logically attacked at least one server, causing damage to the server. Protection is not possible until a third party has constructed a signature string and deployed it broadly to all network sites and firewalls. Precious time can be lost during this process, which can typically require many days. During this time, the worm would have successfully spread widely throughout the Internet, damaging many thousands if not millions of hosts. This is because worms in particular propagate rapidly on the Internet and infect and destroy systems at very high speeds. Second, a great many worms have appeared on the Internet, and each of these has had a distinct signature string constructed for its detection, each of which is loaded into all of the firewalls. This implies that over time firewalls must grow in complexity in order to store, process, and match many signature strings against each packet payload delivered to the gateway or server.
Various attempts have been made to detect worms by analyzing the rate of scanning and probing from external sources, which would indicate that a worm propagation is underway. Unfortunately, this approach detects only the early onset of a propagation; by definition, the worm has already successfully penetrated a system, infected it, and started its damage and propagation.
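One way such rate-based detection might be sketched is to count, for each source, the number of distinct destination hosts contacted within a sliding time window and to flag sources that exceed a threshold. The sketch below is an assumption made for illustration; the function name `find_scanners`, its parameters, and the choice of distinct destinations as the measure of scanning are not taken from any particular prior system.

```python
from collections import defaultdict, deque


def find_scanners(events, window_seconds, max_distinct_targets):
    """Flag sources that contact too many distinct hosts within a time window.

    events: iterable of (timestamp, src_ip, dst_ip) tuples, sorted by timestamp.
    Returns the set of source IP addresses whose count of distinct
    destinations inside any window exceeded max_distinct_targets.
    """
    recent = defaultdict(deque)  # src_ip -> deque of (timestamp, dst_ip)
    flagged = set()
    for timestamp, src_ip, dst_ip in events:
        window = recent[src_ip]
        window.append((timestamp, dst_ip))
        # Discard contacts that have fallen out of the sliding window.
        while window and timestamp - window[0][0] > window_seconds:
            window.popleft()
        distinct_targets = {dst for _, dst in window}
        if len(distinct_targets) > max_distinct_targets:
            flagged.add(src_ip)
    return flagged
```

A source is flagged only after it has already probed more hosts than the threshold allows, which reflects the limitation noted above: the scanning being measured is itself evidence that the propagation has begun.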
Based on the foregoing, it would be beneficial to provide a system capable of detecting potentially harmful data being transmitted through a network. It would also be beneficial to provide a system capable of determining whether potentially harmful data is a malicious program. It would be further beneficial to provide signatures to filter malicious programs such as worms and viruses upon an initial appearance of such programs.