Packet-based communications have gained wide acceptance in modern society. With such acceptance has come increasing use in commercial transactions as well as an increase in reliance on such communications. High levels of service availability and high degrees of reliability are absolutely necessary. Any compromise of service causes not only severe inconvenience but, in many cases, severe financial loss to one or more affected parties. Modern communications make distribution of information extremely easy. This is desirable in many circumstances. However, there are situations where unauthorized dissemination, for example of unauthorized copies of copyrighted materials, is undesirable and even criminal. Consequently, the need to protect the network from certain undesirable or harmful traffic flows and the need to identify sources of illegal content create attendant needs for techniques to identify or trace back to sources of packet transmissions.
For example, Distributed Denial-of-Service (DDoS) attacks are growing threats to today's Internet. With the availability of automatic attacking tools such as Tribe Flood Network (TFN), TFN2K, Trinoo and Stacheldraht, any person with substantial knowledge about networking can easily carry out a DDoS attack. Some statistics show that DoS (denial of service) and DDoS attacks are so prevalent nowadays that they present a great threat to e-business. Targets of attacks have included even the most recognizable corporations, the White House and the CERT (Computer Emergency Response Team) itself. A single DDoS attack in 2000 is believed to have cost hundreds of millions of dollars. In other words, using DDoS attack tools, any person can, within seconds and at no cost, cause millions of dollars of loss.
Because of the damage that such attacks inflict on the Internet and on the business of online companies that profit directly or indirectly from their devoted subscribers or users (Amazon.com, Buy.com, eBay, etc.), there is an immediate need for a real-time mechanism for tracking down the sources of these attacks, to stop attacks as early as possible and to deter future attacks. Determining the source of the attack is, however, not an easy task since attackers use incorrect or spoofed IP addresses. IP address spoofing disguises the true sources of an attack and, more specifically, can make the attacks appear as if they are being carried out by innocent networks and end-systems.
As another example, the illegal exchange of copyrighted material remains an enormous problem on the Internet. Such copyright infringements are facilitated by pervasive peer-to-peer networks, which are basically distributed databases. Many such databases contain vast amounts of copyrighted material illegally obtained, possessed and re-distributed. Notwithstanding the past punitive action taken against Napster, the first successful “third party” file distribution system, transmission of copyrighted material still constitutes a significant fraction of the Internet's total traffic burden and accounts for significant lost royalty revenues for copyright holders.
Currently, illegal exchanges of copyrighted material are “openly” negotiated and transferred typically using FTP (file transfer protocol) or HTTP (HyperText Transfer Protocol) over TCP, i.e., the sender's identity is known during the transfer of the file. Once illegal possession and distribution of copyrighted material begins to be actively investigated, however, we expect that transfer of copyrighted material will become more anonymous to prevent traceback to (identification of) the unlawful transmitters of this material into the network (senders). More specifically, we expect that parties involved in such activities, who are implicitly aware of each other's true identities, will typically negotiate an illegal exchange of copyrighted material in the future. For example, the negotiation could be a simple request using the peer-to-peer protocol that is also used to exchange the file itself or through separate email, or even telephone, exchanges. The sending party will, however, then use spoofed source IP addresses in the packets making up the UDP (user datagram protocol) transfer of copyrighted material to the receiving party. That is, the transfer itself is made anonymous in this fashion to protect the unlawful sender's identity in the network layer.
As with the handling of denial-of-service attacks, illegal content distribution presents (or will soon present) a problem of identifying the source of illegally distributed copyrighted material. To address this problem, we must first assume that a mechanism is in place in the network that can detect flows of illegally distributed copyrighted material through the network. Once such a flow is discovered, the network can attempt to trace back to the source of the flow (assuming this problem is made nontrivial by the use of spoofed source IP addresses by the unlawful sender). In a similar fashion, traceback of distributed denial-of-service (DDoS) attacks assumes that an intrusion detection system (IDS) at the victim end-system has identified that an attack is occurring and which packets are participating in it. Alternatively, we can consider a situation where a hoard of illegally possessed copyrighted material is discovered and, based on (feasibly sized) logs of recent internetworking activity of the apprehended user, the sources of the copyrighted material are identified.
Internet users and service providers also must contend with a variety of other insidious activities, for which mitigation strategies would benefit from using an effective traceback scheme. For example, an increasing amount of unwanted e-mails, often referred to as “spam” e-mails, is bombarding most e-mail users. In some cases, the volume of spam e-mails is becoming so large that the service providers' equipment for message storage is becoming overloaded. Often, the high-level source address in the message (Domain Name or the like) is a fake, and the spammers may use IP address spoofing to hide their physical network location. Blocking of spam e-mails would benefit from knowledge of the true source, but any traceback must be effective to circumvent or reduce the effects of IP address spoofing.
To understand the problems and inadequacies of prior approaches it may be helpful to focus on tracing sources of attacks. To mitigate or terminate a DDoS attack, a victim end-system must address the following component problems: determining which incoming packets are part of the attack (intrusion detection), tracing back to find the origins of the attack (traceback) and, finally, taking action to mitigate or stop the attack (at the identified source) by configuring firewalls or taking some other kind of punitive measures. Determining the source of an attack is not, however, a simple task since attackers typically use incorrect or spoofed IP addresses. IP address spoofing can create the appearance that the attacks are being carried out by innocent end-systems. For these reasons, several solutions have recently been proposed to automatically trace back to the sources of DDoS attacks and mitigate them. Each proposed approach has, however, certain drawbacks. General criteria for evaluation of traceback techniques include: false positive rates (including those maliciously caused), missed detection rates, computation and communication overhead, deployment complexity, and DoS effects of the firewalls configured as a result of traceback.
One suggested approach to traceback is Link Testing: Input Debugging. In this method, the victim reports an attack to its upstream router, which in response installs a debugging filter that reveals which upstream router originated the attacking traffic. The method is repeated recursively until the ISPs' border routers are reached. However, this method requires a tremendous amount of management overhead and relies on the availability and willingness of the network operators. While such tracing may be done manually, many ISPs have built tools to automate tracing of attacks across their own networks.
Another suggested approach is Link Testing: Controlled Flooding. In controlled flooding, the victim forces selected hosts to flood, one by one, each incoming link of the router closest to the victim. The victim monitors the change in the attack packet rate and determines from which link the attack is arriving. The method is repeated recursively until the ISPs' border routers are reached. Obviously, this method is not attractive since it can be a form of DoS itself. In addition, the victim needs to have a good, recent and detailed map of the network topology and be able to generate large packet floods on arbitrary network links without causing a form of DoS attack on the links.
By contrast, under the Internet Control Message Protocol (ICMP) Traceback Messages scheme, routers, with low probability, generate a Traceback message (carried in an ICMP packet) that is sent along to the victim. With a sufficient number of Traceback messages from enough routers on the path, the victim is able to reconstruct the attack path. The Traceback messages can help to identify the message generator, the link that the traced packet arrived from, or the link it was forwarded on. The Traceback messages provide information on link IDs by including one or more of the following: the identifier of the router interface on which the packet was received or forwarded, the IP addresses of the two routers that form the link, the MAC addresses of the two routers that form the link, and an operator-defined link identifier.
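As a rough illustration of the probabilistic generation described above, the following Python sketch simulates routers on a single attack path, each of which emits a Traceback message with low probability for every packet it forwards. The router names, the emission probability, the seed, and the function names `simulate_traceback` and `path_reconstructed` are assumptions made for illustration only; they are not taken from the ICMP Traceback proposal.

```python
import random


def simulate_traceback(path, num_packets, prob, seed=0):
    """Return the set of routers from which the victim received
    at least one Traceback message.

    path        -- hypothetical list of router names on the attack path
    num_packets -- number of attack packets forwarded along the path
    prob        -- per-packet, per-router probability of emitting a message
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    heard_from = set()
    for _ in range(num_packets):
        for router in path:
            # Each router decides independently whether to emit a message.
            if rng.random() < prob:
                heard_from.add(router)
    return heard_from


def path_reconstructed(path, heard_from):
    """The victim can rebuild the path only if every router was heard from."""
    return all(router in heard_from for router in path)


if __name__ == "__main__":
    path = ["R1", "R2", "R3", "R4"]  # hypothetical routers, victim side last
    for n in (1_000, 100_000, 1_000_000):
        heard = simulate_traceback(path, n, prob=1 / 20_000)
        print(n, sorted(heard), path_reconstructed(path, heard))
```

Under these assumptions, the victim needs a large number of attack packets before every router on the path has been heard from, and each emitted message is additional traffic on the network.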
The ICMP Traceback Messages scheme appears promising and can be effectively used to trace back to the source of an attack or to construct a map of the Internet. Unfortunately, this technique has two drawbacks. One drawback is that this technique generates extra traffic, leaving less bandwidth available for data traffic. The other drawback is that an authentication problem arises from the fact that attackers may generate fake Traceback messages to hide the source of the real attack traffic, thereby creating another form of attack.
Another proposed traceback technique involves route-based distributed packet filtering (RBDPF). One basic assumption for RBDPF is that ingress filtering is successful in preventing, not just detecting, DDoS attacks. Ingress filtering is not implementable at every gateway, but is possible for coverage of backbone routers. This technique also relies on an assumption that Internet autonomous system (AS) topologies exhibit power-law connectivity. Thus coverage of a small percentage of backbone routers will cover most of Internet traffic.
RBDPF implements ingress filtering on certain critical border routers of the Internet. It functions by identifying packets whose routes deviate from the routes they would commonly follow. On that basis, the filtering routers can begin to determine which transmission control protocol (TCP) requests are legitimate and which are fake. The strength of RBDPF lies in its ability to trace back to systems spoofing source IP addresses.
The drawback of this scheme is that some fake TCP requests will still flow through, i.e., it is not 100% effective. Moreover, since it is based on ingress filtering, RBDPF suffers from the same weaknesses. An attacker can still spoof source IP addresses from within the range allowed by the provider of Internet connectivity. Moreover, ingress filtering implementation is opposed by some ISPs, especially the larger high-speed providers. Packet filtering increases CPU utilization and measurably lowers throughput leading to potential performance degradation.
The Probabilistic Packet Marking (PPM) scheme requires that a router, with a specified probability, inscribe its local path information into the packet header. The victim reconstructs the attack path starting from the packets marked by the closest routers and moving up to the ISPs' border routers. Two prominent varieties have been proposed: the Fragment Marking Scheme (FMS) and the Advanced Marking Scheme (AMS).
Under FMS, each router's IP address is bit-interleaved with its hash value. The resulting 64-bit quantity is split into eight fragments. Each router probabilistically marks an IP packet it forwards with one of the eight fragments. In case of a DDoS attack, the scheme suffers from two drawbacks. One drawback is the high computation overhead, because of the large number of combinations that need to be checked to reconstruct the routers' IP addresses. Another drawback of this technique is that it produces a large number of false positives, because only the hash value of each router's IP address is incorporated for verification and because false positives at a closer distance will result in more false positives further away from the victim (where the border routers are).
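The encoding step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the FMS specification: the 32-bit hash (here the first four bytes of SHA-256), the choice of which bit positions carry address bits versus hash bits, the fragment ordering, and all function names are hypothetical.

```python
import hashlib
import ipaddress


def hash32(addr):
    """Hypothetical 32-bit hash of a router address (truncated SHA-256)."""
    digest = hashlib.sha256(addr.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:4], "big")


def interleave(addr, h):
    """Bit-interleave a 32-bit address with its 32-bit hash into 64 bits.
    Assumed layout: address bits at odd positions, hash bits at even ones."""
    word = 0
    for i in range(32):
        word |= ((addr >> i) & 1) << (2 * i + 1)
        word |= ((h >> i) & 1) << (2 * i)
    return word


def split_fragments(word64):
    """Split the 64-bit quantity into eight 8-bit fragments (offset 0..7)."""
    return [(word64 >> (8 * k)) & 0xFF for k in range(8)]


def reassemble(fragments):
    """Victim side: rebuild the 64-bit quantity from fragments in offset order."""
    word = 0
    for k, frag in enumerate(fragments):
        word |= frag << (8 * k)
    return word


def deinterleave(word64):
    """Recover (address, hash) so the hash can be checked against the address."""
    addr = h = 0
    for i in range(32):
        addr |= ((word64 >> (2 * i + 1)) & 1) << i
        h |= ((word64 >> (2 * i)) & 1) << i
    return addr, h


if __name__ == "__main__":
    router = int(ipaddress.IPv4Address("192.0.2.1"))  # documentation address
    frags = split_fragments(interleave(router, hash32(router)))
    addr, h = deinterleave(reassemble(frags))
    print(ipaddress.IPv4Address(addr), h == hash32(addr))
```

The sketch also suggests where the computation overhead comes from. If, hypothetically, m routers at the same distance each contribute one fragment at each of the eight offsets, a victim that cannot tell which fragments belong together has up to m^8 combinations to reassemble and verify against the hash.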
Under AMS, each router's IP address is hashed into an 11-bit or 8-bit value (according to whether AMS version I or II is used) and probabilistically inscribed in forwarded IP packets. This scheme reconstructs the attack paths of hundreds of attackers with few false positives and in a small amount of time. However, the major drawback of AMS is the required knowledge of a topological map of the Internet to be able to reconstruct a 32-bit router IP address from the 11-bit or 8-bit hash values.
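The need for a topological map follows from the size of the hash: an 11-bit value can take only 2048 distinct values, so it cannot identify one of the 2^32 possible IPv4 addresses by itself. The following Python sketch illustrates this under stated assumptions. The hash function (truncated SHA-256), the list of known routers, and the names `hash11` and `candidates` are hypothetical and are not the hash functions used by AMS.

```python
import hashlib
import ipaddress


def hash11(addr):
    """Hypothetical 11-bit hash of a 32-bit router address."""
    digest = hashlib.sha256(addr.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:2], "big") & 0x7FF  # keep the low 11 bits


def candidates(mark, known_routers):
    """Victim side: the mark alone cannot be inverted, so the victim
    searches its map of known routers for addresses that hash to it."""
    return [r for r in known_routers if hash11(r) == mark]


if __name__ == "__main__":
    # Hypothetical upstream routers taken from the victim's topology map.
    router_map = [int(ipaddress.IPv4Address(a))
                  for a in ("192.0.2.1", "198.51.100.7", "203.0.113.9")]
    mark = hash11(router_map[1])  # mark observed in a received packet
    for r in candidates(mark, router_map):
        print(ipaddress.IPv4Address(r))
```

Without the map there is nothing to search, which is the dependency noted above; with a map, the smaller the set of plausible upstream routers, the less likely it is that two of them share the same mark.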
A major problem with all the probabilistic marking schemes (including FMS and AMS) is that they do not prevent DDoS attacks in real time, i.e., while an attack is on-going. Also, such schemes require up to thousands of packets per attacker to be able to reconstruct the attack path.
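One way to see why many packets are needed is to compute how often a mark from a distant router survives to the victim. The Python sketch below assumes a simple overwrite model that is not spelled out in the text above: every router on the path marks independently with the same probability p, and a later mark replaces an earlier one. The function names are hypothetical.

```python
def mark_survival_probability(p, d):
    """Probability that a packet reaches the victim still carrying the mark
    of the router d hops away (d = 1 is the router closest to the victim).

    Assumes the router marks with probability p and that none of the
    d - 1 routers nearer the victim overwrites the mark.
    """
    return p * (1 - p) ** (d - 1)


def expected_packets_for_mark(p, d):
    """Expected number of packets before one arrives bearing that mark."""
    return 1 / mark_survival_probability(p, d)


if __name__ == "__main__":
    for d in (1, 5, 10, 20):
        print(d, mark_survival_probability(0.05, d),
              expected_packets_for_mark(0.05, d))
```

Under this model the survival probability shrinks geometrically with distance, so marks from the routers nearest the attacker are the rarest. When each router's information is further split across several fragments, as in FMS, the victim must collect every fragment from every router, which pushes the packet count higher still.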
Since routers mark in a probabilistic way, the victim will receive many unmarked packets. As the victim is not able to differentiate between these packets and genuine marked packets, an attacker can easily take advantage of that flaw by inserting “fake” links and “fake” distances into the identification fields. Marking schemes that reuse the IP identification field also negatively impact users that require fragmented IP datagrams. When a datagram is fragmented, its identification field is copied to each fragment so the receiver can reassemble the fragments into the original datagram. However, a marking router may overwrite this identification field value and hence cause the fragments not to be reassembled. Moreover, a router may mark fragments from different datagrams with the same identification field value, causing incorrect re-assembly.
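Both fragmentation failures described above can be reproduced with a toy reassembler. The following Python sketch is a deliberate simplification and not an implementation of IP reassembly: fragments are modeled as tuples of (identification, byte offset, more-fragments flag, data), source and destination addresses are ignored, and the function name `reassemble_datagrams` is hypothetical.

```python
from collections import defaultdict


def reassemble_datagrams(fragments):
    """Group fragments by identification value and return
    {identification: payload} for every group that forms a complete datagram.

    A group is complete when its fragments are contiguous from offset 0
    and the last one has the more-fragments flag cleared.
    """
    groups = defaultdict(list)
    for ident, offset, more, data in fragments:
        groups[ident].append((offset, more, data))

    complete = {}
    for ident, parts in groups.items():
        parts.sort(key=lambda part: part[0])  # order by offset
        expected = 0
        contiguous = True
        for offset, _more, data in parts:
            if offset != expected:
                contiguous = False
                break
            expected += len(data)
        if contiguous and not parts[-1][1]:
            complete[ident] = b"".join(data for _, _, data in parts)
    return complete


if __name__ == "__main__":
    # Unmarked: both fragments keep identification 7 and reassemble.
    print(reassemble_datagrams([(7, 0, True, b"abcd"), (7, 4, False, b"efgh")]))
    # A marking router overwrote the second fragment's identification.
    print(reassemble_datagrams([(7, 0, True, b"abcd"), (99, 4, False, b"efgh")]))
    # Fragments of two different datagrams were given the same mark, 500.
    print(reassemble_datagrams([(500, 0, True, b"AAAA"), (1, 4, False, b"aaaa"),
                                (2, 0, True, b"BBBB"), (500, 4, False, b"bbbb")]))
```

In the second case no datagram can be completed, and in the third the receiver silently builds a datagram out of pieces of two unrelated ones.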
In view of the noted drawbacks, many DDoS attacks remain possible today because no serious preventative measures have been deployed to mitigate them (for instance, prevention against SYN Flood Attack). For this reason, a tool is needed to trace back the origin of an attack and stop it at its source.
As noted, related needs for tracing packet transmissions to their sources arise in other contexts, for example to combat spamming and to trace sources of illegal copies of protected materials. Hence, the traceback tool developed should be readily adaptable to identifying sources of packet flows relating to a variety of different kinds of problems.