§1.1 Field of the Invention
The present invention concerns detecting a relay node in a network. In particular, the present invention concerns determining whether or not a node in a network is a relay node.
§1.2 Background Information
In a typical network, there can be various situations where a node receives data from another node and forwards it to one or more other nodes. Such nodes are often called “relay nodes”. Relay nodes can be employed for many different purposes, which can be either legitimate or malicious. For instance, an enterprise may be running a legitimate peer-to-peer video streaming application for the benefit of its employees. On the other hand, an employee could be violating company policy by running a peer-to-peer application to watch live television on the desktop. Similarly, a system administrator may connect from home to a server and then log in from that server, using SSH, to one of the internal machines in order to check its status. This would be an example of a legitimate stepping stone connection. Stepping stones, however, are commonly used by hackers to make attack traceback difficult.
Regardless of the original intention, relay nodes are a potential threat to networks since they are used in various malicious situations such as stepping stone attacks, botnet communication, and illegal peer-to-peer file sharing or streaming. Hence, quick and accurate detection of relay nodes in a network can significantly improve security policy enforcement. Significant work has been done, and novel solutions proposed, for the problem of identifying relay flows active within a node in the network.
Relay nodes can be divided into two main categories: Store & Forward and Delay-Constrained. Each is introduced below.
Store & Forward relay nodes often store data before forwarding it. Peer-to-peer file sharing and email relaying are examples of store and forward relays. The time elapsed before the data is forwarded depends on application requirements. For instance, email relays forward received emails after a few minutes, whereas in peer-to-peer file sharing applications data is forwarded only when another user requests it.
Delay-Constrained relay nodes forward the received data within a maximum tolerable delay constraint. The delay requirement is inherent in the underlying application. Delay-constrained relaying can be done by applications which are either interactive or machine-driven. For instance, stepping stones and IM message routing nodes are examples of delay-constrained relays with interactive sessions. On the other hand, peer-to-peer live broadcast and Skype super-nodes are examples of machine-driven delay-constrained relays.
Detection of store and forward relays is generally done by identifying protocol features. Usually a target protocol is selected and its distinctive characteristics are identified. Subsequently, a node that exhibits such characteristics is declared to be a relay node. Protocol features used by researchers include connections to known ports, specific signatures in the payload, and concurrent use of both UDP and TCP. This work will mainly focus on delay-constrained relays. Further details on store and forward relay node detection schemes are described in the articles A. Gerber, J. Houle, H. Nguyen, M. Roughan, and S. Sen, “P2p the gorilla in the cable,” National Cable and Telecommunications Association (NCTA) 2003 National Show, Chicago, Ill., June 2003; T. Karagiannis, A. Broido, M. Faloutsos, and K. Claffy, “Transport layer identification of p2p traffic,” Proc. 4th ACM SIGCOMM Conf. on Internet Measurement, Taormina, Sicily, Italy, October 2004; R. Meent and A. Pras, “Assessing unknown network traffic,” CTIT Technical Report 04-11, University of Twente, Netherlands, February 2004; S. Ohzahata, Y. Hagiwara, M. Terada, and K. Kawashima, “A traffic identification method and evaluations for a pure p2p application,” Lecture Notes in Computer Science, volume 3431, 2005; and S. Sen, O. Spatscheck, and D. Wang, “Accurate, scalable in-network identification of p2p traffic using application signatures,” Proc. 13th Int. Conf. on World Wide Web, NY, 2004.
Prior work that focuses on the delay-constrained relay node detection problem has been limited. There has, however, been some work on the related delay-constrained relay flow detection problem, such as the articles D. S. A. Blum and S. Venkataraman, “Detection of interactive stepping stones: Algorithms and confidence bounds,” Conference of Recent Advance in Intrusion Detection (RAID), Sophia Antipolis, French Riviera, France, September 2004; D. Donoho, A. G. Flesia, U. Shankar, V. Paxson, J. Coit, and S. Staniford, “Multiscale stepping-stone detection: Detecting pairs of jittered interactive streams by exploiting maximum tolerable delay,” Fifth International Symposium on Recent Advances in Intrusion Detection, Lecture Notes in Computer Science 2516, New York, Springer, 2002; S. Staniford-Chen and L. Heberlein, “Holding intruders accountable on the internet,” Proc. IEEE Symposium on Security and Privacy, Oakland, Calif., pages 39-49, May 1995; K. Yoda and H. Etoh, “Finding a connection chain for tracing intruders,” F. Cuppens, Y. Deswarte, D. Gollmann, and M. Waidner, editors, 6th European Symposium on Research in Computer Security, ESORICS 2000, LNCS 1985, Toulouse, France, October 2000; and Y. Zhang and V. Paxson, “Detecting stepping stones,” Proceedings of the 9th USENIX Security Symposium, pages 171-184, August 2000. Detecting a relay flow implies identifying the relay node that carries it, but identifying relay nodes in this way has been difficult. The basic detection methodology in the proposed delay-constrained relay flow detection schemes is to search for pairs of network flows which exhibit strong mutual correlation. This correlation is determined based on various attributes of the flow, including packet content (payload), packet arrival times, packet lengths, etc. Regardless of how the correlation is determined, all these methods compare each incoming flow to each outgoing one.
Therefore they require quadratic time, which may be prohibitive for medium to large scale networks with tens of thousands of nodes and thousands of active connections in many nodes.
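The all-pairs comparison described above can be illustrated with a minimal sketch. The scoring function, the names `timing_score` and `find_relay_pairs`, the `max_delay` and `threshold` values, and the sample flows are all hypothetical and chosen only for illustration; they are not taken from any of the cited schemes. The point is simply that a node with m incoming and n outgoing flows requires m × n correlation tests.

```python
from itertools import product


def timing_score(in_times, out_times, max_delay=0.5):
    """Hypothetical correlation score: the fraction of incoming packets that
    are followed by some outgoing packet within max_delay seconds."""
    if not in_times:
        return 0.0
    matched = sum(
        1 for t in in_times
        if any(0.0 <= u - t <= max_delay for u in out_times)
    )
    return matched / len(in_times)


def find_relay_pairs(incoming, outgoing, threshold=0.8):
    """Compare every incoming flow with every outgoing flow.

    incoming / outgoing map a flow id to a list of packet arrival times.
    Returns the flow pairs whose score reaches the threshold, together with
    the number of correlation tests that were performed.
    """
    pairs = []
    comparisons = 0
    for (in_id, in_times), (out_id, out_times) in product(
        incoming.items(), outgoing.items()
    ):
        comparisons += 1
        if timing_score(in_times, out_times) >= threshold:
            pairs.append((in_id, out_id))
    return pairs, comparisons


# Illustrative data (assumed): out_x echoes in_a with a 0.1 s delay.
incoming = {"in_a": [0.0, 1.0, 2.0], "in_b": [0.3, 5.0, 9.0]}
outgoing = {"out_x": [0.1, 1.1, 2.1], "out_y": [20.0, 30.0, 40.0]}
pairs, comparisons = find_relay_pairs(incoming, outgoing)
print(pairs, comparisons)
```

With two incoming and two outgoing flows the sketch performs four tests; with thousands of active connections per node the same loop performs millions, which is the scalability concern raised above.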
§1.2.1 Previous Approaches and Perceived Limitations of Such Approaches
Research on delay-constrained relay detection has mostly focused on stepping stones because of their obvious potential for malicious use. (See, e.g., S. Staniford-Chen and L. Heberlein, “Holding intruders accountable on the internet,” Proc. IEEE Symposium on Security and Privacy, Oakland, Calif., pages 39-49, May 1995.) The Staniford-Chen article proposes a content-correlation-based scheme in which flow pairs are compared in terms of thumbprints of their content. Although such content-based schemes achieve good performance, they have limited applicability since flows are usually encrypted and their contents are inaccessible. This fact motivated researchers to focus on layer 3 information, which mostly consists of originating and destination IP addresses, originating and destination port numbers, layer 3 protocol type (TCP or UDP), packet arrival times and packet lengths. In the first work that incorporates layer 3 information, Zhang and Paxson (Y. Zhang and V. Paxson, “Detecting stepping stones,” Proceedings of the 9th USENIX Security Symposium, pages 171-184, August 2000) discuss detecting stepping stones by correlating flows in terms of their ON and OFF periods. The assumption is that correlated flows switch from the OFF state to the ON state at similar times. In the article K. Yoda and H. Etoh, “Finding a connection chain for tracing intruders,” F. Cuppens, Y. Deswarte, D. Gollmann, and M. Waidner, editors, 6th European Symposium on Research in Computer Security, ESORICS 2000, LNCS 1985, Toulouse, France, October 2000, Yoda and Etoh propose a similar timing-based algorithm in which correlation is defined over the sequence number vs. time curves of the flows. Another timing-based algorithm is proposed by He and Tong in the article T. He and L. Tong, “A signal processing perspective of stepping-stone detection,” Proc. of IEEE CISS '06, Princeton, N.J., 2006, where the authors formulate the flow correlation problem as a nonparametric hypothesis testing problem.
Beyond stepping stones, in the article K. Suh, D. Figueiredo, J. Kurose, and D. Towsley, “Characterizing and detecting skype-relayed traffic,” Proc. of Infocom, 2006, Suh et al. propose a similar timing-based technique for detecting Skype-relayed traffic.
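The ON/OFF timing idea mentioned above, in which correlated flows leave the OFF state at similar times, can be sketched roughly as follows. This is a simplified illustration and not the procedure published in any of the cited articles: the function names, the `idle` threshold that defines an OFF period, the `delta` tolerance for "similar times", and the sample flows are all assumptions made for this example.

```python
def on_period_starts(packet_times, idle=0.5):
    """Return the times at which a flow switches from OFF to ON.

    A flow is treated as OFF once no packet has been seen for more than
    `idle` seconds (an assumed threshold); the next packet starts an ON period.
    The first packet of the flow also counts as the start of an ON period.
    """
    starts = []
    previous = None
    for t in sorted(packet_times):
        if previous is None or t - previous > idle:
            starts.append(t)
        previous = t
    return starts


def on_off_correlation(times_a, times_b, idle=0.5, delta=0.08):
    """Fraction of ON-period starts in flow A that coincide, within `delta`
    seconds, with an ON-period start in flow B, normalised by the larger
    number of ON periods so that the score stays between 0 and 1."""
    starts_a = on_period_starts(times_a, idle)
    starts_b = on_period_starts(times_b, idle)
    if not starts_a or not starts_b:
        return 0.0
    coincident = sum(
        1 for s in starts_a
        if any(abs(s - r) <= delta for r in starts_b)
    )
    return coincident / max(len(starts_a), len(starts_b))


# Illustrative data (assumed): flow_out repeats flow_in 50 ms later.
flow_in = [0.00, 0.10, 0.20, 2.00, 2.10, 5.00]
flow_out = [0.05, 0.15, 0.25, 2.05, 2.15, 5.05]
unrelated = [1.00, 1.10, 3.50, 3.60, 7.00]

print(on_off_correlation(flow_in, flow_out))
print(on_off_correlation(flow_in, unrelated))
```

A score near 1 suggests the two flows become active together, which is the kind of evidence a timing-based detector looks for; the sketch uses only arrival times and never inspects payloads.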
Although timing-based methods perform quite well, they fall short when the time structure of the relaying flows is perturbed by an attacker. This perturbation may be performed by introducing artificial delays before relaying a received packet or by adding chaff packets to the stream. In the article D. Donoho, A. G. Flesia, U. Shankar, V. Paxson, J. Coit, and S. Staniford, “Multiscale stepping-stone detection: Detecting pairs of jittered interactive streams by exploiting maximum tolerable delay,” Fifth International Symposium on Recent Advances in Intrusion Detection, Lecture Notes in Computer Science 2516, New York, Springer, 2002, Donoho et al. show that if there is a maximum tolerable delay constraint, then applying wavelet decomposition and analyzing packet timings at different (lower) resolutions, instead of using raw packet timing information, makes the effect of adversarial changes in time structure insignificant. Similarly, under a maximum tolerable delay constraint, Blum et al. present confidence bounds on the stepping stone detection problem in the article D. S. A. Blum and S. Venkataraman, “Detection of interactive stepping stones: Algorithms and confidence bounds,” Conference of Recent Advance in Intrusion Detection (RAID), Sophia Antipolis, French Riviera, France, September 2004. As a completely different approach, in the article X. Wang and D. S. Reeves, “Robust correlation of encrypted attack traffic through stepping stones by manipulation of interpacket delays,” CCS '03: Proceedings of the 10th ACM conference on Computer and communications security, pages 20-29, 2003, Wang and Reeves propose a watermarking-based approach in which selected packet timings are slightly adjusted on all incoming flows. In order to identify a relaying flow, a watermark detection procedure is applied to all outgoing flows.
Although this technique is a form of timing based flow correlation algorithm, it is shown to be robust against timing perturbations introduced by adversaries.
As previously noted, flow-correlation-based techniques solve the problem in quadratic time, since they need to compare each incoming flow to each outgoing flow. Therefore it is not easy to employ these methods in large networks. One could adopt filtering techniques to alleviate this problem to some extent. For instance, in the article Y. Zhang and V. Paxson, “Detecting stepping stones,” Proceedings of the 9th USENIX Security Symposium, pages 171-184, August 2000, specific flow pairs are filtered out based on packet size, inconsistent source and destination ports, inconsistent packet direction, inconsistent packet timing, etc. However, discarding information usually poses a threat to detection performance, since real relaying flows could be filtered out or adversaries could manipulate flow characteristics so that their flows are filtered out. Therefore, a more scalable solution for the relay detection problem would be of potential value in many situations.