The present invention relates to a network and, more particularly, to a self-protected network.
Modern data center networks are built on advanced infrastructure and well-engineered protocols and are operated with extreme caution. However, network failures remain a serious challenge, if not an intensifying one, especially as an ever-growing number of applications and services move to the cloud. Amazon EC2, for instance, was largely down on Apr. 21, 2011, due to a routing misconfiguration that mistakenly rerouted high-volume external traffic into the low-capacity internal network [3]. As a consequence, thousands of businesses and websites were out of service and seventy-five million PlayStation gamers were affected [5]. Another recent example is the three-day BlackBerry blackout in October 2011, caused by a core switch failure.
To mitigate these problems, a surge of recent efforts is underway and has demonstrated encouraging results. One line of work designs next-generation network infrastructures that improve network bisection bandwidth and provide malleability in the network routing topology [5, 6, 8]. Another group of researchers designs transport protocols [1, 9] that are fine-tuned specifically for data center networks. A third group proposes improved resource placement schemes [2, 7] that optimize network resource allocation. At the same time, however, we believe that cascading, catastrophic network failures are unlikely to be prevented without properly considering two fundamental assumptions that do not hold in data centers.
We design NetFuse, a self-protection mechanism analogous to the fuse boxes in electrical circuits, which seeks to detect and respond to a variety of network problems and to protect essential network services. Specifically, NetFuse employs a multi-dimensional flow aggregation algorithm that automatically determines an optimal set of clustering criteria and, under these criteria, identifies the suspicious flows that are likely to cause network problems. NetFuse then adaptively regulates the identified flows according to network feedback. Owing to the lightweight sensing capabilities inherent in OpenFlow, NetFuse is currently implemented in OpenFlow networks as a proxy device between the switches and the controller. This way, NetFuse can not only intercept control messages to infer the network state, but also offload excess processing overhead from the controller, thereby improving the scalability of the entire system.
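The two steps described above (aggregate flows under several clustering criteria, then throttle the suspicious aggregates based on feedback) can be illustrated with a minimal sketch. This is not NetFuse's actual algorithm, which is not detailed here; it rests on several assumptions. The flow fields (`src`, `dst`, `dport`), the heavy-hitter threshold `phi` (a notion studied in [4, 12]), the exhaustive search over field subsets standing in for "optimal clustering criteria", and the AIMD rate-cap policy in the hypothetical `AimdRegulator` class are all illustrative choices.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical flow record: (src_ip, dst_ip, dst_port, bytes). The real
# dimensions NetFuse aggregates over are an assumption here.
FIELDS = ("src", "dst", "dport")
Flow = Tuple[str, str, int, int]


def aggregate(flows: List[Flow], dims: Tuple[int, ...]) -> Dict[tuple, int]:
    """Sum bytes per cluster, keying each flow only on the fields in `dims`."""
    totals: Dict[tuple, int] = defaultdict(int)
    for flow in flows:
        totals[tuple(flow[i] for i in dims)] += flow[3]
    return dict(totals)


def suspicious_aggregates(flows: List[Flow], phi: float = 0.5):
    """Try every non-empty subset of FIELDS as a clustering criterion.

    An aggregate is "heavy" if it carries at least `phi` of all bytes. The
    criterion whose heavy aggregates cover the most bytes wins; on ties, more
    dimensions are preferred (a narrower description to regulate).
    """
    total = sum(f[3] for f in flows)
    if total == 0:
        return (), {}
    best_score, best_dims, best_heavy = None, (), {}
    for r in range(1, len(FIELDS) + 1):
        for dims in combinations(range(len(FIELDS)), r):
            heavy = {k: v for k, v in aggregate(flows, dims).items()
                     if v >= phi * total}
            score = (sum(heavy.values()), r)  # coverage first, then specificity
            if heavy and (best_score is None or score > best_score):
                best_score, best_dims, best_heavy = score, dims, heavy
    return tuple(FIELDS[i] for i in best_dims), best_heavy


class AimdRegulator:
    """Hypothetical feedback-driven rate cap for one suspicious aggregate:
    scale the cap by `beta` on congestion feedback (never below `floor`),
    otherwise raise it by a fixed `step`."""

    def __init__(self, cap: float, step: float = 10.0, beta: float = 0.5,
                 floor: float = 1.0):
        self.cap, self.step, self.beta, self.floor = cap, step, beta, floor

    def update(self, congested: bool) -> float:
        if congested:
            self.cap = max(self.floor, self.cap * self.beta)
        else:
            self.cap += self.step
        return self.cap


if __name__ == "__main__":
    flows = [
        ("10.0.0.1", "10.0.1.9", 80, 500),
        ("10.0.0.2", "10.0.1.9", 80, 400),
        ("10.0.0.3", "10.0.2.5", 22, 50),
        ("10.0.0.4", "10.0.2.6", 443, 50),
    ]
    print(suspicious_aggregates(flows))  # (('dst', 'dport'), {('10.0.1.9', 80): 900})
    reg = AimdRegulator(cap=100.0)
    print([reg.update(c) for c in (True, True, False)])  # [50.0, 25.0, 35.0]
```

In this toy trace, no single source is dominant, but clustering by destination and port exposes one aggregate carrying 90% of the bytes. That narrow aggregate, rather than every flow, is what the regulator would throttle. Preferring the most specific criterion among equally covering ones is one way to limit collateral damage to benign flows. A deployed system would also likely bound the additive increase.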
[1] M. Alizadeh, A. Greenberg, D. A. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan, “Data Center TCP (DCTCP),” in Proc. ACM SIGCOMM, August 2010.
[2] H. Ballani, P. Costa, T. Karagiannis, and A. Rowstron, “Towards Predictable Datacenter Networks,” in Proc. ACM SIGCOMM, August 2011.
[3] “Why Amazon's cloud Titanic went down,” CNN Money, Apr. 2011. [Online]. Available: http://money.cnn.com/2011/04/22/technology/amazon_ec2_cloud_outage/index.htm.
[4] G. Cormode, S. Muthukrishnan, and D. Srivastava, “Finding hierarchical heavy hitters in data streams,” in Proc. VLDB, 2003, pp. 464-475.
[5] A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta, “VL2: a Scalable and Flexible Data Center Network,” in Proc. ACM SIGCOMM, August 2009.
[6] C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, C. Tian, Y. Zhang, and S. Lu, “BCube: a High Performance, Server-Centric Network Architecture for Modular Data Centers,” in Proc. ACM SIGCOMM, August 2009.
[7] C. Guo, G. Lu, H. J. Wang, S. Yang, C. Kong, P. Sun, W. Wu, and Y. Zhang, “SecondNet: A Data Center Network Virtualization Architecture with Bandwidth Guarantees,” in Proc. ACM CoNEXT, November 2010.
[8] R. N. Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, and A. Vahdat, “PortLand: a Scalable Fault-Tolerant Layer 2 Data Center Network Fabric,” in Proc. ACM SIGCOMM, August 2009.
[9] V. Vasudevan, A. Phanishayee, H. Shah, E. Krevat, D. G. Anderson, G. R. Ganger, G. A. Gibson, and B. Mueller, “Safe and Effective Fine-Grained TCP Retransmissions for Datacenter Communication,” in Proc. ACM SIGCOMM, August 2009.
[10] C. Wilson, H. Ballani, T. Karagiannis, and A. Rowstron, “Better Never than Late: Meeting Deadlines in Datacenter Networks,” in Proc. ACM SIGCOMM, August 2011.
[11] A. Wundsam, D. Levin, S. Seetharaman, and A. Feldmann, “OFRewind: Enabling Record and Replay Troubleshooting for Networks,” in Proc. USENIX Annual Technical Conference, June 2011.
[12] Y. Zhang, S. Singh, S. Sen, N. Duffield, and C. Lund, “Online identification of hierarchical heavy hitters: algorithms, evaluation, and applications,” in Proc. ACM SIGCOMM Internet Measurement Conference (IMC), 2004.