§1.1 Field of the Invention
The present invention concerns middlebox traversal in a network such as a data center network. More specifically, the present invention concerns dynamic provisioning of middleboxes.
§1.2 Background Information
Data Center Networks (DCNs) are used to host an increasing variety of applications and services, and are growing to tens of thousands of machines. Middleboxes are used to provide services such as traffic monitoring, traffic engineering, traffic policing, network and system security enforcement, etc., in DCNs. Together with the booming market of cloud computing, there is a need for high-performance, highly scalable and dynamic middlebox provisioning. While recent advances in DCN architecture address many issues such as scalability, latency, etc., a truly dynamic yet network-forwarding-independent middlebox traversal platform does not yet exist.
Middlebox traversal is an important part of the DCN infrastructure. Traditionally, middleboxes are deployed “in-path” at network borders, such as at a gateway to the Internet or at the edge of a subnet, so that the middleboxes are always traversed. The increasing variety in DCN designs and host applications, however, makes correct, scalable, flexible and resource-efficient middlebox traversal a challenge.
Data centers have been growing constantly, reaching hundreds of thousands of servers in a single facility. (See, e.g., L. A. Barroso and U. Hölzle, “The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines,” http://research.google.com/pubs/pub35290.html, (2009) (Accessed January 2010); J. Dean, “Designs, Lessons and Advice from Building Large Distributed Systems,” http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf, (2009) (Accessed January 2010); and T. Jaeger and J. Schiffman, “Outlook: Cloudy with a Chance of Security Challenges and Improvements,” Security Privacy, IEEE, 8(1):77-80, (January-February 2010), each incorporated herein by reference.) It may be a challenge to scale up the middlebox system to keep up with the growth. Middleboxes at perimeters, or any small number of clusters, may experience a bottleneck as traffic converges at them. This is especially true with the emergence of cloud computing and cloud-based virtual desktop services.
A variety of applications from different clients introduces different demands. Instances of virtual machines (VMs) from different clients are hosted on a physically connected network, and, in some cases, the same physical machines. The notion of internal network and perimeter defense may no longer apply. (See, e.g., T. Jaeger and J. Schiffman, “Outlook: Cloudy with a Chance of Security Challenges and Improvements,” Security Privacy, IEEE, 8(1):77-80, (January-February 2010), incorporated herein by reference.) Also, VMs are often migrated and care is needed when migrating their traffic and their security settings. (See, e.g., F. Hao, T. V. Lakshman, S. Mukherjee, and H. Song, “Secure Cloud Computing with a Virtualized Network Infrastructure,” Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud'10, pages 16-16, Berkeley, Calif., USA, USENIX Association, (2010); T. Jaeger and J. Schiffman, “Outlook: Cloudy with a Chance of Security Challenges and Improvements,” Security Privacy, IEEE, 8(1):77-80, (January-February 2010); and V. Soundararajan and J. M. Anderson, “The Impact of Management Operations on the Virtualized Datacenter,” Proceedings of the 37th Annual International Symposium on Computer Architecture, ISCA '10, pages 326-337, New York, N.Y., USA, ACM 9, (2010), each incorporated herein by reference.)
However, one of the main concerns that enterprises may have is how various security and monitoring functions may be reliably ensured in a shared infrastructure. (See, e.g., Express Computer, “Cloud Computing Adoption Seeing Acceleration in Asia Pacific,” http://www.expresscomputeronline.com/20110110/news02.shtml, (January, 2011) (Accessed January, 2011); T. Jaeger and J. Schiffman, “Outlook: Cloudy with a Chance of Security Challenges and Improvements,” Security Privacy, IEEE, 8(1):77-80, (January-February 2010); Loudhouse Research, Cloud Barometer Survey 2010, (July 2010); and Microsoft, “Securing Microsoft's Cloud Infrastructure,” http://www.globalfoundationservices.com/security/documents/SecuringtheMSCloudMay09.pdf, (May 2009) (Accessed January 2010), each incorporated herein by reference.)
In traditional DCNs, middleboxes composed of specialized network appliances are often deployed in a few clusters between the Internet gateways and servers. (See, e.g., Cisco Systems, Inc., “Cisco Data Center Infrastructure 2.5 Design Guide,” http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/DC_Infra2—5/DCI_SRND.pdf, (March 2010); and Juniper Networks, Inc., “Cloud-Ready Data Center Reference Architecture,” http://www.juniper.net/us/en/local/pdf/reference-architectures/8030001-en.pdf, (2010) (Accessed January 2010), both incorporated herein by reference.) The design is mainly to protect servers from external adversaries, which are the main current threat. However, as the perimeter fades with the introduction of server virtualization, network routing and forwarding may be tweaked to force intra-data-center traffic through middleboxes. For example, Virtual Local Area Networks (VLANs) are widely used to partition the network into security domains (See, e.g., T. Jaeger and J. Schiffman, “Outlook: Cloudy with a Chance of Security Challenges and Improvements,” Security Privacy, IEEE, 8(1):77-80, (January-February 2010); Loudhouse Research, Cloud Barometer Survey 2010, (July 2010); and Microsoft, “Securing Microsoft's Cloud Infrastructure,” http://www.globalfoundationservices.com/security/documents/SecuringtheMSCloudMay09.pdf, (May 2009) (Accessed January 2010), each incorporated herein by reference.) such that traffic between domains is forced to traverse all those middleboxes. The heavy reliance on custom-configured network forwarding to provide middlebox traversal has serious drawbacks. Routing and forwarding configuration alone is already complex. (See, e.g., F. Le, S. Lee, T. Wong, H. S. Kim, and D. Newcomb, “Detecting Network-Wide and Router-Specific Misconfigurations Through Data Mining,” IEEE/ACM Trans. Netw., 17:66-79, (February 2009), incorporated herein by reference.) Adding security may make the configuration even more error prone.
The complexity of configuration management is cited by the industry (See, e.g. Cisco, Configuration management, “Best Practices White Paper,” http://www.cisco.com/application/pdf/paws/15111/configmgmt.pdf, (March 2007) (Accessed January 2010); and Cisco, “Network Configuration Management,” http://www.cisco.com/en/US/technologies/tk869/tk769/technologies_white_paper0900aecd806c0d88.pdf, (September 2007) (Accessed January 2010), each incorporated herein by reference.) and there are specialized configuration auditing and management services. (See, e.g., Pivot Point Security, “Firewall and Router Configuration Review,” http://www.pivotpointsecurity.com/network-security-services/-firewall---router-configuration-reviews/, (Accessed: January 2010), incorporated herein by reference.) Also, security requirements may change on short notice, in both capacity and functionality. For instance, a denial of service (DoS) attack may cause the need for a new DoS filtering middlebox(es) and a surge in packet classifier capacity. Clusters of hardware lack the flexibility to respond and have a natural bottleneck of network scalability.
There are quite a number of recent proposals aimed at addressing the middlebox traversal issue. (See, e.g., N. Gude, T. Koponen, J. Pettit, B. Pfaff, M. Casado, N. McKeown, and S. Shenker, “NOX: Towards an Operating System for Networks,” SIGCOMM Comput. Commun. Rev., (2008); F. Hao, T. V. Lakshman, S. Mukherjee, and H. Song, “Secure Cloud Computing with a Virtualized Network Infrastructure,” Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud'10, pages 16-16, Berkeley, Calif., USA, USENIX Association, (2010); D. A. Joseph, A. Tavakoli, and I. Stoica, “A Policy-Aware Switching Layer for Data Centers,” Proceedings of the ACM SIGCOMM 2008 Conference on Data Communication, SIGCOMM '08, pages 51-62, New York, N.Y., USA, (2008); J. Lee, J. Tourrilhes, P. Sharma, and S. Banerjee, “No More Middlebox: Integrate Processing into Network,” Proceedings of the ACM SIGCOMM 2010 conference on SIGCOMM, SIGCOMM '10, pages 459-460, New York, N.Y., USA, (2010); and N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, “Openflow: Enabling Innovation in Campus Networks,” SIGCOMM Comput. Commun. Rev., (2008), each incorporated herein by reference.)
P-switch (See, e.g., D. A. Joseph, A. Tavakoli, and I. Stoica, “A Policy-Aware Switching Layer for Data Centers,” Proceedings of the ACM SIGCOMM 2008 Conference on Data Communication, SIGCOMM '08, pages 51-62, New York, N.Y., USA, (2008), incorporated herein by reference.) introduces specialized switches that are connected to sets of middleboxes. While the P-switches are deployed in-path, the middleboxes are not. P-switches host a packet classifier to determine the sequence of middleboxes to be traversed. Packets are forwarded between a P-switch and those middleboxes directly connected to it in a zigzag manner according to the required traversal sequence. After all the required middleboxes are traversed, a packet continues its way along a normal data path. This way, middleboxes are indirectly connected to the data-path and packets are forwarded through the sequence of middleboxes deemed necessary by network policies. The P-switch provides many benefits.
Unfortunately, however, specialized switches are needed. Middlebox deployment may still be partially limited to clusters at locations that have P-switches deployed. Unless wide-spread deployment of P-switches is realized, the full flexibility of deploying middleboxes anywhere in the network may not be achieved. Also, some network forwarding support may still be required. For instance, VLANs may need to be configured to force all inter-virtual machine (VM) traffic between different security domains out of the physical machine so that it can be classified by the P-switch.
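The P-switch traversal described above (classify a packet, then bounce it between the switch and its directly attached middleboxes in the required order) may be pictured with the following minimal sketch. The names `Packet`, `POLICIES`, `classify` and `traverse`, as well as the example policies, are hypothetical and are not taken from the cited article; the sketch only illustrates the zigzag forwarding pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dst_port: int


# Hypothetical policy table: (match predicate, required middlebox sequence).
POLICIES = [
    (lambda p: p.dst_port == 80, ["firewall", "load_balancer"]),
    (lambda p: p.dst_port == 22, ["firewall", "ids"]),
]


def classify(packet):
    """Return the middlebox sequence required by the first matching policy."""
    for matches, sequence in POLICIES:
        if matches(packet):
            return list(sequence)
    return []  # no policy applies: no middlebox needs to be traversed


def traverse(packet, attached):
    """Return the hop trace of a packet zigzagging through attached middleboxes."""
    trace = ["pswitch"]
    for name in classify(packet):
        if name not in attached:
            # The required middlebox is not connected to this switch.
            raise LookupError(f"middlebox {name!r} is not attached")
        # Out to the middlebox and back to the switch.
        trace += [name, "pswitch"]
    trace.append("next_hop")  # packet resumes its normal data path
    return trace


if __name__ == "__main__":
    boxes = {"firewall", "load_balancer", "ids"}
    print(traverse(Packet("10.0.0.1", "10.0.0.2", 80), boxes))
```

The sketch also shows the limitation noted above: a required middlebox that is not attached to the classifying switch cannot be reached without additional forwarding support.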
Proposals for next generation enterprise networks and DCNs (See, e.g., M. Casado, M. J. Freedman, J. Pettit, J. Luo, N. McKeown, and S. Shenker, “Ethane: Taking Control of the Enterprise,” SIGCOMM '07: Proc. of the 2007 Conf on Applicat., Technol., Architectures, and Protocols for Comput. Commun., New York, N.Y., USA, (2007); A. Greenberg, G. Hjalmtysson, D. A. Maltz, A. Myers, J. Rexford, G. Xie, H. Yan, J. Zhan, and H. Zhang, “A Clean Slate 4D Approach to Network Control and Management,” SIGCOMM Comput. Commun. Rev., 35(5), (2005); N. Gude, T. Koponen, J. Pettit, B. Pfaff, M. Casado, N. McKeown, and S. Shenker, “NOX: Towards an Operating System for Networks,” SIGCOMM Comput. Commun. Rev., (2008); and N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, “Openflow: Enabling Innovation in Campus Networks,” SIGCOMM Comput. Commun. Rev., (2008), each incorporated herein by reference.) advocate distributed enforcement of security policies. In particular, NOX (See, e.g., N. Gude, T. Koponen, J. Pettit, B. Pfaff, M. Casado, N. McKeown, and S. Shenker, “NOX: Towards an Operating System for Networks,” SIGCOMM Comput. Commun. Rev., (2008).) consists of one or more controllers and a set of OpenFlow (See, e.g., N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, “Openflow: Enabling Innovation in Campus Networks,” SIGCOMM Comput. Commun. Rev., (2008), each incorporated herein by reference.) switches deployed in DCNs to provide flexible flow-based routing.
OpenFlow switches perform up to 11-tuple packet classification and can cache flow-based forwarding information. The NOX controller maintains the whole set of network policies and global network knowledge for routing and programming the forwarding tables of OpenFlow switches. With the powerful packet classification features in OpenFlow switches, NOX may be configured to realize not only middlebox traversal, but also flexible middlebox deployments and many network forwarding optimizations such as multi-path routing. In fact, an OpenFlow switch may be a fully functional agent, as it provides both packet classification and header rewriting features. However, inter-VM traffic on the same machine may not be protected unless network forwarding tricks like VLAN separation are used. The fact that specialized switches are required may also be undesirable.
Two recent proposals (See, e.g., F. Hao, T. V. Lakshman, S. Mukherjee, and H. Song, “Secure Cloud Computing with a Virtualized Network Infrastructure,” Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud'10, pages 16-16, Berkeley, Calif., USA, USENIX Association, (2010); and J. Lee, J. Tourrilhes, P. Sharma, and S. Banerjee, “No More Middlebox: Integrate Processing into Network,” Proceedings of the ACM SIGCOMM 2010 conference on SIGCOMM, SIGCOMM '10, pages 459-460, New York, N.Y., USA, (2010).) use programmable switches, such as OpenFlow switches, to steer traffic to specific middleboxes. In the article J. Lee, J. Tourrilhes, P. Sharma, and S. Banerjee, “No More Middlebox: Integrate Processing into Network,” Proceedings of the ACM SIGCOMM 2010 conference on SIGCOMM, SIGCOMM '10, pages 459-460, New York, N.Y., USA, (2010), middleboxes are connected to the programmable switches (Forwarding Elements, or FEs), similar to the P-switch approach. VLANs are used to separate hosts of different security domains such that cross-domain traffic is forced through FEs, where policies are enforced. A centralized controller is used in a manner similar to OpenFlow, in that forwarding tables in FEs can be pre-populated, while entries for misses are cached after querying the centralized controller.
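The controller-and-cache behavior described above (pre-populated forwarding tables, with misses resolved by the centralized controller and then cached) may be sketched as follows. The classes `Controller` and `ForwardingElement`, the three-field flow key and the action strings are assumptions made for illustration; they are not the OpenFlow or NOX APIs.

```python
class Controller:
    """Hypothetical centralized controller holding the full policy set."""

    def __init__(self, policies, default_action="drop"):
        self.policies = dict(policies)  # flow key -> forwarding action
        self.default_action = default_action
        self.queries = 0  # number of table misses resolved so far

    def lookup(self, flow):
        self.queries += 1
        return self.policies.get(flow, self.default_action)


class ForwardingElement:
    """Hypothetical programmable switch with a local flow table."""

    def __init__(self, controller, prepopulated=None):
        self.controller = controller
        self.table = dict(prepopulated or {})  # entries installed in advance

    def forward(self, flow):
        action = self.table.get(flow)
        if action is None:
            # Table miss: ask the controller once, then cache the answer.
            action = self.controller.lookup(flow)
            self.table[flow] = action
        return action


if __name__ == "__main__":
    web = ("10.0.0.1", "10.0.0.2", 80)  # (src, dst, dst_port), illustrative
    ssh = ("10.0.0.1", "10.0.0.3", 22)
    ctrl = Controller({web: "to_firewall", ssh: "to_ids"})
    fe = ForwardingElement(ctrl, prepopulated={web: "to_firewall"})
    for flow in (web, ssh, ssh):
        print(flow, fe.forward(flow))
    print("controller queries:", ctrl.queries)
```

In this sketch only the first packet of an uncached flow reaches the controller; later packets of the same flow are served from the local table.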
There are approaches based on source routing (See, e.g., Y. Chiba, Y. Shinohara, and H. Shimonishi, “Source Flow: Handling Millions of Flows on Flow-Based Nodes,” SIGCOMM Comput. Commun. Rev., 40:465-466, (August 2010); B. Raghavan, P. Verkaik, and A. C. Snoeren, “Secure and Policy-Compliant Source Routing,” IEEE/ACM Trans. Netw., 17:764-777, (June 2009); and J. Shafer, B. Stephens, M. Foss, S. Rixner, and A. L. Cox, “Axon: A Flexible Substrate for Source-Routed Ethernet,” Proceedings of the 6th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, ANCS '10, pages 22:1-22:11, New York, N.Y., USA, (2010), each incorporated herein by reference.) that are similar to many of the above proposals in that packets are classified at the source or originating edge to determine the path. Source-based routing can be used to deploy the required middleboxes in-path. One important difference is that the header size increases with the number of hops and middleboxes. Intermediate switches may have to be changed to support relaying based on the source routing header tags.
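The header growth noted above for source routing may be illustrated with a short sketch. The two-byte tag size, the function names and the hop labels are assumptions chosen only for illustration and do not reflect the header format of any of the cited schemes.

```python
TAG_BYTES = 2  # hypothetical size of one hop tag in the source-route header


def build_header(path):
    """Encode the full path (switches and middleboxes) chosen at the source."""
    return list(path)


def header_size(header, tag_bytes=TAG_BYTES):
    """Header overhead grows linearly with the number of listed hops."""
    return len(header) * tag_bytes


def relay(header):
    """An intermediate switch consumes the first tag and forwards the rest."""
    if not header:
        raise ValueError("empty source route: packet has reached its last hop")
    return header[0], header[1:]


if __name__ == "__main__":
    direct = build_header(["s1", "s2", "s3"])
    policed = build_header(["s1", "firewall", "s2", "ids", "s3"])
    print(header_size(direct), header_size(policed))
    next_hop, remaining = relay(policed)
    print(next_hop, remaining)
```

Each middlebox added to the traversal sequence adds at least one tag to every packet of the flow, and every intermediate switch must understand the tag format in order to relay the packet.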
DCNs are special in that there are a variety of architectures tailored for specific data center demands. There are reference designs by equipment vendors (See, e.g., Cisco Systems, Inc., “Cisco Data Center Infrastructure 2.5 Design Guide,” http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/DC_Infra2—5/DCI_SRND.pdf, (March 2010); and Juniper Networks, Inc., “Cloud-Ready Data Center Reference Architecture,” http://www.juniper.net/us/en/local/pdf/reference-architectures/8030001-en.pdf, (2010) (Accessed January 2010), both incorporated herein by reference.), new architectures proposed by academia (See, e.g., A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta, “VL2: A Scalable and Flexible Data Center Network,” SIGCOMM '09: Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication, pages 51-62, New York, N.Y., USA, (2009); and R. Niranjan Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, and A. Vahdat, “Portland: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric,” SIGCOMM '09: Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication, pages 39-50, New York, N.Y., USA, (2009)), custom designs from major operators such as Google (See, e.g., L. A. Barroso and U. Hölzle, “The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines,” http://research.google.com/pubs/pub35290.html, (2009) (Accessed January 2010), incorporated herein by reference.), etc. The existing middlebox traversal schemes are not independent from the network forwarding configuration and mechanisms. For example, configuration and changes in routing, load balancing, and traffic engineering in network forwarding typically cause reconfiguration of the middlebox traversal system, and vice versa.
Recent literature on DCN architectures such as VL2 (See, e.g., A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta, “VL2: A Scalable and Flexible Data Center Network,” SIGCOMM '09: Proceedings of the ACM SIGCOMM 2009 conference on Data communication, pages 51-62, New York, N.Y., USA, (2009)) and Portland (See, e.g., R. Niranjan Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, and A. Vahdat, “Portland: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric,” SIGCOMM '09: Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication, pages 39-50, New York, N.Y., USA, (2009).) often calls for a flatter layer-2 topology. The emphasis may be on large bisection bandwidth, improved network scalability, low latency, facilitation of VM migration, etc. However, a traditional centralized perimeter for security enforcement works against this design principle. Operators should be able to deploy multiple types and instances of middleboxes at any location in a network, for example to improve scalability of network resources through proximity. (See, e.g., X. Meng, V. Pappas, and L. Zhang, “Improving the Scalability of Data Center Networks with Traffic-Aware Virtual Machine Placement,” Proceedings of the 29th Conference on Information Communications, INFOCOM'10, pages 1154-1162, Piscataway, N.J., USA, IEEE Press, (2010), incorporated herein by reference.)
Churn in application types and network services may require rapid on-demand scaling of network services, including firewall, deep packet inspection (DPI), traffic engineering, load balancing, etc. Suppose a client deployed a new web service in a cloud-based data center and traffic had been low during development and evaluation. If the web service goes public and becomes well publicized, a sudden surge of traffic may demand additional firewall and DPI capacity. As more cloud instances are added, a load balancer may have to be added to the sequence of middlebox traversal. Unfortunately, given the nature of the cloud paradigm, such churn in traffic loads currently has to be met with enormous over-provisioning for highly unpredictable demand.
Operational costs for human intervention are very high. (See, e.g., M. Goldszmidt, M. Budiu, Y. Zhang, and M. Pechuk, “Toward Automatic Policy Refinement in Repair Services for Large Distributed Systems,” SIGOPS Oper. Syst. Rev., 44:47-51, (April 2010), incorporated herein by reference.) There are quite a number of day-to-day operations that may require some manual operations, such as changes in network policy and configuration, link reconfiguration, hardware installation, etc. (See, e.g., J. Dean, “Designs, Lessons And Advice From Building Large Distributed Systems,” http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf, (2009), (Accessed January 2010); and V. Soundararajan and J. M. Anderson, “The Impact of Management Operations on the Virtualized Datacenter,” Proceedings of the 37th Annual International Symposium on Computer Architecture, ISCA '10, pages 326-337, New York, N.Y., USA, ACM 9, (2010), both incorporated herein by reference.) At data center scales that may exceed tens of thousands of servers, switches and middleboxes, daily equipment failures are typical. (See, e.g., J. Dean, “Designs, Lessons And Advice From Building Large Distributed Systems,” http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf, (2009), (Accessed January 2010).) Little margin for error may remain for other operations that could or must be automated. For instance, a data center with virtualization may have over 3000 automated live VM migrations per day. (See, e.g., V. Soundararajan and J. M. Anderson, “The Impact of Management Operations on the Virtualized Datacenter,” Proceedings of the 37th Annual International Symposium on Computer Architecture, ISCA '10, pages 326-337, New York, N.Y., USA, ACM 9, (2010), incorporated herein by reference.) Network services, including middlebox traversal, may migrate with them.
Requiring manual operations to correctly and efficiently enforce middlebox traversal upon frequent and automated events may be either inefficient or impossible.
In view of the foregoing, it would be useful to provide a middlebox provisioning scheme that: (i) decouples network services and network forwarding; (ii) facilitates dynamic deployment of hybrid (hardware and software) middleboxes anywhere in the network; (iii) provides dynamic scalability; and/or (iv) allows a high degree of automation in managing and operating the middleboxes.