1.1 Field of the Invention
The present invention concerns load balancing, using flow-based routing, in a network such as a data center network.
1.2 Background Information
The purpose of load balancing in communication networks is to route traffic across multiple paths effectively so that the load on the network links and/or nodes is evenly distributed. In practice, load balancing schemes are usually designed and evaluated in terms of link loads. Typically, routing in an autonomous system is based on shortest path algorithms, e.g., open shortest path first (OSPF). (See, e.g., J. Moy, “OSPF Version 2,” RFC 2328 (Standard), (April 1998), incorporated herein by reference.) Without load balancing over multiple paths, the shortest path from a source to a destination is calculated in advance, and all the traffic from the source to the destination is directed through this shortest path.
Data center networks often use densely interconnected topologies to provide large bandwidth for internal data exchange. In such networks, effective load balancing schemes are employed to use the bandwidth resources fully. For example, fat-tree and Clos networks, in which a large number of paths exist between each pair of nodes, are widely adopted. (See, e.g., A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta, “VL2: A Scalable And Flexible Data Center Network,” SIGCOMM '09: Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication, pages 51-62, New York, N.Y., USA, (2009); and R. Niranjan Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, and A. Vahdat, “PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric,” SIGCOMM '09: Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication, pages 39-50, New York, N.Y., USA, (2009), both incorporated herein by reference.) Proposed data center network topologies, including DCell (See, e.g., C. Guo, H. Wu, K. Tan, L. Shi, Y. Zhang, and S. Lu, “DCell: A Scalable And Fault-Tolerant Network Structure For Data Centers,” SIGCOMM '08: Proceedings of the ACM SIGCOMM 2008 Conference on Data Communication, pages 75-86, New York, N.Y., USA, (2008), incorporated herein by reference.), BCube (See, e.g., C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, C. Tian, Y. Zhang, and S. Lu, “BCube: A High Performance, Server-Centric Network Architecture For Modular Data Centers,” SIGCOMM '09: Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication, pages 63-74, New York, N.Y., USA, (2009), incorporated herein by reference.), and DPillar (See, e.g., Y. Liao, D. Yin, and L. Gao, “DPillar: Scalable Dual-Port Server Interconnection for Data Center Networks,” IEEE ICCCN, (2010), incorporated herein by reference.), all feature dense interconnections.
In these types of networks, using single-path routing without load balancing cannot utilize the network capacity fully. As a result, network congestion may occur even if the network has abundant unused bandwidth.
The foregoing problem is illustrated with reference to FIG. 1. If two shortest paths A-E-F-D and G-E-F-J are selected for single-path routing between source host 110 and destination host 115, link E-F may be overloaded even if paths A-B-C-D and G-H-I-J have unused bandwidth. This problem may be alleviated using equal-cost multi-path (ECMP) routing. (See, e.g., C. Hopps, “Analysis of an Equal-Cost Multi-Path Algorithm,” RFC 2992 (Informational), (November 2000), incorporated herein by reference.) With ECMP, multiple shortest paths are calculated from a source to a destination, and traffic is distributed across these equal-cost paths to achieve load balancing. In FIG. 1, if both A-E-F-D and A-B-C-D are used to carry traffic from A to D, and both G-E-F-J and G-H-I-J are used to carry traffic from G to J, network utilization may be greatly improved. With ECMP, each router may have multiple output ports, leading to multiple paths, for the same destination prefix. More specifically, when a packet arrives, the router calculates a hash value based on the packet header and selects one of the feasible output ports based on the hash value. It is common practice to use the 5-tuple header fields (that is, source Internet Protocol (IP) address, destination IP address, protocol type, source port, and destination port) to calculate the hash value. With this approach, packets belonging to the same flow follow the same path, thus avoiding out-of-sequence delivery. However, ECMP cannot guarantee good load balancing, for at least two reasons.
First, hash-based traffic distribution is per-flow, not per-packet. Thus, it balances the number of flows on different paths, but does not necessarily balance the bit rates. More specifically, even if two paths carry the same number of flows, the traffic loads may not be equal since the flows may have different bit rates. (See, e.g., M. Al-Fares, S. Radhakrishnan, B. Raghavan, N. Huang, and A. Vahdat, “Hedera: Dynamic Flow Scheduling for Data Center Networks,” Proc. of Networked Systems Design and Implementation (NSDI) Symposium, (2010), incorporated herein by reference.) Second, from a network-wide viewpoint, using ECMP may still lead to overload on certain links. Referring back to FIG. 1, if A-D and G-J each distribute their traffic evenly between the two paths mentioned, the load on link E-F would still be twice the load on any other link.
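The FIG. 1 arithmetic above can be checked with a short sketch. It is illustrative only and rests on assumptions not stated in the text: each source-destination pair offers one unit of traffic, the traffic is split evenly over the two named paths, and only the links on those four paths are modeled. The function name link_loads is hypothetical.

```python
from collections import defaultdict

def link_loads(demands):
    """Sum the load placed on each link when every demand is split
    evenly over its listed paths (hypothetical helper)."""
    loads = defaultdict(float)
    for rate, paths in demands:
        share = rate / len(paths)          # even split across the paths
        for path in paths:
            nodes = path.split("-")
            for u, v in zip(nodes, nodes[1:]):
                loads[(u, v)] += share     # each hop carries this share
    return dict(loads)

# Assumption: one unit of traffic per pair; paths as named for FIG. 1.
demands = [
    (1.0, ["A-E-F-D", "A-B-C-D"]),  # traffic from A to D
    (1.0, ["G-E-F-J", "G-H-I-J"]),  # traffic from G to J
]

loads = link_loads(demands)
busiest = max(loads, key=loads.get)
```

Under these assumptions, link E-F is the only link shared by both pairs, so it carries half of each pair's traffic while every other modeled link carries half of one pair's traffic.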
One may consider adjusting the hash function in a sophisticated way to achieve network-wide load balancing. Unfortunately, this may not be feasible because the traffic fluctuates constantly and routes are recalculated each time the topology changes. Therefore, tuning hash functions can hardly keep up with such dynamic changes, even if the considerable complexity could be handled.
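For concreteness, the hash-based output port selection used by ECMP can be sketched as follows. This is a minimal illustration, not any router's actual implementation: SHA-256 stands in for whatever hash a real device uses, and the names FiveTuple, select_output_port, and the seed parameter (a stand-in for "adjusting the hash function") are hypothetical.

```python
import hashlib
from typing import NamedTuple, Sequence

class FiveTuple(NamedTuple):
    """The header fields commonly hashed by ECMP."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int

def select_output_port(flow: FiveTuple, ports: Sequence[str], seed: int = 0) -> str:
    """Pick one feasible output port from a hash of the 5-tuple.

    SHA-256 is an assumption made for this sketch; seed is a hypothetical
    knob representing a change to the hash function.
    """
    key = f"{seed}|" + "|".join(str(field) for field in flow)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(ports)
    return ports[index]

# Example values are made up for illustration.
flow = FiveTuple("10.0.0.1", "10.0.1.1", 6, 40000, 80)
ports = ["port_to_B", "port_to_E"]

first = select_output_port(flow, ports)
second = select_output_port(flow, ports)  # same flow, same port
```

Because the port depends only on the header fields and the seed, every packet of a flow takes the same path. Changing the seed remaps flows to ports, but it does so without regard to the bit rate of any flow, which is why retuning the hash does not directly control link loads.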
A common approach to solving the problems of ECMP is flow-based routing. OpenFlow (See, e.g., N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, “OpenFlow: Enabling Innovation in Campus Networks,” SIGCOMM Comput. Commun. Rev., 38(2): 69-74, (2008), incorporated herein by reference.) defines a framework in which switches and routers maintain flow tables and perform per-flow routing. Such flow tables may be dynamically modified from a remote station. Hedera (See, e.g., M. Al-Fares, S. Radhakrishnan, B. Raghavan, N. Huang, and A. Vahdat, “Hedera: Dynamic Flow Scheduling for Data Center Networks,” Proc. of Networked Systems Design and Implementation (NSDI) Symposium, (2010), incorporated herein by reference.) shows how to use OpenFlow in data center networks to achieve load balancing. However, OpenFlow is not supported by existing commodity switches and routers, and flow table configuration and maintenance are non-trivial.
In view of the foregoing, it would be useful to provide a scheme that enables one or more of (i) per-flow rerouting without requiring any modifications to IP switches and/or routers, (ii) flow-based routing without requiring flow tables in the routers and/or switches, and (iii) easy deployment in existing data center networks to achieve effective load balancing.