A conventional datacenter typically employs a datacenter network that is organized as a fat-tree network. FIG. 1 shows an example of the topology for such a conventional fat-tree datacenter network 100. Topology, as used herein, refers to a set of hosts and switches, together with the links that interconnect the hosts and the switches. The interconnections in a topology may be represented using a directed graph. The fat-tree datacenter network 100 includes higher level switches, such as aggregation switches 110 and 112. The aggregation switches 110 and 112 are connected to each of the spine switches 120. Each spine switch 120 is connected to a plurality of top-of-rack (ToR) switches 130. In the example shown in FIG. 1, each spine switch 120 is connected to eight of the ToR switches 130. The ToR switches 130 are configured to connect with racks of hosts 140. The racks of hosts 140 may be, for example, racks of servers that provide services for the datacenter.
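By way of illustration only, the directed-graph representation mentioned above may be sketched as follows. The sketch is not taken from FIG. 1: the function name build_fat_tree, the node labels, the default switch counts, and the assumption that each spine switch connects to its own distinct set of ToR switches are all hypothetical choices made for the example. Each physical link is modeled as a pair of opposite directed edges.

```python
from collections import defaultdict


def build_fat_tree(num_aggregation=2, num_spine=2, tors_per_spine=8, hosts_per_rack=2):
    """Return a directed graph (adjacency sets) for a small fat-tree-like topology.

    All counts and node names are illustrative assumptions, not values
    read from the figures.
    """
    graph = defaultdict(set)

    def connect(a, b):
        # One physical link is stored as two directed edges, a->b and b->a.
        graph[a].add(b)
        graph[b].add(a)

    aggregation = [f"agg{i}" for i in range(num_aggregation)]
    for s in range(num_spine):
        spine = f"spine{s}"
        # Every aggregation switch is connected to every spine switch.
        for agg in aggregation:
            connect(agg, spine)
        # Assumption: each spine switch has its own set of ToR switches.
        for t in range(tors_per_spine):
            tor = f"tor{s}_{t}"
            connect(spine, tor)
            # Hosts in the rack served by this ToR switch.
            for h in range(hosts_per_rack):
                connect(tor, f"host{s}_{t}_{h}")
    return dict(graph)


if __name__ == "__main__":
    topology = build_fat_tree()
    print(sorted(topology["agg0"]))   # spine switches reachable from agg0
    print(len(topology["spine0"]))    # neighbors of spine0: aggregation plus ToR switches
```

Storing both directions explicitly keeps the structure a directed graph, so up links and down links could later be given different attributes if desired.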
Hosts within a given rack 140 may be connected to the ToR switch 130 by a common electrical or fiber optic networking cable. The switching elements in the network, e.g., routers and/or switches such as the aggregation switches 110, 112, the spine switches 120 and the ToR switches 130, may have multiple ports. The number of ports is commonly referred to as the switch radix or degree. For example, the switching elements may have 12, 24, 32, 36 or 64 ports. The ports may be logically assigned as “up links” and “down links” to designate upward and downward facing ports, respectively. Up links route packets from stage N to stage N+1 (i.e., up the tree) of a multi-stage network, such as the fat-tree datacenter network 100. Down links route packets from stage N+1 to stage N (i.e., down the tree) of the multi-stage network, e.g., the fat-tree datacenter network 100. If all ports have the same bandwidth, then the ratio of up links to down links may be varied to change the network performance and cost profile.
For example, a switch that has k up links and k down links is said to be “fully provisioned” since the bandwidth between successive stages of the interconnection network is matched. A switch with m down links and n up links, where m>n (i.e., more down links than up links), is said to be “under provisioned” or, equivalently, “over-subscribed” since the upward-facing links do not have sufficient bandwidth to carry all the traffic flowing from the downward-facing links. The oversubscription point is typically at the lowest stage of the network, commonly the ToR switch 130, to reduce the overall network cost since more hosts, e.g., racks 140, are sharing the aggregate network bandwidth. A typical datacenter network may be oversubscribed by 2×, 4×, 8× or more to balance the communication demands from applications and overall network cost.
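As a minimal sketch of the arithmetic described above, and assuming that all ports have the same bandwidth, the degree of oversubscription at a single switch may be computed as the ratio of down links to up links. The function names below are hypothetical, and the label “over-provisioned” for the case of more up links than down links is an assumption added for completeness; it is not a term used in this description.

```python
from fractions import Fraction


def oversubscription_ratio(down_links: int, up_links: int) -> Fraction:
    """Ratio of downward-facing to upward-facing ports, assuming equal port bandwidth."""
    if down_links <= 0 or up_links <= 0:
        raise ValueError("link counts must be positive")
    return Fraction(down_links, up_links)


def provisioning(down_links: int, up_links: int) -> str:
    """Classify a switch by comparing its down-link and up-link counts."""
    ratio = oversubscription_ratio(down_links, up_links)
    if ratio > 1:
        return "over-subscribed"      # m > n: up links cannot carry all down-link traffic
    if ratio == 1:
        return "fully provisioned"    # k up links and k down links
    return "over-provisioned"         # assumed label; not used in the description


if __name__ == "__main__":
    # Hypothetical 64-port switch split into 48 down links and 16 up links.
    print(oversubscription_ratio(48, 16), provisioning(48, 16))
```

For instance, a hypothetical 64-port switch split into 48 down links and 16 up links would be oversubscribed by 3×, while a 32/32 split would be fully provisioned.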
Since multiple hosts share the fat-tree network of the datacenter, only a few of the hosts are likely to use their available injection bandwidth simultaneously. Injection bandwidth, in this context, is equivalent to the bit rate of the physical channel present at the host. Each host is assumed to be coupled to a ToR switch 130 in the network using a network interface controller (NIC). Datacenter networks often exploit this behavior through oversubscription, which allows the aggregate host injection bandwidth to exceed the capacity of the network.
FIG. 2 visually depicts the phenomenon of oversubscription. For a given ToR switch 210, the outgoing bandwidth 220 is significantly less than the host-facing bandwidth 230 originating from the host 240. Oversubscription may be expressed as an oversubscription ratio, which is the degree to which the aggregate host injection bandwidth exceeds the network capacity. The network capacity refers to the maximum load that a minimum bisector of the network can sustain for uniformly distributed traffic. A bisector is a set of links that, when removed, divides the network into two equal halves. The minimum bisector is the bisector containing the minimum number of edges among all bisectors of the network.
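The definition of a minimum bisector given above may be illustrated with a brute-force sketch for very small networks. Several assumptions are made here that the description does not state: links are treated as undirected pairs, “equal halves” is taken to mean an equal number of nodes on each side, and the function name minimum_bisector is hypothetical. The search enumerates every balanced split, so it is practical only for a handful of nodes.

```python
from itertools import combinations


def minimum_bisector(nodes, links):
    """Return a smallest set of links whose removal splits the nodes into two equal halves.

    nodes: iterable of sortable node names (even count required).
    links: iterable of (u, v) pairs, treated as undirected (an assumption).
    """
    nodes = sorted(nodes)
    if len(nodes) % 2 != 0:
        raise ValueError("an even number of nodes is required for equal halves")
    links = list(links)
    half = len(nodes) // 2
    anchor, others = nodes[0], nodes[1:]
    best = None
    # Fixing the first node on one side avoids examining each split twice.
    for rest in combinations(others, half - 1):
        side = {anchor, *rest}
        # A link belongs to the bisector if its endpoints fall on different sides.
        cut = [(u, v) for u, v in links if (u in side) != (v in side)]
        if best is None or len(cut) < len(best):
            best = cut
    return best


if __name__ == "__main__":
    # Two triangles joined by a single link: that link alone is the minimum bisector.
    nodes = ["a", "b", "c", "d", "e", "f"]
    links = [("a", "b"), ("b", "c"), ("c", "a"),
             ("d", "e"), ("e", "f"), ("f", "d"),
             ("c", "d")]
    print(minimum_bisector(nodes, links))
```

Under the further assumption that every link has the same bandwidth, the network capacity for this toy example would be proportional to the number of links in the returned bisector.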