With the help of parallel computing frameworks, organizations routinely process petabytes of data on computational clusters containing thousands of nodes. For these massively parallel workloads, the principal bottleneck is often not the performance of individual nodes, but rather the rate at which nodes can exchange data over the network. Data center network (DCN) applications typically demonstrate little communication locality, meaning that the communication substrate must support high aggregate bisection bandwidth for worst-case communication patterns. Unfortunately, modern DCN architectures can be difficult to scale beyond a certain amount of bisection bandwidth and can become prohibitively expensive well in advance of reaching their maximum capacity.
Expensive packet switches can be replaced with many smaller, commodity switches organized into a fat-tree topology. Fat-trees have been used successfully in telecom networks, HPC networks, and on chips. However, a major impediment to adopting this architecture in data center Ethernet networks is that, as the number of packet switches grows, so do the cabling complexity and the difficulty of actually constructing the network, especially on a tight schedule and with minimal human error. Interconnecting thousands of individual switches also adds the overhead of managing a large number of individual switch elements. As such, the construction of fat-tree networks from discrete packet switches does not scale, and other solutions are needed.
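To give a rough sense of how quickly the switch and cable counts grow, the following is a minimal sketch that tallies the elements of a fat-tree. It assumes the common three-tier k-ary construction built entirely from identical k-port switches; the text above does not fix a particular construction, so this is an illustrative assumption, and the function name `fat_tree_size` is hypothetical.

```python
def fat_tree_size(k: int) -> dict:
    """Count elements of a three-tier k-ary fat-tree (assumed construction).

    Assumes k pods, each with k/2 edge and k/2 aggregation switches,
    plus (k/2)^2 core switches, all with k ports.
    """
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even integer >= 2")

    half = k // 2
    edge_switches = k * half          # k/2 edge switches in each of k pods
    agg_switches = k * half           # k/2 aggregation switches per pod
    core_switches = half * half       # (k/2)^2 core switches

    hosts = edge_switches * half      # each edge switch serves k/2 hosts

    # Inside a pod, every edge switch connects to every aggregation switch.
    edge_agg_cables = k * half * half
    # Every aggregation switch has k/2 uplinks to the core layer.
    agg_core_cables = agg_switches * half

    return {
        "hosts": hosts,
        "switches": edge_switches + agg_switches + core_switches,
        "inter_switch_cables": edge_agg_cables + agg_core_cables,
        "host_cables": hosts,
    }


if __name__ == "__main__":
    for ports in (4, 24, 48):
        print(ports, fat_tree_size(ports))
```

Under these assumptions, a network built from 48-port switches serves 27,648 hosts but requires 2,880 switches and 55,296 switch-to-switch cables, in addition to one cable per host. Both counts grow with the cube of the port count, which is the cabling and management burden described above.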