Communication networks tend to be constructed according to various physical and/or logical topologies, which can often depend on the capabilities of the components of the communication network. For example, FIG. 1 shows a communication network 100 in a hierarchical topology previously used in enterprise and data center communication networks.
Network 100 has a lower layer 110 composed of servers 112, which are typically rack mounted or otherwise concentrated with regard to physical location. A layer 120 uses layer 2 top-of-the-rack (TOR) switches 122 to connect servers 112. A layer 130 is composed of layer 2 and/or layer 3 aggregation switches (AS) 132 to interconnect several TOR switches 122. A layer 140 is the top layer of network 100, and is composed of core routers (CR) 142 that connect aggregation switches 132. Often, core routers 142 also function as a gateway to connect to an Internet 150.
One major drawback of the architecture of network 100 is that it is designed mostly for network traffic from users to the servers, so-called North-South traffic, which travels in a generally vertical direction in network 100. Due to the very high oversubscription ratio from layer 120 to layer 140, collectively from about 1:80 to about 1:240, the so-called West-East traffic between servers 112, which travels in a generally horizontal direction in network 100, can be subject to performance issues. For example, such high oversubscription ratios can create a bottleneck for traffic between servers 112, since the traffic typically flows through layers 120, 130 and 140 rather than directly between servers 112.
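The text gives only the collective ratio for layers 120 through 140. One common way to read "collectively" is that the oversubscription at each switching tier (downstream capacity divided by upstream capacity) compounds along the path to the top layer. The minimal sketch below assumes that multiplicative model; the per-tier capacities and the function names are hypothetical and are not taken from FIG. 1.

```python
from math import prod


def tier_ratio(downstream_gbps: float, upstream_gbps: float) -> float:
    """Oversubscription at one switching tier: offered downstream capacity
    divided by the upstream capacity available to carry it."""
    return downstream_gbps / upstream_gbps


def end_to_end_ratio(tier_ratios: list[float]) -> float:
    """Assumption: per-tier ratios compound multiplicatively along the path
    from the server layer up to the top layer."""
    return prod(tier_ratios)


def as_document_notation(ratio: float) -> str:
    """Express a ratio N in the 1:N form used in the text."""
    return f"1:{ratio:g}"


# Hypothetical capacities for illustration only (not taken from FIG. 1).
tiers = [
    tier_ratio(400, 100),  # TOR layer: 40 servers x 10 Gb/s down, 100 Gb/s up
    tier_ratio(500, 100),  # aggregation layer
    tier_ratio(400, 100),  # core layer
]
print(as_document_notation(end_to_end_ratio(tiers)))  # 1:80
```

With these made-up numbers, three moderate per-tier ratios (4, 5 and 4) already produce the 1:80 figure at the low end of the range quoted above, which illustrates how quickly oversubscription accumulates in a tall tree.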
Several network topologies have been proposed to overcome the above-mentioned drawback of network 100. These architectures aim to flatten the network topology to promote West-East traffic and to reduce the oversubscription ratio to a more reasonable 1:3, or even 1:1. FIG. 2 shows a communication network 200, which is an example of a so-called fat-tree topology for a data center. The topology of network 200 is a special type of Clos topology that is organized in a tree-like structure. Clos topologies help to reduce physical circuit switching needs with respect to the capacity of the switches used to implement the topology. This type of topology is built of k-port switches and has k pods of switches. Each pod has two layers of switches, each layer has k/2 switches, and each pod connects with (k/2)^2 servers. There are (k/2)^2 core switches, which connect with the k pods. The total number of servers supported is k^3/4. Network 200 shows an example of the fat-tree topology with k=4. Accordingly, each switch 202 has four ports, and there are four pods 210, 211, 212 and 213, each with two layers and two switches in each layer. Each pod 210-213 connects with four servers 220, for a total of sixteen servers supported. There are four core switches 230 that connect with the four pods 210-213. Note that although network 200 has twenty switches 202, compared to fourteen for network 100 (FIG. 1), each of switches 202 has only four ports. Thus, the topology of network 200 can permit greater West-East traffic through-flow than network 100, and can reduce the oversubscription ratio with switches that have a relatively small number of ports. Also, network 200 avoids the use of expensive core routers 142 (FIG. 1). Network 200 also scales to larger server connections by adding more layers.
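The sizing rules above can be checked with a short calculation. The sketch below encodes the stated formulas (k pods, two layers of k/2 switches per pod, (k/2)^2 core switches, k^3/4 servers) and, as an addition not stated in the text, counts switch-to-switch cables under the usual fat-tree wiring in which every lower-layer switch uses k/2 ports toward the upper pod layer and every upper-layer pod switch uses k/2 ports toward the core. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FatTreeSize:
    pods: int
    switches_per_pod: int
    core_switches: int
    total_switches: int
    servers: int
    switch_links: int  # lower-to-upper pod cables plus pod-to-core cables


def fat_tree_size(k: int) -> FatTreeSize:
    """Size a fat tree built from identical k-port switches."""
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    pods = k
    switches_per_pod = 2 * half            # two layers of k/2 switches each
    core_switches = half * half            # (k/2)^2 core switches
    servers = pods * half * half           # (k/2)^2 servers per pod -> k^3/4
    # Assumed wiring: each lower-layer switch uses k/2 ports upward within
    # its pod, and each upper-layer switch uses k/2 ports toward the core.
    lower_to_upper = pods * half * half
    upper_to_core = pods * half * half
    return FatTreeSize(
        pods=pods,
        switches_per_pod=switches_per_pod,
        core_switches=core_switches,
        total_switches=pods * switches_per_pod + core_switches,
        servers=servers,
        switch_links=lower_to_upper + upper_to_core,
    )


print(fat_tree_size(4))
```

For k=4 this reports four pods, twenty switches and sixteen servers, matching network 200, along with 32 switch-to-switch cables (k^3/2 in general). Because the cable count grows with the cube of k, the same calculation also hints at the cabling burden of larger fat trees.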
Besides fat-tree, other network topologies based on Clos architecture have been proposed, such as the spine-and-leaf topology of network 300 of FIG. 3. The topology of network 300 can be viewed as a folded Clos topology, and scales to larger server connections by adding more layers. Unlike network 100, which has two big core routers 142, the folded Clos design of network 300 uses, in each of layers 330 and 340, a relatively large number of switches that are connected to a lower layer.
However, fundamentally, both fat-tree and folded Clos architectures are topologically similar to traditional layered networks, in that they are all assembled in a tree-like topology. The difference is that the fat-tree and folded Clos arrangements use a series of switches in the top layer, while the traditional network uses one or more big routers at the top layer. These architectures are often called “scale-out” architectures rather than “scale-up” (bigger router) architectures.
One drawback of fat-tree and folded Clos architectures is the increased number of switches used. In addition, large numbers of cable connections must be made between all the switches used to implement these architectures. The complexity of the cabling and the sheer number of cables needed make these architectures less attractive from a practical viewpoint. Moreover, in practice, these architectures tend to scale poorly once the network has been built, due at least in part to the further increase in cabling complexity.
FIG. 4 shows a network 400 that is implemented in a meshed ring architecture, where each switch 402 has a direct connection with all of the other switches 402. However, this architecture scales poorly, since the network size is limited by the total number of ports each switch has available for interconnection, similar to the problem addressed with the Clos-related topologies discussed above.
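The port limit on a fully meshed arrangement can be made concrete. Since every switch needs one port per peer, a switch that can spare p ports for interconnection caps the mesh at p + 1 switches, and the cable count grows quadratically. The sketch below assumes a hypothetical 32-port switch split evenly between server-facing and mesh-facing ports; the split and the function names are illustrative only.

```python
def max_mesh_switches(interconnect_ports: int) -> int:
    """In a full mesh every switch needs one port per peer, so a switch that
    can spare p ports for interconnection limits the mesh to p + 1 switches."""
    return interconnect_ports + 1


def mesh_links(n: int) -> int:
    """Number of switch-to-switch cables in a full mesh of n switches."""
    return n * (n - 1) // 2


# Hypothetical 32-port switch: 16 ports face servers, 16 face the mesh.
ports, server_ports = 32, 16
n = max_mesh_switches(ports - server_ports)
print(n, mesh_links(n), n * server_ports)  # 17 136 272
```

Under these assumptions the whole network tops out at 17 switches and 272 server ports, no matter how many switches are purchased, which is the scalability ceiling described above.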
FIG. 5 shows a network 500 organized as a three-dimensional flattened butterfly topology. The topology of network 500 can scale to large numbers of switch nodes 510 that can support a relatively large number of servers in a relatively large data center. Network 500 can be built using the same organization for switch nodes 510 throughout network 500, and offers a flat network topology, higher bisection bandwidth, and low hop counts. However, three-dimensional flattened butterfly architectures tend to have a high port count per switch, which tends to increase costs, and use long global connections, which are relatively expensive and further add to implementation costs.
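The trade-off between low hop count and high port count can be illustrated with a simplified model. The sketch below assumes a symmetric layout in which each switch is addressed by d coordinates, each ranging over k values, and links directly to every switch that differs in exactly one coordinate. This is a modeling assumption for illustration, not a description of the specific wiring of network 500, and the values of k and the function names are hypothetical.

```python
from itertools import product


def fb_switches(k: int, d: int) -> int:
    """Switches in a d-dimensional layout with k switches along each dimension."""
    return k ** d


def fb_interswitch_ports(k: int, d: int) -> int:
    """Each switch links to the k - 1 other switches in each of its d dimensions."""
    return d * (k - 1)


def fb_links(k: int, d: int) -> int:
    """Each cable is counted from both ends, hence the division by two."""
    return fb_switches(k, d) * fb_interswitch_ports(k, d) // 2


def fb_hops(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Minimal hops between two switches: one hop corrects one differing coordinate."""
    return sum(1 for x, y in zip(a, b) if x != y)


k, d = 8, 3  # hypothetical: 8 switches per dimension, three dimensions
print(fb_switches(k, d), fb_interswitch_ports(k, d), fb_links(k, d))  # 512 21 5376

# Brute-force check on a small instance: no pair is more than d hops apart.
coords = list(product(range(3), repeat=3))
print(max(fb_hops(a, b) for a in coords for b in coords))  # 3
```

In this model any two of the 512 switches are at most three hops apart, but every switch must dedicate 21 ports to other switches before any server is attached, and 5376 inter-switch cables are needed, many of them spanning long distances.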
While the architectures illustrated in FIGS. 4 and 5 are attractive for a data center network from the perspective of performance, the complicated connectivity and cabling make networks 400 and 500 difficult to implement in practice in a data center environment. In addition to the complexity, the costs tend to be driven up by relatively expensive cabling used to implement the topology.
For example, optical cabling is often used to increase speed and throughput in a data center network. Switch ports are directly connected to other switch ports according to the topology configuration, so ports that may be physically separated by relatively large distances must be carefully mapped. In addition, the optical cables are often expected to have a physical reach greater than 100 meters. If a cable or switch component malfunctions, correcting the problem can be costly as well as complicated, since switches and/or cables may need to be installed and correctly connected in accordance with the complex topology being implemented.
As data centers become more like high performance computing (HPC) platforms, many of the network topologies used in HPC have been proposed for data center networks. However, the topologies employed in an HPC application do not translate well to data center network environments, since the HPC computer processors tend to be densely packed, and the networking connections tend to be restricted to a smaller space, thus limiting complexity and cost for those applications.
In addition, networks implemented with architectures such as those illustrated in FIGS. 4 and 5 can be prohibitively costly to implement all at once for some applications. It is often desirable to implement a smaller-scale ring mesh or multi-dimensional network topology, to which additional components and switches can later be added. Adding switches, nodes or other components is often called “scaling out”, and is attractive from a cost perspective, since the entire cost of the full network architecture can be deferred in favor of an initial, smaller network. However, scaling out an existing network topology presents a number of challenges related to the complexity of interconnections and the number of cables and ports that must be reconfigured to permit the additional components to be added to the network topology. Often, the increased complexity of cable connections alone makes scaling-out efforts complicated and expensive to implement.
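The rework involved in scaling out can be estimated for the simplest case, a full mesh grown from n to m switches. The sketch below counts the new cables, the ports each existing switch must add, and the ports each new switch needs. It assumes a pure full mesh (as in FIG. 4) and ignores server-facing ports; the data class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeshScaleOut:
    new_cables: int
    ports_added_per_existing_switch: int
    existing_switches_touched: int
    ports_per_new_switch: int


def mesh_links(n: int) -> int:
    """Number of switch-to-switch cables in a full mesh of n switches."""
    return n * (n - 1) // 2


def mesh_scale_out(n_before: int, n_after: int) -> MeshScaleOut:
    """Cabling work to grow a full mesh from n_before to n_after switches."""
    if not 0 < n_before <= n_after:
        raise ValueError("need 0 < n_before <= n_after")
    return MeshScaleOut(
        new_cables=mesh_links(n_after) - mesh_links(n_before),
        # every existing switch must gain a direct link to each new switch
        ports_added_per_existing_switch=n_after - n_before,
        existing_switches_touched=n_before if n_after > n_before else 0,
        # a new switch links to every other switch in the grown mesh
        ports_per_new_switch=n_after - 1,
    )


print(mesh_scale_out(8, 12))
```

Growing from 8 to 12 switches in this model requires 38 new cables, and every one of the 8 installed switches needs 4 additional ports. If the original switches were bought without spare ports, none of them can take part in the expanded mesh, which illustrates why scaling out can be as disruptive as the initial build.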
In addition to the challenges of scaling out an existing network topology, there is often a cost issue associated with purchasing equipment that is intended for a larger network but used to implement a smaller network, with the expectation of scaling out the network at a later time. In such a case, where a larger network topology is planned but a smaller network topology is actually implemented in the near term, the purchased components can be designed for a much larger network than is actually implemented. The cost of such components tends to be significantly greater than that of comparable components used with a smaller network topology, owing largely to the greater capacity expected to be handled at the larger scale. Such initial stages of large-scale implementations often lead to somewhat isolated network capacity that goes unused for a significant period of time, which can have a significant negative impact on cost budgets for implementing a desired network topology. This type of purposely implemented unused capacity is sometimes referred to as “stranded bandwidth”, since the equipment is capable of supporting greater bandwidth than is actually used, and the cost of the unused bandwidth is paid up front even though its use is deferred, thereby increasing the effective cost of the network implementation.
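A rough measure of stranded bandwidth is the share of installed capacity that carries no traffic. The sketch below computes that share and, under the simplifying assumption that equipment cost scales linearly with capacity, the portion of the purchase price tied up in unused bandwidth. The capacities, the price, and the function names are hypothetical.

```python
def stranded_fraction(installed_gbps: float, used_gbps: float) -> float:
    """Share of installed capacity that carries no traffic."""
    if installed_gbps <= 0 or not 0 <= used_gbps <= installed_gbps:
        raise ValueError("need installed > 0 and 0 <= used <= installed")
    return 1 - used_gbps / installed_gbps


def stranded_cost(equipment_cost: float, installed_gbps: float, used_gbps: float) -> float:
    """Assumption: equipment cost scales linearly with installed capacity."""
    return equipment_cost * stranded_fraction(installed_gbps, used_gbps)


# Hypothetical: equipment sized for 12.8 Tb/s, initial build uses 3.2 Tb/s.
print(stranded_fraction(12800, 3200))       # 0.75
print(stranded_cost(400_000, 12800, 3200))  # 300000.0
```

Real pricing is rarely linear in capacity, so this estimate serves only to show how an initial build that uses a quarter of the purchased capacity can leave most of the equipment spend idle until the network is scaled out.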