A small data center may include a modest number of server racks, but a large data center may include 100,000 servers or more. A server rack typically holds 20-100 servers, meaning that 1,000-5,000 server racks are present in such a large data center. Each server rack typically includes an electrical switch interconnecting the servers in the rack, and interconnecting those servers with the other servers in the data center. These rack-level switches are referred to as “top of rack” (TOR) switches. Depending on the number of servers in the data center, the TOR switches may be interconnected through a single layer of electrical core switches, or may require a multi-layer electrical switching architecture. A typical configuration for large data centers is to group the server racks in rows and introduce one or two electrical “end of row” (EOR) switches to interconnect the TOR switches in each row, and to interconnect the rows across the data center. A locally-interconnected group of server racks in a data center is also referred to as a “cluster,” a “pod,” or the like. Beyond the rows or clusters, there may also be a layer of electrical core switching, or even a multi-layer electrical core switching hierarchy. As server bandwidth grows from 10 Gbps to 40 Gbps and beyond, it becomes increasingly difficult to provide adequate electrical switching capability to support all possible server-to-server communication patterns at full bandwidth across a large data center. Therefore, what is still needed in the art are improved systems and methods for the large-scale interconnection of electrical switches such as the TOR switches and the like. Note that other architectures besides TOR/EOR are also used, such as “middle of row” (MOR) switches combined with EOR switches and no TOR switches, hierarchies without EOR switches, etc. Each of these architectures likewise needs improved systems and methods for the large-scale interconnection of electrical switches.
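The rack-count arithmetic above can be sketched as follows; the figures used are the illustrative values stated in the text (100,000 servers, 20-100 servers per rack), not measurements from any particular data center:

```python
# Sketch of the rack-count arithmetic: how many racks a large data
# center needs at the low and high ends of typical rack density.
servers_total = 100_000        # servers in a large data center (from the text)
servers_per_rack_lo = 20       # low end of typical rack density
servers_per_rack_hi = 100      # high end of typical rack density

racks_at_low_density = servers_total // servers_per_rack_lo    # 5,000 racks
racks_at_high_density = servers_total // servers_per_rack_hi   # 1,000 racks
print(racks_at_high_density, racks_at_low_density)  # 1000 5000
```

This reproduces the 1,000-5,000 rack range given above.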
A similar interconnect scaling problem exists in multi-chassis routers and electrical switches. That is, data center applications are merely an exemplary embodiment; this interconnect scaling problem can exist in any generalized multi-chassis router or electrical switch. As network capacity increases, there is a need for increased scaling of non-blocking electrical switches. Switching chassis scale is set by the density of optical interfaces. However, projected network junction bandwidth is anticipated to require multi-chassis switching solutions to reach the required scale, and, in such systems, the multi-chassis interconnects present significant challenges in terms of size, cost, and power dissipation.
Referring to FIG. 1, a typical problem observed in a large data center is illustrated in the following example of a “fat tree” architecture. Servers 5 are interconnected to an electrical TOR switch 10 via D server links 11. These server rack-level links 11 are typically electrical. There is a minimum of one link 11 per server 5, but there could be more than one. These links 11 are typically 1 Gigabit Ethernet (GE) today, but are expected to move to 10 GE and 40 GE in the near future. The TOR switches 10 are connected to the core switches 12 (currently electrical) or routers, which provide overall data center interconnect. These links 13 are typically optical (usually at 10 Gbps, but increasing over time), and the number of TOR uplinks, U, is the same as the number of core switches 12 in a minimally-connected two-level tree. It should be noted that there may be multiple uplinks from each TOR switch 10 to the core switches 12 in order to better match the total interface bandwidth on the servers 5. For purposes here, the total number of these uplinks is U. The core port count, C, determines the maximum number of TOR switches 10 that may be interconnected. Additional ports on the core switch 12 must be reserved to bring data into and out of the server cluster, but for purposes here these are ignored.
Thus, the total number of server ports that may be interconnected across a two-level tree is C*D (assuming D is constant across all TOR switches, which is a reasonable simplification). Common practice can achieve full bandwidth interconnection only among a group on the order of 1,000 servers in a two-level fat tree, while the largest data centers have more than 100,000 servers. Additional layers (typically aggregation switches) may be added if increased capacity is required, but at a substantial increase in overall complexity, cost, and power consumption.
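The C*D capacity limit above can be illustrated with a short sketch. The particular values C = 48 and D = 24 are assumed for illustration only (they are not taken from FIG. 1), chosen because they land on the order-of-1,000-server scale the text describes:

```python
# Two-level fat-tree capacity: each of the C core switch ports can reach
# one TOR switch, and each TOR switch serves D servers, so the tree can
# interconnect at most C * D server ports at full bandwidth.
C = 48   # core switch port count (assumed illustrative value)
D = 24   # servers per TOR switch (assumed illustrative value)

max_servers = C * D
print(max_servers)  # 1152 -- on the order of 1,000 servers, as stated above
```

Comparing this result with a 100,000-server data center makes the gap between a single two-level tree and the largest deployments concrete.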
Thus, the scale of modern data centers is such that managing the TOR-to-core uplinks becomes very difficult. The sheer number of required interconnect cables leads to management and cost problems, and the optical cabling quickly becomes unmanageable. Clearly, a more scalable approach is desirable to address both the cabling burden and the core switch count and capacity constraints.
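The cabling burden can be quantified under the definitions given for FIG. 1: in a minimally-connected two-level tree there are up to C TOR switches, each carrying U optical uplinks. The values below are illustrative assumptions consistent with the earlier sketch, not figures from the source:

```python
# Uplink-cable count in a minimally-connected two-level tree (illustrative).
C = 48   # core port count -> up to 48 TOR switches in the cluster (assumed)
U = 4    # optical uplinks per TOR switch (assumed)

uplink_cables = C * U
print(uplink_cables)  # 192 optical uplink cables for a single ~1,000-server cluster
```

Scaling from one such cluster to a 100,000-server data center multiplies this cable count roughly a hundredfold, which is the management problem the paragraph describes.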