The term Data Centers (DC) generally refers to facilities used to house large computer systems (often contained on racks that house the equipment) and their associated components, all connected by an enormous amount of structured cabling. Cloud Data Centers (CDC) is a term used to refer to large, generally off-premise facilities that similarly store an entity's data.
Network switches are computer networking apparatus that link network devices for communication/processing purposes. In other words, a switch is a telecommunication device that is capable of receiving a message from any device connected to it, and transmitting the message to a specific device for which the message was to be relayed. A network switch is also commonly referred to as a multi-port network bridge that processes and routes data. Here, by port, we are referring to an interface (outlet for a cable or plug) between the switch and the computer/server/CPU to which it is attached.
Today, DCs and CDCs generally implement data center networking using a set of layer two switches. Layer two switches process and route data at layer 2, the data link layer, which is the protocol layer that transfers data between nodes (e.g. servers) on the same local area network or adjacent nodes in a wide area network. A key problem to solve, however, is how to build a large capacity computer network that is able to carry a very large aggregate bandwidth (hundreds of TB) containing a very large number of ports (thousands), that requires minimal structure and space (i.e. minimizing the need for a large room to house numerous cabinets with racks of cards), that is easily scalable, and that may assist in minimizing power consumption.
The traditional network topology implementation is based on totally independent switches organized in a hierarchical tree structure as shown in FIG. 1. Core switch 2 is a very high speed, low count port with a very large switching capacity. The second layer is implemented using Aggregation switch 4, a medium capacity switch with a larger number of ports, while the third layer is implemented using lower speed, large port count (forty/forty-eight), low capacity Edge switches 6. Typically the Edge switches are layer two, whereas the Aggregation ports are layer two and/or three, and the Core switch is typically layer three. This implementation provides any server 8 to server connectivity in a maximum of six hop links in the example provided (three hops up to the core switch 2 and three down to the destination server 8). Such a hierarchical structure is also usually duplicated for redundancy-reliability purposes. For example, with reference to FIG. 1, without duplication if the right-most Edge switch 6 fails, then there is no connectivity to the right-most servers 8. In the least, core switch 2 is duplicated since the failure of the core switch 2 would generate a total data center connectivity failure. For reasons that are apparent, this method has significant limitations in addressing the challenges of the future DC or CDC. For instance, because each switch is completely self-contained, this adds complexity, significant floor-space utilization, complex cabling and manual switches configuration/provisioning that is prone to human error, and increased energy costs.
Many attempts have been made, however, to improve switching scalability, reliability, capacity and latency in data centers. For instance, efforts have been made to implement more complex switching solutions by using a unified control plane (e.g. the QFabric System switch from Juniper Networks; see, for instance, http://www.juniper.net/us/en/products-services/switching/qfabric-system/), but such a system still uses and maintains the traditional hierarchical architecture. In addition, given the exponential increase in the number of system users and data to be stored, accessed, and processed, processing power has become the most important factor when determining the performance requirements of a computer network system. While server performance has continually improved, one server is not powerful enough to meet the needs. This is why the use of parallel processing has become of paramount importance. As a result, what was predominantly north-south traffic flows, has now primarily become east-west traffic flows, in many cases up to 80%. Despite this change in traffic flows, the network architectures haven't evolved to be optimal for this model. It is therefore still the topology of the communication network (which interconnects the computing nodes (servers)) that determines the speed of interactions between CPUs during parallel processing communication.
The need for increased east-west traffic communications led to the creation of newer, flatter network architectures, e.g. toroidal/torus networks. A torus interconnect system is a network topology for connecting network nodes (servers) in a mesh-like manner in parallel computer systems. A torus topology can have nodes arranged in 2, 3, or more (N) dimensions that can be visualized as an array wherein processors/servers are connected to their nearest neighbor processors/servers, and wherein processors/servers on opposite edges of the array are connected. In this way, each node has 2N connections in a N-dimensional torus configuration (FIG. 2 provides an example of a 3-D torus interconnect). Because each node in a torus topology is connected to adjacent ones via short cabling, there is low network latency during parallel processing. Indeed, a torus topology provides access to any node (server) with a minimum number of hops. For example, a four dimension torus implementing a 3×3×3×4 structure (108 nodes) requires on average 2.5 hops in order to provide any to any connectivity. Unfortunately, large torus network implementations have not been practical for commercial deployment in DCs or CDCs because large implementations can take years to build, cabling can be complex (2N connections for each node), and they can be costly and cumbersome to modify if expansion is necessary. However, where the need for processing power has outweighed the commercial drawbacks, the implementation of torus topologies in supercomputers has been very successful. In this respect, IBM's Blue Gene supercomputer provides an example of a 3-D torus interconnect network wherein 64 cabinets house 65,536 nodes (131,072 CPUs) to provide petaFLOPs processing power (see FIG. 3 for an illustration), while Fujitsu's PRIMEHPC FX10 supercomputer system is an example of a 6-D torus interconnect housed in 1,024 racks comprising 98,304 nodes). While the above examples dealt with a torus topology, they are equally applicable to other flat network topologies.
The present invention seeks to overcome the deficiencies in such prior art network topologies by providing a system and architecture that is beneficial and practical for commercial deployment in DCs and CDCs.