Ever since the introduction of the microprocessor, computer systems have been getting faster and faster. In approximate accordance with Moore's law (based on Intel® Corporation co-founder Gordon Moore's 1965 publication, whose prediction, as revised in 1975, was that the number of transistors on integrated circuits would double every two years), processor speed has increased at a fairly even rate for nearly four decades. At the same time, the size of both memory and non-volatile storage has also steadily increased, such that many of today's servers are more powerful than supercomputers from just 10-15 years ago. In addition, the speed of network communications has likewise seen astronomical increases.
Increases in processor speeds, memory, storage, and network bandwidth technologies have led to the build-out and deployment of networks and on-line resources with substantial processing and storage capacities. More recently, the introduction of cloud-based services, such as those provided by Amazon (e.g., Amazon Elastic Compute Cloud (EC2) and Simple Storage Service (S3)) and Microsoft (e.g., Azure and Office 365), has resulted in additional network build-out for public network infrastructure, in addition to the deployment of massive data centers to support these services through use of private network infrastructure.
A common data center deployment includes a large number of server racks, each housing multiple rack-mounted servers or blade server chassis. Communication between the rack-mounted servers is typically facilitated using the Ethernet (IEEE 802.3) protocol over wire cable connections. In addition to the option of using wire cables, blade servers may be configured to support communication between blades in a blade server rack or chassis over an electrical backplane or mid-plane interconnect. In addition to these server configurations, recent architectures include use of arrays of processors to support massively parallel computations, as well as aggregation of many small “micro-servers” to create compute clusters within a single chassis or rack.
Various approaches have been used to support connectivity between computing resources in high-density server/cluster environments. For example, under a common approach, each server includes a network port that is connected to an external central switch using a wire cable Ethernet link. This solution requires many external connections and a network interface controller (NIC) for each micro-server CPU (central processing unit, also referred to herein as a processor). It also increases the latency of traffic between the local CPUs compared with other approaches. As used herein, a NIC comprises a component configured to support communications over a computer network, and includes a Physical (PHY) interface and support for facilitating Media Access Control (MAC) layer functionality.
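The resource cost of the central-switch approach described above can be illustrated with a minimal sketch. The class and function names below (StarCost, star_topology_cost) are hypothetical and introduced only for illustration; the model assumes exactly one NIC and one external link per micro-server CPU, and that traffic between two local CPUs crosses one link up to the central switch and one link back down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StarCost:
    """Illustrative resource counts for a central-switch (star) layout."""
    nics: int               # one NIC per micro-server CPU (assumption)
    external_links: int     # one wire cable Ethernet link per CPU (assumption)
    local_path_links: int   # links crossed by traffic between two local CPUs


def star_topology_cost(num_cpus: int) -> StarCost:
    """Count NICs and links when every CPU connects to one external switch."""
    if num_cpus < 1:
        raise ValueError("num_cpus must be at least 1")
    # Local traffic goes CPU -> central switch -> CPU, so it crosses two links.
    local_path_links = 2 if num_cpus > 1 else 0
    return StarCost(nics=num_cpus,
                    external_links=num_cpus,
                    local_path_links=local_path_links)


if __name__ == "__main__":
    print(star_topology_cost(16))
```

Under these assumptions both the NIC count and the external link count grow linearly with the number of micro-server CPUs, which is the scaling concern the paragraph above raises.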
One approach as applied to blade servers is shown in FIG. 1a. Each of a plurality of server blades 100 is coupled to a backplane 102 via mating board connectors 104 and backplane connectors 106. Similarly, each of Ethernet switch blades 108 and 110 is coupled to backplane 102 via mating connectors 112. In this example, each server blade includes a pair of CPUs 114a and 114b coupled to respective memories 116a and 116b. Each CPU also has its own PCIe Root Complex (RC) and NIC, as depicted by PCIe RCs 118a and 118b and NICs 120a and 120b. Meanwhile, each Ethernet switch blade includes an Ethernet switch logic block 122 comprising logic and circuitry for supporting an Ethernet switch function that is coupled to a plurality of Ethernet ports 124 and connector pins on connector 112.
During operation, Ethernet signals are transmitted from NICs 120a and 120b of the plurality of server blades 100 via wiring in backplane 102 to Ethernet switch blades 108 and 110, which both perform an Ethernet switching function for communication between CPUs within the blade server and facilitate Ethernet links to external networks and/or other blade servers. NICs 120a and 120b are further configured to receive switched Ethernet traffic from Ethernet switch blades 108 and 110.
FIG. 1b shows an augmentation to the approach of FIG. 1a under which PCIe signals are sent over wiring in backplane 102 rather than Ethernet signals. Under this configuration, each of a plurality of server blades 130 includes one or more CPUs 132 coupled to memory 134. The CPU(s) 132 are coupled to a PCIe Root Complex 136, which includes one or more Root Ports (not shown) coupled to connector pins in a connector 138. Meanwhile, each of Ethernet switch blades 140 and 142 includes an Ethernet switch logic block 144 coupled to a plurality of Ethernet ports 146 and a PCIe switch logic block 148 coupled to connector pins on a connector 150.
Another approach incorporates a fabric with the local micro-server CPUs by providing dedicated connections between the local micro-server CPUs and uplinks from each micro-server CPU to a central switch. This solves the latency problem, but requires inter-micro-server CPU connectivity and a large number of uplinks. This approach may be augmented by providing dedicated connections between the CPUs and providing uplinks only from some servers, while other servers access the network through the fabric. This solves the connectivity problem but increases latency. Both solutions using a fabric also require a dedicated protocol or packet encapsulation to control the traffic within the fabric.
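The trade-off between the two fabric variants above, uplinks from every CPU versus uplinks from only some, can be sketched as a shortest-path calculation. The following is a minimal illustration, not a model of any particular fabric: the function name hops_to_nearest_uplink and the four-CPU chain used as the fabric are hypothetical, and each dedicated CPU-to-CPU connection is assumed to count as one hop.

```python
from collections import deque


def hops_to_nearest_uplink(adjacency, uplink_nodes):
    """Return, per CPU, the number of fabric hops to the closest uplink.

    adjacency maps each CPU to the CPUs it has a dedicated connection to.
    uplink_nodes is the set of CPUs that have their own uplink.
    Uses a multi-source breadth-first search starting from every uplink CPU.
    """
    distance = {node: 0 for node in uplink_nodes}
    queue = deque(sorted(uplink_nodes))
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


if __name__ == "__main__":
    # Hypothetical fabric: four CPUs connected in a chain 0-1-2-3.
    chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    # Uplink from every CPU: no fabric hops, but four uplinks.
    print(hops_to_nearest_uplink(chain, {0, 1, 2, 3}))
    # Uplink from CPU 0 only: one uplink, but up to three fabric hops.
    print(hops_to_nearest_uplink(chain, {0}))
```

In this toy fabric, reducing the uplinks from four to one raises the worst-case path to an uplink from zero to three fabric hops, which mirrors the latency increase described above.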
To address some communication aspects of virtualization on server blades, PCI-SIG® (Peripheral Component Interconnect—Special Interest Group) created the Multi-Root I/O Virtualization (MR-IOV) specification, which defines extensions to the PCI Express (PCIe) specification suite to enable multiple non-coherent Root Complexes (RCs) to share PCI hardware resources across blades. Under the MR-IOV approach, a NIC is configured to share its network interface among different virtual machines (VMs) running on host processors, requiring use of one or more additional MR-IOV switches capable of connecting to different data planes.
Yet another approach is to employ distributed switching. Under distributed switching, micro-server CPUs are connected to each other with interconnect links (such as via a ring, torus, or 3-D torus topology), with a few uplinks within the topology for reaching an external network. Distributed switching solves some connectivity issues common to star topologies, but adds significant latency to the data transfer. Additionally, data transmissions often require blocks of data to be sent along a path with many hops (i.e., through adjacent micro-server CPUs using a ring or torus topology), resulting in substantial waste of power.
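The hop counts behind this latency concern can be estimated with a short sketch. The function names below are hypothetical, and the model assumes shortest-path routing with wraparound links in every dimension, so a ring is treated as a one-dimensional torus.

```python
def ring_hops(src: int, dst: int, size: int) -> int:
    """Shortest hop count between two positions on a ring of `size` nodes."""
    offset = abs(src - dst) % size
    # Traffic may travel in either direction around the ring.
    return min(offset, size - offset)


def torus_hops(src, dst, dims) -> int:
    """Shortest hop count on a torus: sum of per-dimension ring distances."""
    return sum(ring_hops(s, d, n) for s, d, n in zip(src, dst, dims))


def worst_case_hops(dims) -> int:
    """Largest shortest-path hop count between any two nodes of the torus."""
    return sum(n // 2 for n in dims)


if __name__ == "__main__":
    # 64 micro-server CPUs arranged three different ways (illustrative sizes).
    for dims in [(64,), (8, 8), (4, 4, 4)]:
        print(dims, worst_case_hops(dims))
```

For 64 nodes, the worst case is 32 hops on a ring, 8 on an 8x8 torus, and 6 on a 4x4x4 torus. Every intermediate hop passes through an adjacent micro-server CPU, which is the source of the added latency and power cost noted above.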