High-performance computing (HPC) has seen a substantial increase in usage and interests in recent years. Historically, HPC was generally associated with so-called “Super computers.” Supercomputers were introduced in the 1960s, made initially and, for decades, primarily by Seymour Cray at Control Data Corporation (CDC), Cray Research and subsequent companies bearing Cray's name or monogram. While the supercomputers of the 1970s used only a few processors, in the 1990s machines with thousands of processors began to appear, and more recently massively parallel supercomputers with hundreds of thousands of “off-the-shelf” processors have been implemented.
There are many types of HPC architectures, both implemented and research-oriented, along with various levels of scale and performance. However, a common thread is the interconnection of a large number of compute units, such as processors and/or processor cores, to cooperatively perform tasks in a parallel manner. Under recent System on a Chip (SoC) designs and proposals, dozens of processor cores or the like are implemented on a single SoC, using a 2-dimensional (2D) array, torus, ring, or other configuration. Additionally, researchers have proposed 3D SoCs under which 100's or even 1000's of processor cores are interconnected in a 3D array. Separate multicore processors and SoCs may also be closely-spaced on server boards, which, in turn, are interconnected in communication via a backplane or the like. Another common approach is to interconnect compute units in racks of servers (e.g., blade servers and modules). IBM's Sequoia, alleged to have once been the world's fastest supercomputer, comprises 96 racks of server blades/modules totaling 1,572,864 cores, and consumes a whopping 7.9 Megawatts when operating under peak performance.
One of the performance bottlenecks for HPCs is the latencies resulting from transferring data over the interconnects between compute nodes. Typically, the interconnects are structured in an interconnect hierarchy, with the highest speed and shortest interconnects within the processors/SoCs at the top of the hierarchy, while the latencies increase as you progress down the hierarchy levels. For example, after the processor/SoC level, the interconnect hierarchy may include an inter-processor interconnect level, an inter-board interconnect level, and one or more additional levels connecting individual servers or aggregations of individual servers with servers/aggregations in other racks.
Recently, interconnect links having speeds of 100 Gigabits per second (100 Gb/s) have been introduced, such as specified in the IEEE 802.3bj Draft Standard, which defines Physical Layer (PHY) specifications and management parameters for 100 Gb/s operation over backplanes and copper cables. Mesh-like interconnect structures including links having similar (to 100 Gb/s) speeds are being developed and designed for HPC environments. The availability of such high-speed links and interconnects shifts the performance limitation from the fabric to the software generation of packets and the handling of packet data to be transferred to and from the interconnect.