High-performance computing (HPC) has seen a substantial increase in usage and interests in recent years. Historically, HPC was generally associated with so-called “Super computers.” Supercomputers were introduced in the 1960s, made initially and, for decades, primarily by Seymour Cray at Control Data Corporation (CDC), Cray Research and subsequent companies bearing Cray's name or monogram. While the supercomputers of the 1970s used only a few processors, in the 1990s machines with thousands of processors began to appear, and more recently massively parallel supercomputers with hundreds of thousands of “off-the-shelf” processors have been implemented.
There are many types of HPC architectures, both implemented and research-oriented, along with various levels of scale and performance. However, a common thread is the interconnection of a large number of compute units, such as processors and/or processor cores, to cooperatively perform tasks in a parallel manner. Under recent System on a Chip (SoC) designs and proposals, dozens of processor cores or the like are implemented on a single SoC, using a 2-dimensional (2D) array, torus, ring, or other configuration. Additionally, researchers have proposed 3D SoCs under which 100's or even 1000's of processor cores are interconnected in a 3D array. Separate multicore processors and SoCs may also be closely-spaced on server boards, which, in turn, are interconnected in communication via a backplane or the like. Another common approach is to interconnect compute units in racks of servers (e.g., blade servers and modules) that are typically configured in a 2D array. IBM's Sequoia, one of the world's fastest supercomputer, comprises a 2D array of 96 racks of server blades/modules totaling 1,572,864 cores, and consumes a whopping 7.9 Megawatts when operating under peak performance.
One of the performance bottlenecks for HPCs is the latencies resulting from transferring data over the interconnects between compute nodes. Typically, the interconnects are structured in an interconnect hierarchy, with the highest speed and shortest interconnects within the processors/SoCs at the top of the hierarchy, while the latencies increase as you progress down the hierarchy levels. For example, after the processor/SoC level, the interconnect hierarchy may include an inter-processor interconnect level, an inter-board interconnect level, and one or more additional levels connecting individual servers or aggregations of individual servers with servers/aggregations in other racks.
It is common for one or more levels of the interconnect hierarchy to employ different protocols. For example, the interconnects within an SoC are typically proprietary, while lower levels in the hierarchy may employ proprietary or standardized interconnects. The different interconnect levels also will typically implement different Physical (PHY) layers. As a result, it is necessary to employ some type of interconnect bridging between interconnect levels. In addition, bridging may be necessary within a given interconnect level when heterogeneous compute environments are implemented.
At lower levels of the interconnect hierarchy, standardized interconnects such as Ethernet (defined in various IEEE 802.3 standards), and InfiniBand are used. At the PHY layer, each of these standards support wired connections, such as wire cables and over backplanes, as well as optical links. Ethernet is implemented at the Link Layer (Layer 2) in the OSI 7-layer model, and is fundamentally considered a link layer protocol. The InfiniBand standards define various OSI layer aspects for InfiniBand covering OSI layers 1-4.
Modern high performance fabrics need to support a variety of advanced protocols and topologies which require deadlock avoidance techniques. A common approach to meeting these requirements is to separate traffic onto multiple virtual lanes (aka virtual channels) such that credit based flow control for each virtual lane can be independent from other virtual lanes. In addition to these basic requirements, increasing use of fabrics as multi-protocol and multi-application networks is requiring increased flexibility, configurability and control for traffic separation and Quality of Service (QoS).
In order to meet these increasing needs, fabrics are required to support larger numbers of virtual lanes at their core, however devices at the edge of fabrics or specific subsystems in a fabric may not need as many virtual lanes or may be implemented using hardware designed with support for fewer virtual lanes, this implies that the configuration and utilization of virtual lanes must permit heterogeneous fabrics with regard to hardware capabilities in this area.