The use of a large number of multi-core processors combined with centralization techniques continue to increase in popularity for applications that feature computationally intensive tasks. For example, systems implemented with a large number of compute nodes disposed in proximity to each other, and coupled via high-speed interconnects, are particularly well suited for applications such as quantum mechanics, weather forecasting, climate research, oil and gas exploration, and molecular modeling, just to name a few. These multi-node systems may provide processing capacity many orders of magnitude greater than that of a single computer. This gap grows exponentially each year. For example, some multi-node systems have processing capacity (generally rated by floating point operations per second (FLOP)), in the petaflops range.
This pursuit of increased performance has led to approaches including massively parallel systems featuring a large number of compute nodes, with each node providing one or more processors, memory, and an interface circuit connecting the node to a multi-node network. In so-called “fabric” network configurations, wherein each node has potentially N number of paths to other nodes of the multi-node network, a sequence of contiguous packets may ultimately take different paths between an initiating node and a target node. Thus the target node may receive a predominately out-of-order sequence of packets. Out-of-order reception of packets poses one non-trivial challenge in the design of multi-node systems.
These and other features of the present embodiments will be understood better by reading the following detailed description, taken together with the figures herein described. The accompanying drawings are not intended to be drawn to scale. In the drawings, each identical or nearly identical component that is illustrated in various figures is represented by a like numeral. For purposes of clarity, not every component may be labeled in every drawing.