High-performance computer (“HPC”) systems typically include many processors, each with its own local memory. At least some pairs of the processors are interconnected via links so that each processor can access the memory of some or all of the other processors (“non-local memory”). Some such systems are constructed according to non-uniform memory access (“NUMA”) designs, in which access to non-local memory is slower than access to local memory. Because an HPC system may not include a separate link between every pair of processors, some non-local memory accesses are routed through one or more intermediate processors, thereby traversing multi-hop routes. However, quickly determining a route for each non-local memory access is difficult. Furthermore, congested links or routes delay non-local memory accesses, degrading the performance of the affected processor(s).
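As a hedged illustration of multi-hop routing, the sketch below takes an assumed, partially connected processor topology and precomputes a next-hop table with breadth-first search, so that each non-local access can resolve its route by table lookup instead of searching at access time. The adjacency-list topology format, the fewest-hops routing policy, and the names `build_next_hop_table` and `route` are hypothetical; the sketch ignores link congestion entirely.

```python
from collections import deque


def build_next_hop_table(links):
    """Precompute, for every (source, destination) pair, the first hop on a
    fewest-hops route. `links` maps each processor id to the processors it is
    directly linked to (hypothetical topology format)."""
    table = {}
    for src in links:
        # Breadth-first search from src, remembering which neighbor of src
        # was used to first reach each processor.
        first_hop = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nbr in sorted(links[node]):
                if nbr not in first_hop:
                    # Direct neighbors of src are their own first hop;
                    # everything farther inherits the first hop of its parent.
                    first_hop[nbr] = nbr if node == src else first_hop[node]
                    queue.append(nbr)
        for dst, hop in first_hop.items():
            if dst != src:
                table[(src, dst)] = hop
    return table


def route(table, src, dst):
    """Follow next-hop entries from src until dst is reached; return the path."""
    path = [src]
    while path[-1] != dst:
        path.append(table[(path[-1], dst)])
    return path
```

For example, in a four-processor ring with links 0–1, 1–2, 2–3 and 3–0 but no direct 0–2 link, an access from processor 0 to the memory of processor 2 is routed 0 → 1 → 2, passing through processor 1.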
A crossbar switch is an assembly of individual switches between a set of inputs and a set of outputs, arranged in a matrix. A crossbar switch with M inputs and N outputs has a matrix with M×N cross-points, the places where the input and output connections cross. At each cross-point is a switch that, when closed, connects one of the inputs to one of the outputs. One exemplary crossbar is a single-layer, non-blocking switch, in which existing connections do not prevent other inputs from being connected to other outputs. Collections of crossbars can be used to implement multi-layer and blocking switches.
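A minimal software model of the M×N cross-point matrix may make the non-blocking property concrete. It assumes one-to-one (unicast) connections, and the class and method names (`Crossbar`, `connect`, `disconnect`) are illustrative only, not taken from any particular design.

```python
class Crossbar:
    """Minimal model of an M-input, N-output crossbar: an M x N matrix of
    cross-point switches. Closing switch (i, o) connects input i to output o."""

    def __init__(self, m_inputs, n_outputs):
        self.m, self.n = m_inputs, n_outputs
        # crosspoints[i][o] is True when the switch at that cross-point is closed.
        self.crosspoints = [[False] * n_outputs for _ in range(m_inputs)]

    def input_busy(self, i):
        return any(self.crosspoints[i])

    def output_busy(self, o):
        return any(row[o] for row in self.crosspoints)

    def connect(self, i, o):
        """Close cross-point (i, o) if both the input and the output are idle.
        Because every (input, output) pair has its own switch, existing
        connections never block a request between an idle input and an idle
        output (the non-blocking property)."""
        if self.input_busy(i) or self.output_busy(o):
            return False
        self.crosspoints[i][o] = True
        return True

    def disconnect(self, i, o):
        """Open cross-point (i, o), freeing both the input and the output."""
        self.crosspoints[i][o] = False
```

For example, a 3×4 crossbar has 12 cross-points; after input 0 is connected to output 2, input 1 can still reach output 3, while a second request for output 2 is refused because that output is already driven.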
A typical crossbar arbitration scheme can use any of several protocols to decide which of several competing sources may send traffic next. Common examples include round-robin and aging arbitration. Arbiters that are configured to support multiple algorithms are often very complex or make significant compromises in order to meet stringent timing requirements.
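The two arbitration policies named above can be sketched as behavioral models. This is a simplified software illustration under stated assumptions (one grant per cycle, requests supplied as a list of booleans, ties in the aging arbiter broken toward the lower index); it does not model the hardware timing constraints that make combined arbiters difficult, and the class names are hypothetical.

```python
class RoundRobinArbiter:
    """Grants one of n requesters per cycle, starting the search just after
    the most recent winner so that every requester eventually gets a turn."""

    def __init__(self, n):
        self.n = n
        self.next_priority = 0  # index with the highest priority this cycle

    def arbitrate(self, requests):
        """`requests` is a list of n booleans; returns the granted index or None."""
        for offset in range(self.n):
            idx = (self.next_priority + offset) % self.n
            if requests[idx]:
                # Rotate priority so the winner becomes lowest priority next cycle.
                self.next_priority = (idx + 1) % self.n
                return idx
        return None


class AgingArbiter:
    """Grants the requester that has waited longest; ties go to the lowest index."""

    def __init__(self, n):
        self.n = n
        self.age = [0] * n  # cycles each requester has been waiting

    def arbitrate(self, requests):
        waiting = [i for i in range(self.n) if requests[i]]
        if not waiting:
            self.age = [0] * self.n
            return None
        # Oldest request wins; on equal age, the lower index wins.
        winner = max(waiting, key=lambda i: (self.age[i], -i))
        for i in range(self.n):
            if requests[i] and i != winner:
                self.age[i] += 1  # still waiting, so it grows older
            else:
                self.age[i] = 0   # granted this cycle, or not requesting
        return winner
```

With four requesters of which 0, 2 and 3 are active every cycle, the round-robin arbiter grants 0, 2, 3, 0 in turn; with three always-active requesters, the aging arbiter grants 0, 1, 2, 0 as waiting time accumulates. Keeping the two policies in separate classes sidesteps, rather than solves, the complexity of a single arbiter that must support both.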