1. Field
The present disclosure generally relates to techniques for determining input-output mappings for a switch. More specifically, the present disclosure relates to an arbitration technique that maintains mappings from a preceding arbitration decision cycle.
2. Related Art
On-chip and inter-chip routers accept flits (the logical fragments of a packet) from incoming on-chip network links, examine the destinations of these flits, and route them through the appropriate outgoing on-chip network links. A canonical router includes a set of input ports to accept the incoming flits, a set of output ports to issue the outgoing flits, routing logic to determine the next hop for each flit, a crossbar or switch to transfer flits from the input ports to the output ports, and a switch allocator that attempts to create a conflict-free schedule of flits to be transferred on each arbitration decision cycle. Moreover, blocking networks (i.e., networks that do not have an independent path from every source to every destination) typically rely heavily on router throughput for performance, especially at high loads.
Switch arbitration has a first-order impact on router throughput and the overall network performance. Typically, the switch allocator needs to maximize the number of flits transferred across the crossbar on each arbitration decision cycle while maintaining fairness among the input and output ports. However, this arbitration calculation is often non-trivial.
Furthermore, design of switch allocators can be complicated by additional factors. For example, current router designs usually use some form of input queuing (such as virtual channels) to mitigate head-of-line blocking. As a consequence, each input port may have flits from multiple input queues requesting different output ports. However, design and technology constraints often restrict an input port to transferring at most one flit per arbitration decision cycle, and an output port to accepting at most one flit per arbitration decision cycle. Therefore, the switch allocator typically must grant a subset of input port requests that maximizes the number of flits transferred without violating the above constraints and, at the same time, maintain fairness among the input and output ports.
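By way of illustration only, the two port constraints described above can be expressed as a simple check on a candidate set of grants. The following is a minimal sketch under stated assumptions: the function name `is_conflict_free` and the representation of a grant as an (input port, output port) pair are hypothetical and are not taken from any particular router design.

```python
from collections import Counter

def is_conflict_free(grants):
    """Return True if no input port and no output port appears more than once.

    `grants` is a list of (input_port, output_port) pairs proposed for a
    single arbitration decision cycle (hypothetical representation).
    """
    input_use = Counter(inp for inp, _ in grants)
    output_use = Counter(out for _, out in grants)
    # Each input port may transfer at most one flit per cycle, and each
    # output port may accept at most one flit per cycle.
    return (all(n <= 1 for n in input_use.values())
            and all(n <= 1 for n in output_use.values()))

# Two flits on distinct inputs and distinct outputs: allowed.
ok = is_conflict_free([("A", "Y"), ("B", "X")])
# Two inputs granted the same output: violates the output constraint.
output_clash = is_conflict_free([("A", "X"), ("B", "X")])
# One input granted two outputs: violates the input constraint.
input_clash = is_conflict_free([("A", "X"), ("A", "Y")])
```

In this sketch, the number of flits transferred in a cycle is simply the length of a conflict-free grant list, which is the quantity the switch allocator attempts to maximize.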
In addition, in order to maximize the router throughput, the switch allocator often must be able to provide a set of matches in each arbitration decision cycle. However, at current clock speeds, the switch allocator usually cannot acquire a global snapshot of the input requests within a clock cycle and therefore must resort to distributed arbitration, in which the input and output ports act independently of each other and are agnostic of the decisions of the other input and output ports. In this approach, an input port is not aware of the requests submitted by the other input ports, nor is an output port aware of the grants issued by the other output ports. This distributed arbitration often causes conflicts in the port allocation, leading to wasted bandwidth on the output links.
For example, consider a scenario in which input port A can submit requests to output ports X and Y, and input port B can only submit a request to output port X. If input ports A and B both submit requests for output port X, only one of the two requests can be granted, so either input port A or input port B loses an opportunity to transmit. Had input port A requested output port Y instead, input port A could have transferred a flit to output port Y and input port B could have transferred a flit to output port X in the same arbitration decision cycle. Arbitration collisions such as this typically limit the router throughput and, thus, the overall network performance at high injection loads. Therefore, it can be difficult for existing switch allocators to balance the conflicting requirements of reducing arbitration collisions while maintaining high throughput.
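The scenario above can be reproduced with a short sketch that compares a distributed decision against an allocation made with a global view of the requests. This is illustrative only: the helper names `distributed_allocate` and `best_global_match`, the fixed priority order used by the output ports, and the brute-force search are all assumptions made for the example and do not describe any specific existing allocator.

```python
from itertools import product

def distributed_allocate(choices, priority):
    """Each input independently picks one output; each output grants the
    first requester in `priority` order (an assumed tie-break rule)."""
    granted_outputs = {}
    for inp in priority:
        out = choices.get(inp)
        if out is not None and out not in granted_outputs:
            granted_outputs[out] = inp
    return {inp: out for out, inp in granted_outputs.items()}

def best_global_match(requests):
    """Brute-force search for the largest conflict-free set of grants,
    given every input's full request set (a global snapshot)."""
    inputs = sorted(requests)
    options = [sorted(requests[i]) + [None] for i in inputs]
    best = {}
    for combo in product(*options):
        used = [o for o in combo if o is not None]
        if len(used) != len(set(used)):
            continue  # two inputs would share one output port
        match = {i: o for i, o in zip(inputs, combo) if o is not None}
        if len(match) > len(best):
            best = match
    return best

requests = {"A": {"X", "Y"}, "B": {"X"}}
# Input A happens to pick X without knowing that B can only use X.
distributed = distributed_allocate({"A": "X", "B": "X"}, priority=["A", "B"])
global_best = best_global_match(requests)
print(len(distributed), len(global_best))  # prints "1 2"
```

With these inputs, the distributed decision transfers one flit (from input port A to output port X), whereas the global search finds the two-flit schedule in which input port A uses output port Y and input port B uses output port X. The brute-force search is used here only to make the lost opportunity visible; it is not practical within a single clock cycle.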
Hence, what is needed is a switch allocator and an arbitration technique that do not suffer from the above-described problems.