A router is a device with input ports and output ports, and capable of deciding to which output port it should forward a packet received via one of its input ports so as to move the packet closer to an end destination (usually specified within the packet itself). A device of this type may be equipped with a plurality of internal switching stages, where packets pass through a series of one or more intermediate (or “next hop”) ports before emerging at a “final hop” port corresponding to one of the output ports of the router.
One of the advantages of packet forwarding systems is that data of varying priorities (or, more generally, service “classes”) can be transmitted simultaneously using the same physical link. Thus, a stream of packets arriving at an input port of a router may contain packets corresponding to different service classes. In the following, packets that belong to the same service class and which are forwarded to the same final hop port will be said to belong to the same “flow”.
It is to be noted that the next hop port for packets belonging to the same flow might be different from one packet to the next, depending on such factors as packet attributes, load balancing issues, and so on. Therefore, it is possible that a sequence of packets belonging to the same flow will follow different paths through the router. Since each path may have its own delay and loss characteristics, packets belonging to the same flow may need to be reordered upon exiting the router in order to reconstitute the order in which the packets originally arrived at the router.
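The reordering step described above can be illustrated with a minimal sketch. It assumes, hypothetically, that each packet of a flow is tagged with a consecutive sequence number when it arrives at the router; the text does not specify any such tagging mechanism, and the `Resequencer` class and its method names are illustrative only.

```python
import heapq


class Resequencer:
    """Releases packets of ONE flow in their original arrival order.

    Assumption (not from the source text): every packet carries a
    consecutive per-flow sequence number, starting at 0, assigned when
    the packet enters the router.
    """

    def __init__(self):
        self.next_seq = 0   # sequence number expected to leave next
        self.pending = []   # min-heap of (seq, payload) held back so far

    def receive(self, seq, payload):
        """Accept a packet emerging from some internal path.

        Returns the list of payloads that can now be released in order
        (possibly empty if an earlier packet is still in transit).
        """
        heapq.heappush(self.pending, (seq, payload))
        released = []
        # Drain the heap for as long as the smallest held sequence
        # number is exactly the one expected next.
        while self.pending and self.pending[0][0] == self.next_seq:
            released.append(heapq.heappop(self.pending)[1])
            self.next_seq += 1
        return released
```

In this sketch a packet that took a slower path simply holds back its successors until it arrives. A packet lost on some path would stall the flow indefinitely, so a practical design would also need a timeout or loss indication, which is outside the scope of this illustration.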
It should be apparent that the number of possible internal paths through the router for a single flow increases with the number of switching stages and also with the number of input-to-output combinations per switching stage. As routers are designed with numerous switching stages and/or numerous ports per stage, the number of possible paths for all possible flows through a router can be on the order of millions or more. Ignoring this internal complexity and managing congestion only at the final hop port does not scale, because avoiding internal flow convergence would require an N-fold switch fabric speedup to support N ports, which becomes impractical once the port count grows beyond a few ports. Faced with this immense and heretofore unimagined complexity, conventional routing algorithms are ill-equipped to deal with congestion, as is now explained.
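To give a sense of scale, the following sketch counts paths under a deliberately simplified model. It assumes that a packet may be sent to any one of a fixed number of next hop ports at each intermediate stage, and that a flow is identified by a (final hop port, service class) pair as defined above. The function names and the example figures (three intermediate stages of 16 ports, 64 final hop ports, 8 service classes) are hypothetical and are not taken from any particular router.

```python
from math import prod


def paths_per_flow(fanouts):
    """Number of distinct next-hop sequences available to one flow.

    fanouts: list with one entry per intermediate switching stage,
    giving how many next hop ports a packet may choose from there
    (a simplifying assumption of this sketch).
    """
    return prod(fanouts)


def total_paths(fanouts, final_hop_ports, service_classes):
    """Paths summed over all flows, one flow per (port, class) pair."""
    flows = final_hop_ports * service_classes
    return paths_per_flow(fanouts) * flows


# Hypothetical example: 16 choices at each of 3 intermediate stages.
print(paths_per_flow([16, 16, 16]))      # 4096
print(total_paths([16, 16, 16], 64, 8))  # 2097152
```

Even with these modest assumed figures the total exceeds two million paths, and it grows multiplicatively with each added stage or with each increase in ports per stage.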
Under some conditions, an output port of the router may become congested with respect to packets of a certain flow. This is typically the case for lower priority packets in a given flow but may generally affect packets belonging to any service class. In any event, it becomes impossible to send packets of a certain service class out of a given output port of the router. Since a flow may consist of many different paths through the router, congestion affecting a flow at the output of the router will cause congestion along each of these individual paths. The severity of the congestion resulting at an individual next hop port that supports the affected flow will depend on such factors as the stage of switching at which the next hop port is located, the number of packets taking that path, the number of congested paths converging at that next hop port, etc. Because of variations in the severity of the congestion across different next hop ports, some of the next hop ports at an intermediate routing stage will no longer be capable of accepting packets belonging to the affected flow, while other next hop ports may still have the capacity to accept packets belonging to that flow. This also applies to situations where, for a given flow, one intermediate hop port is more congested than others due to degraded or non-functional switch fabric links, etc.
However, conventional routers do not have the capability to apply different scheduling paradigms to different packets belonging to the same flow. Therefore, in a situation such as the one just described, where different next hop ports at a same stage of switching have different capacities to accept packets belonging to an affected flow, a conventional router will either block/drop all packets belonging to the affected flow or will block/drop all packets going through each next hop port that supports the affected flow. The former option results in a reduction in the pipelining efficiency of a multi-stage router with a corresponding reduction in the ability of the router to operate at a high throughput when the congestion is short-lived and/or recurring, while the latter option results in reduced throughput and increased delay for all previously unaffected flows passing through the (now blocked) next hop ports.
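The two conventional responses can be compared on a toy model. In the sketch below, each next hop port at one switching stage is described by the set of flows it supports and by a flag saying whether it can no longer accept packets of the affected flow. The data layout, the policy names, and the `per_port_baseline` policy (which blocks only the port and flow combinations that are actually congested, and is included purely as a point of comparison) are assumptions made for illustration.

```python
def blocked_pairs(ports, affected_flow, policy):
    """Return the set of (port, flow) pairs that stop forwarding.

    ports: dict mapping port name -> {"flows": set of flow names,
           "congested": True if the port cannot accept affected_flow}
    policy: "block_flow"        - block the affected flow everywhere
            "block_ports"       - block every flow on each port that
                                  supports the affected flow
            "per_port_baseline" - block the affected flow only on the
                                  ports that are actually congested
    """
    blocked = set()
    for port, info in ports.items():
        if affected_flow not in info["flows"]:
            continue  # this port does not carry the affected flow
        if policy == "block_flow":
            blocked.add((port, affected_flow))
        elif policy == "block_ports":
            blocked.update((port, flow) for flow in info["flows"])
        elif policy == "per_port_baseline":
            if info["congested"]:
                blocked.add((port, affected_flow))
        else:
            raise ValueError(f"unknown policy: {policy}")
    return blocked


# Hypothetical stage: flow "A" is affected; only port P0 is congested.
stage = {
    "P0": {"flows": {"A", "B"}, "congested": True},
    "P1": {"flows": {"A", "C"}, "congested": False},
    "P2": {"flows": {"B"}, "congested": False},
}
for name in ("block_flow", "block_ports", "per_port_baseline"):
    print(name, sorted(blocked_pairs(stage, "A", name)))
```

In this example the first policy also stops flow A on port P1, which still had capacity, and the second policy additionally stops the unaffected flows B (on P0) and C (on P1), which corresponds to the two drawbacks described above.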