Many network fabrics include various switches, routers, etc. to receive and forward packets to thereby permit originating endpoint nodes to send packets to destination endpoint nodes. A link is a communication pipeline between adjacent nodes in the fabric. The nodes on either end of a link may include endpoint nodes which produce and consume traffic, and intermediate nodes which propagate traffic from one node to another node. Endpoint nodes may comprise, for example, central processing units (CPUs), memory devices, storage devices, peripherals, accelerators, renderers, graphical display devices, etc. Intermediate nodes may comprise, for example, switches, routers, proxies, translators, repeaters, protocol converters, etc. A packet may be transmitted from a transmitting node to a receiving node over a dedicated link between such nodes.
Some networks employ end-to-end (E2E) retry, link level retry (LLR), or both, to ensure that a packet arrives at its intended destination without error. LLR addresses transient, CRC-detectable packet corruption caused by electrical interference on packets crossing an individual link. The LLR mechanism is implemented on each link independently of the other links. LLR assigns unique sequence numbers to the individual packets transmitted across a given link so that, if a packet experiences an error during transmission over that link, the receiving node can notify the transmitting node (via a link level negative acknowledgment (NAK) message), and the transmitting node can retry (i.e., resend) the packet across the link. Each transmitting node stores copies of its outgoing packets in case they need to be resent. If a receiving node detects an error in a packet received from a transmitting node, the receiving node sends the link level NAK message back across the link to the transmitting node. The link level NAK message includes the sequence number of the packet that had the error. The transmitting node responds to the link level NAK message by resending the identified packet.
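The LLR exchange described above can be sketched as a minimal Python model. It assumes a CRC-32 check (via the standard `zlib.crc32`) stands in for whatever CRC the physical link actually uses. The names `Frame`, `LinkTransmitter`, `LinkReceiver`, and `corrupt` are hypothetical, and replay-buffer release, flow control, and ordering of packets that follow a corrupted one are deliberately left out.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Frame:
    seq: int        # per-link sequence number assigned by the transmitter
    payload: bytes
    crc: int        # CRC-32 over the payload (assumed stand-in for the link CRC)


class LinkTransmitter:
    """Transmitting end of one link; keeps copies of sent frames for replay."""

    def __init__(self):
        self.next_seq = 0
        self.replay_buffer = {}  # seq -> Frame, kept in case a resend is needed

    def send(self, payload: bytes) -> Frame:
        frame = Frame(self.next_seq, payload, zlib.crc32(payload))
        self.replay_buffer[frame.seq] = frame
        self.next_seq += 1
        return frame

    def on_nak(self, seq: int) -> Frame:
        # Resend the stored copy identified by the NAK's sequence number.
        return self.replay_buffer[seq]


class LinkReceiver:
    """Receiving end of the same link."""

    def __init__(self):
        self.delivered = []

    def receive(self, frame: Frame):
        """Return None if the frame is good, else the sequence number to NAK."""
        if zlib.crc32(frame.payload) != frame.crc:
            return frame.seq  # CRC mismatch: NAK carrying this sequence number
        self.delivered.append(frame.payload)
        return None


def corrupt(frame: Frame) -> Frame:
    # Flip one bit of the payload to model a transient electrical error.
    bad = bytes([frame.payload[0] ^ 0x01]) + frame.payload[1:]
    return Frame(frame.seq, bad, frame.crc)


tx, rx = LinkTransmitter(), LinkReceiver()
f0 = tx.send(b"hello")
nak = rx.receive(corrupt(f0))  # corrupted in flight: receiver NAKs seq 0
rx.receive(tx.on_nak(nak))     # stored copy is resent and passes the CRC
print(rx.delivered)            # [b'hello']
```

Because each link runs its own transmitter/receiver pair and sequence space, a retry here never involves the originating or destination endpoint nodes.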
E2E retry may be used in combination with LLR or alone. E2E retry permits an originating endpoint node to resend a packet if an acknowledgment of that packet is not timely received from the destination endpoint node. E2E retry can also protect against component or link failures along a route by migrating the retry to an alternate route.
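E2E retry with route migration can be sketched in the same way. This is a minimal simulation rather than a protocol implementation: each route is a hypothetical callable that returns the tick at which the destination's acknowledgment arrives, or `None` if it never arrives (for example, because a link or component on that route has failed). The class `E2ESender`, the timeout value, the attempt limit, and the round-robin choice of alternate route are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical route model: given (packet id, payload), return the tick at
# which the destination's ACK reaches the originator, or None if it never does.
Route = Callable[[int, bytes], Optional[int]]


@dataclass
class E2ESender:
    routes: List[Route]   # ordered list of alternate routes (assumed)
    timeout: int = 10     # assumed ACK timeout, in ticks
    max_attempts: int = 3
    log: List[str] = field(default_factory=list)

    def send(self, pkt_id: int, payload: bytes) -> bool:
        for attempt in range(self.max_attempts):
            # Migrate each retry to the next route, wrapping around if needed.
            route_idx = attempt % len(self.routes)
            ack_time = self.routes[route_idx](pkt_id, payload)
            if ack_time is not None and ack_time <= self.timeout:
                self.log.append(f"pkt {pkt_id}: ACK via route {route_idx}")
                return True
            # No timely ACK: treat as a timeout and retry on another route.
            self.log.append(f"pkt {pkt_id}: timeout on route {route_idx}")
        return False


failed_route = lambda pkt_id, payload: None  # ACK never comes back
healthy_route = lambda pkt_id, payload: 4    # ACK arrives after 4 ticks

sender = E2ESender(routes=[failed_route, healthy_route])
print(sender.send(7, b"data"))  # True
print(sender.log)  # ['pkt 7: timeout on route 0', 'pkt 7: ACK via route 1']
```

Cycling through routes in order is only one possible policy; a real fabric might instead choose the alternate route based on topology or known failure information.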