The present invention relates to data transmissions between agents in a network or computer interconnect fabric.
Transmissions between agents in a typical network or computer interconnect fabric are carried out using “packets,” which generally comprise two or more flits, or micro-packets. Flits are usually rather small, e.g., 128 bits, to ensure a short transmission time and to enable easy handling by the very large scale integrated (VLSI) chips along the path. In addition to the data, each flit contains a small control portion that carries information about the destination of the flit and perhaps other information. Dropped flits represent a failure mode that is not detected by standard cyclic redundancy checking (CRC) or error correction code (ECC) methods. Such dropped flits can be caused by soft errors in VLSI chips that route a flit to the wrong destination or cause it to be ignored by one of the routers. In this context, soft errors refer to the loss of stored information due to high-energy particles, such as alpha particles resulting from radioactive decay, or gamma rays.
Prior art methods of ensuring the reliability of packet transmissions fall into two categories: flit-level error detection and correction, and end-to-end transmission assurance. Cyclic redundancy checks or error correcting codes can check the contents of a flit for errors in transmission and, depending on the code used and the nature of the error, can make corrections. This approach works well for error events that operate at the bit level, such as electrical noise coupling on the wires used to transmit the data, or random bit flipping in the data portion of the flit.
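By way of illustration only, the following minimal sketch shows why a per-flit check catches a bit-level error but not a missing flit. The flit layout (12 payload bytes followed by a 4-byte CRC-32, for 128 bits in total) and the names `make_flit` and `flit_ok` are assumptions made for this example and are not taken from any particular fabric.

```python
import zlib

PAYLOAD_BYTES = 12  # assumed layout: 12 payload bytes + 4 CRC bytes = one 128-bit flit


def make_flit(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload to form a hypothetical 128-bit flit."""
    if len(payload) != PAYLOAD_BYTES:
        raise ValueError("payload must be exactly 12 bytes")
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def flit_ok(flit: bytes) -> bool:
    """Return True if the stored CRC matches the CRC recomputed from the payload."""
    payload, stored = flit[:PAYLOAD_BYTES], flit[PAYLOAD_BYTES:]
    return zlib.crc32(payload).to_bytes(4, "big") == stored


# A packet made of four flits.
packet = [make_flit(bytes([i]) * PAYLOAD_BYTES) for i in range(4)]

# Case 1: a single bit flips in the data portion of one flit.
corrupted = bytearray(packet[1])
corrupted[0] ^= 0x01
bit_flip_detected = not flit_ok(bytes(corrupted))

# Case 2: one flit is dropped along the path; the survivors are intact.
received = packet[:2] + packet[3:]
drop_detected = not all(flit_ok(f) for f in received)

print(bit_flip_detected)  # the bit flip fails the per-flit check
print(drop_detected)      # every surviving flit passes, so the drop goes unnoticed
```

In this sketch each received flit is checked in isolation, so nothing in the check itself reveals that only three of the four flits arrived.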
End-to-end transmission assurance involves an acknowledgement sequence between the ultimate recipient of a packet and the sending agent. With this method, the receiver of a packet immediately sends an acknowledgement packet to the sender when the complete packet is received. The sending agent must hold a complete copy of each packet sent until the acknowledgement packet is received. This approach works well in handling a large class of errors that can corrupt a packet during its transmission. The cost, however, is high, since the sending agent must store all packets that are in flight and must use some sort of timeout mechanism to determine whether the receiver has failed to receive the packet, at which time the sender is required to resend the packet. In addition, the acknowledgement packets consume extra bandwidth in the network.
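The bookkeeping burden that such an acknowledgement scheme places on the sending agent can be sketched as follows. This is a simplified model under stated assumptions, not a description of any specific prior art system: the class name `Sender`, its methods, the packet identifiers, and the use of caller-supplied timestamps are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Sender:
    """Hypothetical sending agent that keeps every unacknowledged packet."""

    timeout: float
    # packet_id -> (full packet copy, time it was last sent)
    pending: dict = field(default_factory=dict)

    def send(self, packet_id: int, packet: bytes, now: float) -> None:
        # A complete copy must be retained until the acknowledgement arrives.
        self.pending[packet_id] = (packet, now)

    def acknowledge(self, packet_id: int) -> None:
        # Only an acknowledgement allows the stored copy to be released.
        self.pending.pop(packet_id, None)

    def resend_expired(self, now: float) -> list:
        """Return the ids of packets whose timeout elapsed and restart their timers."""
        expired = [pid for pid, (_, sent) in self.pending.items()
                   if now - sent >= self.timeout]
        for pid in expired:
            packet, _ = self.pending[pid]
            self.pending[pid] = (packet, now)  # models the retransmission
        return expired


sender = Sender(timeout=5.0)
sender.send(1, b"first packet", now=0.0)
sender.send(2, b"second packet", now=1.0)
sender.acknowledge(1)                      # packet 1 arrived; its copy is freed
resent = sender.resend_expired(now=6.0)    # packet 2 was never acknowledged
print(resent)                              # [2]
```

Even in this small model, the storage held in `pending` grows with the number of packets in flight, and every delivered packet costs one additional acknowledgement message.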
A need exists to easily detect dropped flits.