Massively parallel computers use many microprocessors operating in parallel. The microprocessors are connected by a mesh of routing chips. Routing chips route data in the form of flow control units, called flits, which typically include 16 data bits and two parity bits. Each routing chip transmits flits to and receives flits from adjacent routing chips, and likewise exchanges flits with the microprocessor with which it is associated.
The mesh operates without a master clock, using a self-timing, two-cycle signaling protocol. A routing chip signals transmission of a flit to one of its neighbors by toggling a request signal, REQ. After the neighboring routing chip loads the flit, it acknowledges receipt of the flit by toggling an acknowledge signal, ACK.
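The toggling behavior described above can be sketched in software. The following is a minimal model, not the circuit itself: the class name `TwoCycleLink` and its methods are hypothetical, and the model assumes at most one outstanding flit at a time, so that a flit is pending exactly when REQ and ACK differ.

```python
class TwoCycleLink:
    """Hypothetical model of one link using two-cycle (transition) signaling."""

    def __init__(self):
        self.req = 0      # request line driven by the sender
        self.ack = 0      # acknowledge line driven by the receiver
        self.data = None  # flit currently on the data lines

    def pending(self):
        # With one flit in flight, a flit is outstanding when REQ != ACK.
        return self.req != self.ack

    def send(self, flit):
        if self.pending():
            raise RuntimeError("previous flit not yet acknowledged")
        self.data = flit
        self.req ^= 1  # any transition on REQ, rising or falling, signals a flit

    def receive(self):
        if not self.pending():
            raise RuntimeError("no flit pending")
        flit = self.data
        self.ack ^= 1  # any transition on ACK signals receipt
        return flit


link = TwoCycleLink()
link.send(0xABCD)
first = link.receive()
state_after_first = (link.req, link.ack)
link.send(0x1234)
second = link.receive()
state_after_second = (link.req, link.ack)
```

Because each transition carries meaning, the lines do not return to a rest level between flits: both lines sit at 1 after the first flit and at 0 after the second.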
Interlock signaling refers to a two-cycle signaling protocol that requires acknowledgment of a transmit request prior to transmission of another flit. Interlock signaling works well when distances between the source and destination of a flit are small. As the distance between source and destination increases with mesh size, the rate of flit transmission decreases, because each flit must wait for the request and its acknowledgment to complete a round trip before the next flit can be sent.
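The dependence of the interlocked transmission rate on distance can be illustrated with a simple timing model. This is a sketch under stated assumptions: the function name and the parameters `wire_delay_ns` and `logic_delay_ns` are hypothetical, and the model assumes that the request and the acknowledgment each take one wire delay to propagate.

```python
def interlocked_flit_rate(wire_delay_ns, logic_delay_ns=0.0):
    """Flits per nanosecond when every flit must be acknowledged before the next.

    Assumed model: REQ takes wire_delay_ns to reach the receiver, ACK takes
    the same time to return, and logic_delay_ns covers loading the flit.
    """
    round_trip_ns = 2 * wire_delay_ns + logic_delay_ns
    return 1.0 / round_trip_ns


short_link = interlocked_flit_rate(5)   # 10 ns round trip
long_link = interlocked_flit_rate(10)   # 20 ns round trip
```

Under this model, doubling the one-way delay halves the flit rate when the logic delay is negligible.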
Data streaming can increase the flit transmission rate and prevent interlock by permitting a number of consecutive flits to be transmitted without acknowledgment, provided that the streaming depth is matched to the distance between the source and destination. Increased distance between source and destination routing chips requires increased streaming depth to avoid interlock. If the streaming depth of the routing chips is not matched to that distance, data streaming defers but does not prevent interlock and the resulting decrease in flit transmission rate. Routing chips using data streaming begin operating in interlock once the maximum number of flits that can be consecutively transmitted without acknowledgment has been transmitted.
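The effect of matching streaming depth to distance can be illustrated with a short timing sketch. The function and its parameters (`depth`, `issue_interval`, `one_way_delay`) are hypothetical, and the model assumes the receiver acknowledges each flit the moment it arrives, so that an acknowledgment returns two one-way delays after the flit is sent.

```python
def streamed_send_times(n_flits, depth, issue_interval, one_way_delay):
    """Return the send time of each flit when at most `depth` flits may be
    unacknowledged at once (assumed model, arbitrary time units)."""
    send_times = []
    for i in range(n_flits):
        # The sender can issue at most one flit per issue_interval.
        earliest = send_times[-1] + issue_interval if send_times else 0
        if i >= depth:
            # Flit i must wait for the ACK of flit i - depth to return.
            ack_time = send_times[i - depth] + 2 * one_way_delay
            earliest = max(earliest, ack_time)
        send_times.append(earliest)
    return send_times


matched = streamed_send_times(8, depth=4, issue_interval=1, one_way_delay=2)
unmatched = streamed_send_times(9, depth=4, issue_interval=1, one_way_delay=4)
interlocked = streamed_send_times(3, depth=1, issue_interval=1, one_way_delay=2)
```

With a depth of four and a round trip of four time units the sender never stalls. When the round trip grows to eight units, the same depth only defers the stall: four flits stream out, and the sender then waits for an acknowledgment before each further group of four.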
The Paragon™ Supercomputer manufactured by Intel Corporation of Santa Clara, Calif., uses a prior mesh routing chip that allows up to four flits to be transmitted prior to requiring receipt of an acknowledgment. FIG. 1 illustrates a portion of that prior mesh routing chip (MRC). The MRC includes routing control circuitry, a receiver and a transmitter. The transmitter uses a self-timed, fall-through first-in, first-out (FIFO) register to keep track of whether the number of unacknowledged transmit requests exceeds a maximum number. Similarly, the receiver includes another self-timed, fall-through FIFO, which stores flits received by the routing chip. Acknowledgment of a flit occurs after the flit reaches the last stage of the receiver FIFO.
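The transmitter's bookkeeping of unacknowledged requests can be sketched as follows. This is a deliberate simplification: the prior MRC tracks outstanding requests with a fall-through FIFO, whereas this sketch uses a plain counter, and the class name `StreamingTransmitter` and its methods are hypothetical. Only the limit of four comes from the description above.

```python
class StreamingTransmitter:
    """Hypothetical counter-based stand-in for the transmitter's tracking FIFO."""

    MAX_UNACKED = 4  # up to four flits may be sent before an ACK is required

    def __init__(self):
        self.unacked = 0  # transmit requests not yet acknowledged

    def can_send(self):
        return self.unacked < self.MAX_UNACKED

    def send(self):
        """Try to send one flit; return False if the limit has been reached."""
        if not self.can_send():
            return False
        self.unacked += 1
        return True

    def on_ack(self):
        """Record one acknowledgment from the neighboring routing chip."""
        if self.unacked == 0:
            raise RuntimeError("ACK received with no outstanding flit")
        self.unacked -= 1


tx = StreamingTransmitter()
attempts = [tx.send() for _ in range(5)]  # the fifth attempt is refused
tx.on_ack()                               # one acknowledgment frees one slot
resumed = tx.send()
```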
Both FIFOs include four serially coupled, self-timing tracking units, which use a four-cycle signaling protocol. Additionally, the receiver FIFO includes a latch coupled to each tracking unit to store flits as they are received. Each tracking unit indicates to its downstream neighbor that it contains valid data via an active full signal, FULL. The downstream tracking unit indicates that it has begun loading the data by bringing a load signal active and indicates completion of the loading by bringing the load signal inactive. In response, the transmitting upstream tracking unit brings its full signal inactive. The relationship between the full signal and the load signal is illustrated in FIG. 2.
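The fall-through behavior of the receiver FIFO can be sketched in software. The class `FallThroughFifo` and its methods are hypothetical, and the model collapses each four-cycle handshake (FULL active, load active, load inactive, FULL inactive) into a single step that moves a flit one stage downstream. It does not model signal timing.

```python
class FallThroughFifo:
    """Hypothetical model of serially coupled tracking units with data latches."""

    def __init__(self, stages=4):
        self.full = [False] * stages  # FULL signal of each tracking unit
        self.data = [None] * stages   # latch coupled to each tracking unit

    def push(self, flit):
        """Load a flit into the first stage; return False if it is occupied."""
        if self.full[0]:
            return False
        self.data[0], self.full[0] = flit, True
        self._settle()
        return True

    def pop(self):
        """Remove the flit in the last stage, or return None if it is empty."""
        last = len(self.full) - 1
        if not self.full[last]:
            return None
        flit = self.data[last]
        self.full[last] = False
        self._settle()
        return flit

    def _settle(self):
        # Repeat until no flit can fall further toward the last stage.
        moved = True
        while moved:
            moved = False
            for i in range(len(self.full) - 1, 0, -1):
                if self.full[i - 1] and not self.full[i]:
                    # Load active: downstream stage i latches the upstream data.
                    self.data[i] = self.data[i - 1]
                    self.full[i] = True
                    # Load inactive: upstream stage i - 1 drops its FULL signal.
                    self.full[i - 1] = False
                    moved = True


fifo = FallThroughFifo()
accepted = [fifo.push(n) for n in (1, 2, 3, 4, 5)]  # fifth push finds stage 0 full
drained = [fifo.pop() for _ in range(5)]            # fifth pop finds the FIFO empty
```

Each flit falls through to the furthest empty stage, so flits leave in the order they arrived.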
Data streaming increases the latency of the mesh routing chip by increasing the time required to process received flits, acknowledge received flits and process acknowledgments of transmitted flits.