The present disclosure relates generally to the field of computer pipeline control and, more particularly, to circuit configurations for asynchronously transferring data through pipelines.
The pipeline processor is a common paradigm for very high speed computing machinery. Pipeline processors, or pipelines, operate on data as it passes along them. The most basic implementation of a pipeline is as a first-in-first-out (FIFO) memory, in which unaltered data are copied from one stage to the next in a pipelined fashion.
In a pipelined FIFO memory, when a full, or occupied, stage precedes an empty, or unoccupied, stage, logic must perform three key functions: (1) transfer the data from the full stage to the empty stage; (2) set the control state of the empty stage to full; and (3) set the control state of the full stage to empty. Ideally, data elements should move from the full stage to the empty stage as soon as possible, and pipeline speed is maximized when all three actions are performed together. Data flow through a pipeline such as a FIFO memory may be either clocked (synchronous) or non-clocked (asynchronous). In a clocked FIFO memory, data elements typically march forward through successive stages in lock step, with each stage taking a fixed number of clock cycles (typically one or two). The clocked FIFO receives elements on a fixed schedule defined by the clock and delivers them at the same fixed rate.
Asynchronous FIFO memories enjoy significant design advantages over clocked FIFO memories because each stage operates at its own pace, requiring only local communication between adjacent stages. By avoiding overhead for global clock distribution, asynchronous FIFO memories may operate at higher speeds and consume less power than a clocked FIFO memory.
FIG. 1 is a block diagram of the general structure of a conventional asynchronous FIFO memory 50. FIFO memory 50 includes an input section 100 and an output section 102. Coupled between input section 100 and output section 102 are data registers 104, 105, 106, and 107 that store and move the data elements through the FIFO memory. Each control circuit 110 through 113 is associated with a corresponding data register, 104 through 107, respectively.
When input section 100 has a data element ready to be placed into data register 104, section 100 checks the status of control circuit 110. If control circuit 110 indicates register 104 is empty, input section 100 transfers the data element to register 104 and control circuit 110 changes state to indicate that register 104 is full. If control circuit 110 indicates full and control circuit 111 indicates empty, the data element is then transferred to the next register 105. Simultaneously, or nearly simultaneously, control circuits 110 and 111 change state so that control circuit 110 now indicates empty and control circuit 111 now indicates full. In this manner, the data element will propagate through FIFO memory 50 until it finds a full stage directly ahead of it. When this happens, the data element stops and waits for the succeeding stage to become empty.
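The propagation behavior described above can be illustrated with a small software model. This is a minimal sketch, assuming a unit time step in which every full stage followed by an empty stage passes its element forward simultaneously; the function names `fifo_step` and `run`, and the list representation (index 0 nearest input section 100, last index nearest output section 102, `None` for an empty stage), are hypothetical and not part of FIG. 1.

```python
from typing import List, Optional

Stage = Optional[str]  # None means the stage is empty; any other value means full


def fifo_step(stages: List[Stage]) -> List[Stage]:
    """Advance the model by one time unit (a hypothetical unit-delay assumption).

    Every full stage whose successor is empty performs the three actions
    together: copy its element forward, mark the successor full, and mark
    itself empty. Decisions use the state at the start of the step, so an
    element never moves into a stage that is being vacated in the same step.
    """
    nxt = list(stages)
    for i in range(len(stages) - 1):
        if stages[i] is not None and stages[i + 1] is None:
            nxt[i + 1] = stages[i]  # latch into the empty stage (now full)
            nxt[i] = None           # release the source stage (now empty)
    return nxt


def run(stages: List[Stage], steps: int) -> List[Stage]:
    """Apply fifo_step repeatedly."""
    for _ in range(steps):
        stages = fifo_step(stages)
    return stages


# Four stages standing in for registers 104-107; the output side never consumes.
state = run(["x", None, None, None], 3)
print(state)          # [None, None, None, 'x']
state[0] = "y"        # the input section places a second element
print(run(state, 5))  # [None, None, 'y', 'x']: 'y' stops behind the full stage
```

In this model the second element halts as soon as the stage directly ahead of it is full, which mirrors the stopping rule in the paragraph above.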
Because the data elements in FIFO memory 50 autonomously move through empty stages until stopped by a full stage, the speed of data movement depends on the time required by the individual registers and the control elements to sense the full/empty conditions and then move the data. Moving a data element involves: (1) latching the data element in the next stage in the FIFO memory, and (2) unlatching the data element in the previous stage. Therefore, increasing the speed of data movement requires binding the latching and unlatching operations as tightly as possible. Speed suffers if unlatching occurs too long after latching; system robustness suffers if unlatching occurs before latching completes.
FIG. 2 is a circuit diagram of a conventional circuit 200 implementing part of the FIFO memory 50 in FIG. 1. Registers 202 and 204 each correspond to one of registers 104 through 107, and control section 206 loosely corresponds to the control elements 110 through 113. Control section 206 includes serially-connected Muller C elements 208, 210, and 212, and exclusive-OR (XOR) gates 214 and 216.
A Muller C element is a well-known sequential circuit element that reproduces the value of its input nodes at its output node when the input nodes become logically identical. For example, if a C-element had inputs (low, low), its output would be low. If the input nodes changed to (high, low), its output would remain low. The output would change to high only when the input nodes changed to (high, high). C-elements 208, 210, and 212 each have an inverted upper input (represented by a bubble). Thus, C-elements 208, 210, and 212 will replicate their lower input on their output node when the two inputs differ.
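The C-element behavior described above, including the inverted-upper-input variant used in FIG. 2, can be expressed as two small functions. This is a behavioral sketch only (no delays are modeled); the names `c_element` and `c_element_inv_upper` are hypothetical.

```python
def c_element(a: bool, b: bool, prev: bool) -> bool:
    """Two-input Muller C element: when the inputs agree, the output takes
    their common value; when they differ, the output holds its previous value."""
    return a if a == b else prev


def c_element_inv_upper(lower: bool, upper: bool, prev: bool) -> bool:
    """C element whose upper input is inverted (the bubble in FIG. 2): the
    output copies the lower input whenever the two raw inputs differ."""
    return c_element(lower, not upper, prev)


# The input sequence from the example above: (low, low), (high, low), (high, high).
out = False
for a, b in [(False, False), (True, False), (True, True)]:
    out = c_element(a, b, out)
    print(a, b, "->", out)  # the output stays low until both inputs are high
```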
C-elements 208, 210, and 212 are connected so that each C-element takes as one input the output of the previous (lower) C-element in the chain, and as the other input, the inverted output of the next (upper) C-element in the chain. Thus, a given C-element changes its output only when its state is different from the state of the previous C-element and the same as that of the next C-element in the chain.
XOR gate 216 receives the outputs of Muller C elements 208 and 210 and drives the gate input of register 204 with its output. Similarly, XOR gate 214 receives the outputs of Muller C elements 210 and 212 and drives the gate input of register 202 with its output. Although FIG. 2 shows only two registers, any number of FIFO stages could be implemented by correspondingly extending the control section with additional C-elements and XOR gates, and adding the required number of registers to the data circuitry.
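The connection rule in the two preceding paragraphs generalizes to any number of stages. The sketch below, which assumes hypothetical labels (`c0` for the input-side C-element, `reg0` for the first register, `F` and `G` for the external signals from the previous and next stages), lists which signals feed each C-element and each XOR gate. With two registers, `c0`, `c1`, and `c2` correspond to C-elements 208, 210, and 212, `reg0` to register 204 (XOR gate 216), and `reg1` to register 202 (XOR gate 214).

```python
from typing import Dict, Tuple


def control_netlist(
    n_registers: int,
) -> Tuple[Dict[str, Dict[str, str]], Dict[str, Tuple[str, str]]]:
    """Describe the connections of a FIG. 2 style control section.

    One more C-element than registers is needed. Every C-element takes the
    previous C-element on its lower input and the next C-element on its
    inverted upper input; the chain ends are tied to 'F' and 'G'.
    All names are hypothetical labels, not reference numerals.
    """
    names = [f"c{k}" for k in range(n_registers + 1)]
    c_elements: Dict[str, Dict[str, str]] = {}
    for k, name in enumerate(names):
        lower = "F" if k == 0 else names[k - 1]               # previous C-element
        upper = "G" if k == len(names) - 1 else names[k + 1]  # next C-element (inverted)
        c_elements[name] = {"lower": lower, "upper_inverted": upper}
    # One XOR gate per register, fed by the C-elements on either side of it.
    xor_gates = {f"reg{k}": (names[k], names[k + 1]) for k in range(n_registers)}
    return c_elements, xor_gates


cells, xors = control_netlist(2)
for name, pins in cells.items():
    print(name, pins)
print(xors)
```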
Table I illustrates exemplary timing relationships of the logic states of the circuit shown in FIG. 2 at the circuit points labeled A, B, C, D, E, F, and G.
TABLE I
______________________________________________________
Time interval   A      B      C      D      E      F      G
______________________________________________________
0               low    low    low    low    low    high   low
1               high   low    low    low    low    high   low
2               high   high   low    high   low    high   low
3               high   high   high   low    high   high   low
4               high   high   high   low    low    high   high
______________________________________________________
At time interval 0, point F is high, indicating that the data register below is full and the value that it contains is present at point H. Points E and D are low, indicating that registers 202 and 204 are empty. In interval one, C-element 208 changes state so that the value at point A matches that at point F. Next, the output of C-element 208 propagates to C-element 210, and the output D of XOR gate 216 becomes high, indicating that register 204 should latch its input data (interval two). Also in interval two, output B of C-element 210 becomes high. Since output B serves as an input to C-elements 208 and 212 and to XOR gates 214 and 216, this change in output B has multiple consequences that appear in interval three.
At time interval three, C-element 212 outputs a high voltage at point C in response to the high value at point B, since its inverting input from point G is low. Also in response to the high voltage at point B, output D of XOR gate 216 changes to low, indicating that register 204 is now empty. In further response to the high voltage at point B, the output E of XOR gate 214 changes to high, indicating that its register 202 is now full. Finally, the high value of point B is applied to the inverting input of C-element 208, so that it can respond to a subsequent change in input F from high to low that would announce the availability of a second data value at the data input to data register 204.
In interval four, output E of XOR gate 214 returns to its original low value as a consequence of the high value at point C. The low value of point E indicates that its data register 202 is empty, having passed the first data value on to the next stage of the pipeline, which is not shown. The initially low value at point G indicated that the next stage of the pipeline was ready to accept the value of register 202 when point C changed to the high value. Thus, in interval four, points D and E have returned to their initial low values, indicating empty. Any time after point A changed from low to high in interval one, the circuit could receive a second data value, which would be signaled by a change in point F from high to low. The second and subsequent data values can propagate correctly, in an orderly fashion, through an unlimited number of pipeline stages of the kind described above.
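The interval-by-interval walkthrough above can be checked with a unit-delay model of the FIG. 2 control section. This sketch makes two assumptions that are not stated in the text: every gate takes exactly one interval to respond, and point G is produced by a hypothetical next-stage C-element whose own successor is empty (so its inverted upper input is held low). Under those assumptions the model reproduces every row of Table I, with 1 for high and 0 for low. The function names are hypothetical.

```python
from typing import List, Tuple


def c_inv_upper(lower: int, upper: int, prev: int) -> int:
    # C element with inverted upper input: copy 'lower' when the raw inputs differ, else hold.
    return lower if lower != upper else prev


def simulate_fig2(intervals: int = 4) -> List[Tuple[int, ...]]:
    """Return rows of (t, A, B, C, D, E, F, G) under a one-interval-per-gate assumption."""
    F = 1  # the data register below is full and its value is at point H
    s = dict(A=0, B=0, C=0, D=0, E=0, G=0)
    rows = [(0, s["A"], s["B"], s["C"], s["D"], s["E"], F, s["G"])]
    for t in range(1, intervals + 1):
        # Every new value is computed from the previous interval's values.
        s = dict(
            A=c_inv_upper(F, s["B"], s["A"]),       # C-element 208
            B=c_inv_upper(s["A"], s["C"], s["B"]),  # C-element 210
            C=c_inv_upper(s["B"], s["G"], s["C"]),  # C-element 212
            G=c_inv_upper(s["C"], 0, s["G"]),       # hypothetical next-stage C-element
            D=s["A"] ^ s["B"],                      # XOR gate 216 -> register 204
            E=s["B"] ^ s["C"],                      # XOR gate 214 -> register 202
        )
        rows.append((t, s["A"], s["B"], s["C"], s["D"], s["E"], F, s["G"]))
    return rows


for row in simulate_fig2():
    print(row)
```

The one-interval lag of each XOR output behind its C-element inputs is what produces the pulses at D (interval two) and E (interval three) in the table.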
Correct and efficient movement of the data elements from register 204 to register 202 in FIG. 2 depends on accurate timing of the changes in the register gate signals at points D and E, so that D drives register 204 empty and E drives register 202 full as soon as possible without corrupting the data element. Unfortunately, the relative timing of the state changes at points E and D when point B changes state depends on the difference between the delays of XOR gates 214 and 216. XOR gates are relatively complicated circuits that can introduce significant timing uncertainties into the FIFO memory shown in FIG. 2, and those uncertainties reduce the maximum safe operating speed of the circuit.
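The cost of this uncertainty can be made concrete with a worst-case timing budget. The sketch below assumes that each XOR path has a known fastest and slowest delay after point B changes, that register 202 needs a fixed latch time after its gate signal changes, and that, following the robustness rule stated earlier, register 204 must not be released (D) before register 202 has finished latching (E plus the latch time). All numbers, and the names `DelayRange` and `skew_budget`, are hypothetical illustrations, not values from the disclosure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DelayRange:
    """Fastest and slowest propagation delay of a gate path, in picoseconds."""
    lo: int
    hi: int


def skew_budget(latch_path: DelayRange, unlatch_path: DelayRange,
                t_latch: int) -> Tuple[int, int]:
    """Return (padding, worst_idle), both in picoseconds.

    latch_path:   B change -> E edge (XOR gate 214, register 202 full).
    unlatch_path: B change -> D edge (XOR gate 216, register 204 empty).
    t_latch:      time register 202 needs after its gate edge to capture the data.

    padding:    extra delay the unlatch path needs so that, in the worst case,
                register 204 is released only after register 202 has latched.
    worst_idle: latest possible release minus earliest possible latch completion,
                i.e. the time lost to delay uncertainty once padding is added.
    """
    padding = max(0, latch_path.hi + t_latch - unlatch_path.lo)
    worst_idle = (unlatch_path.hi + padding) - (latch_path.lo + t_latch)
    return padding, worst_idle


# Hypothetical numbers: both XOR paths vary between 80 and 120 ps; latching takes 30 ps.
print(skew_budget(DelayRange(80, 120), DelayRange(80, 120), 30))    # (70, 80)
# Perfectly matched 100 ps paths need padding only for the latch time itself.
print(skew_budget(DelayRange(100, 100), DelayRange(100, 100), 30))  # (30, 0)
```

Whenever padding is required, `worst_idle` equals the sum of the two paths' delay spreads (40 + 40 = 80 ps in the first case), which is one way to see why uncertainty in XOR gates 214 and 216 lowers the maximum safe operating speed.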
There is, therefore, a need to reduce timing uncertainties in an asynchronous pipeline, such as a FIFO memory, and to couple control signals more tightly to increase operating speeds of such pipelines.
The need is even more critical when the timing of data movement is subject to additional constraints, such as in a counterflow pipeline. FIG. 3 is a block diagram illustrating a portion of a counterflow pipeline. Counterflow pipeline 300 includes a first pipeline 320 (called the "instruction pipeline") and a second pipeline 340 (called the "result pipeline"), which move data elements in opposite directions. In FIG. 3, instruction pipeline 320 moves data elements to the right and result pipeline 340 moves data elements to the left. The stages of pipeline 320 include serially-connected control sections 328 through 330 and their corresponding data element storage and operation sections 322 through 324. Each section 322 through 324 includes memory for storing an instruction and associated information fields needed to execute the instruction. Data element storage and operation sections 322 through 324 may include, for example, an ALU capable of executing an integer instruction.
The stages of pipeline 340 include serially connected control sections 348 through 350 and corresponding result storage elements 342 through 344. The result storage elements store and transfer data values that result from or are needed for an instruction in one of storage and operation sections 322 through 324.
In operation, an instruction in each stage of pipeline 320 can interact with a result in the corresponding stage of pipeline 340. At each stage, information in instruction sections 322 through 324 can be compared with information in corresponding result sections 342 through 344, respectively. If, for example, it is determined that section 322 requires as an operand a result that is in section 342, the result is copied into section 322. Information can similarly be copied or transferred in the reverse direction, from the instruction pipeline to the result pipeline, where it will move and be available to interact with following instructions in the instruction pipeline. For correct operation of counterflow pipeline 300 in this and other applications, each result flowing through pipeline 340 must interact with each instruction flowing in the opposite direction through pipeline 320. Even though the stage in which this interaction will take place is not known in advance, the interaction must take place in some stage, or an error will occur. Thus, the control circuits must not permit an instruction in pipeline 320 and a result in pipeline 340 to pass each other across any stage boundary.
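The no-crossing requirement can be illustrated with a small discrete-time model of the two pipelines. This is a minimal sketch under stated assumptions, not the control circuit of FIG. 3: every item moves at most one stage per step, all moves are decided from the state at the start of the step, and when an instruction at stage i and a result at stage i + 1 would swap places, a hypothetical arbitration policy lets the instruction move and holds the result. The function `simulate_counterflow` and the item names are hypothetical. The model records every (instruction, result) pair that ever shares a stage; with the rule enforced every pair meets, and without it some pairs slip past each other.

```python
from itertools import product
from typing import Dict, List, Optional, Set, Tuple


def simulate_counterflow(n_stages: int,
                         instructions: Dict[str, int],
                         results: Dict[str, int],
                         forbid_crossing: bool = True,
                         max_steps: int = 1000) -> Set[Tuple[str, str]]:
    """Return the set of (instruction, result) pairs that ever share a stage.

    Instructions move toward higher stage numbers, results toward lower ones.
    """
    instr: List[Optional[str]] = [None] * n_stages
    res: List[Optional[str]] = [None] * n_stages
    for name, stage in instructions.items():
        instr[stage] = name
    for name, stage in results.items():
        res[stage] = name

    met: Set[Tuple[str, str]] = set()

    def record(instr_state: List[Optional[str]], res_state: List[Optional[str]]) -> None:
        for i in range(n_stages):
            if instr_state[i] is not None and res_state[i] is not None:
                met.add((instr_state[i], res_state[i]))

    record(instr, res)
    for _ in range(max_steps):
        right = {i for i in range(n_stages - 1)
                 if instr[i] is not None and instr[i + 1] is None}
        left = {i for i in range(1, n_stages)
                if res[i] is not None and res[i - 1] is None}
        if forbid_crossing:
            # An instruction leaving stage i and a result leaving stage i + 1 in the
            # same step would swap without sharing a stage; hold the result instead.
            left -= {i + 1 for i in right}
        if not right and not left:
            break
        new_instr, new_res = instr[:], res[:]
        for i in right:
            new_instr[i + 1], new_instr[i] = instr[i], None
        for i in left:
            new_res[i - 1], new_res[i] = res[i], None
        instr, res = new_instr, new_res
        record(instr, res)
    return met


instrs = {"I0": 0, "I1": 1, "I2": 2}
ress = {"R0": 3, "R1": 4, "R2": 5}
all_pairs = set(product(instrs, ress))
print(simulate_counterflow(6, instrs, ress) == all_pairs)  # True
print(all_pairs - simulate_counterflow(6, instrs, ress, forbid_crossing=False))
```

Because an instruction only moves one way and a result only the other, once they swap across a boundary they can never meet again, which is why forbidding that single swap is enough in this model.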
Thus, there is a further need for asynchronous control circuitry having tightly coupled control signals that accurately control sophisticated pipelines such as a counterflow pipeline.