1. Field of the Invention
The present invention is related to computer systems. More specifically, the present invention is related to an apparatus and a method for phase-buffering signals on a bit-by-bit basis using control queues.
2. Related Art
As computer systems become increasingly fast, signal propagation delays are becoming more critical. If propagation delays for data signals are not properly matched with corresponding clock signals, signals sent from one sub-unit of a computer system to another sub-unit may not have the correct timing in relation to the clock signal at the receiving end. This is true even though both the sending clock signal and the receiving clock signal are derived from the same clock source. The clock signals going to the source and destination travel through different paths and may incur different delays, which can cause skew in the data signals relative to the receiving clock signal. The difference in arrival time of the clock signals at the transmitting end and the receiving end is referred to here as the phase delay of the clock. This phase delay can be positive or negative. Note that temperature and voltage differences may affect the timing of the data and clock signals. Furthermore, coupling and interference from signals on adjacent wires may also affect the timing of the data and clock signals.
FIG. 1 illustrates a phase buffer 106 interposed between a transmitter module 104 and a receiver module 108. Transmitter module 104 and receiver module 108 receive a clock signal from clock source 102. The clock signal received at transmitter module 104 is delayed by delay A 114, while the clock signal received at receiver module 108 is delayed by delay B 116. Delay A 114 and delay B 116 are typically different from each other and are due to various causes, such as different components in their propagation paths, different temperatures of these components and different supply voltages of these components. Additionally, transmitter module 104 and receiver module 108 may be located in different integrated circuits. Note that delay A 114 and delay B 116 may include delays within transmitter module 104 and receiver module 108, respectively.
During operation, transmitter module 104 sends data 110 for consumption by receiver module 108. However, due to the difference between delay A 114 and delay B 116, transmitter module 104 cannot send data 110 directly to receiver module 108. Instead, transmitter module 104 sends data 110 to phase buffer 106 using TX clock 118, which is derived from the clock signal delayed by delay A 114. Phase buffer 106 stores data 110 until such time as it is delivered to receiver module 108 as data 112. Data 112 is delivered to receiver module 108 using RX clock 120, which is derived from the clock signal delayed by delay B 116.
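By way of illustration only, the phase delay between the two clock branches of FIG. 1 can be sketched as a signed difference. The function name, the picosecond units, and the sign convention (receiver branch minus transmitter branch) are assumptions made for this sketch; the description above states only that the phase delay can be positive or negative.

```python
def phase_delay_ps(delay_a_ps: float, delay_b_ps: float) -> float:
    """Signed clock phase delay between the receiver and the transmitter.

    delay_a_ps: delay on the clock branch feeding the transmitter (delay A).
    delay_b_ps: delay on the clock branch feeding the receiver (delay B).
    Sign convention (an assumption of this sketch): a positive result means
    the receiver's clock edge arrives later than the transmitter's.
    """
    return delay_b_ps - delay_a_ps


if __name__ == "__main__":
    # Hypothetical branch delays, chosen only to show both signs.
    print(phase_delay_ps(120.0, 200.0))  # receiver clock arrives later
    print(phase_delay_ps(200.0, 120.0))  # receiver clock arrives earlier
```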
FIG. 2A illustrates a typical 3-stage flow-through FIFO. The upper portion of FIG. 2A depicts the control path, while the lower portion of FIG. 2A depicts the data path. Note that the data path may be a bus carrying several bits of data between data in 210 and data out 212.
In an asynchronous FIFO, control signals are typically bundled with the data bits, as shown in FIG. 2A. The forward-going “ready” control signal, often called “request,” announces that the data are valid. Thus, there is a bundling delay constraint: the delay of the forward-going control signal must be the same as or greater than the delay of the forward-going data. The control must not announce that “the data are valid” when the data are not valid yet. Care must therefore be taken in the circuit design and chip layout to ensure that this bundling constraint is met.
The control path includes control stages 201-203 and the data path includes pass-gates 204-206 and sticky buffers 207-209. Sticky buffers 207-209 are described below in conjunction with FIG. 2B. Prior to data being stored within the FIFO, ready signals 222, 224, 226, and 228 and acknowledge signals 214, 216, 218, and 220 are all inactive. The combination of pass-gate 204 and sticky buffer 207 forms a latch. The same is true for the combination of pass-gate 205 and sticky buffer 208, and the combination of pass-gate 206 and sticky buffer 209.
When data becomes available at data in 210, the data source brings ready signal 222 to an active state. In response, control stage 201 causes pass-gate 204 to momentarily open and provide data in 210 to sticky buffer 207. Sticky buffer 207 holds the received state of data in 210 until control stage 201 is cycled again. Control stage 201 also activates ready signal 224 which passes to control stage 202 and acknowledge signal 220 to the data source. Note that acknowledge signal 220 and the control signal to pass-gate 204 both are designed to have an inherent delay, which allows data in 210 to be captured before the corresponding acknowledge signal 220 is set to active.
At a future time, acknowledge signal 218 from control stage 202 is activated, which causes control stage 201 to deactivate acknowledge signal 220 and to deactivate ready signal 224. This returns the first stage of the FIFO to its initial state. The second and third stages of the FIFO operate in a similar manner. For proper operation, the delay in a given control stage must be longer than the delay in the associated data stage. This is referred to as the control-to-data bundling constraint. Note that there can be more or fewer stages in the FIFO than are shown in FIG. 2A. Such a flow-through FIFO functions as a phase buffer between clocked modules because each stage of the FIFO operates in less than one clock cycle. The difference between the delay of one stage of the FIFO and the clock period is the amount of phase buffering each stage of the FIFO can provide.
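The flow-through behavior described above can be sketched, for illustration only, as a purely behavioral model in which each stage holds at most one item and an item advances whenever the next stage is empty. The class and method names are hypothetical, and the model deliberately ignores the ready/acknowledge signal timing and the bundling constraint; it shows only the order in which data moves through the stages.

```python
from typing import List, Optional


class FlowThroughFifo:
    """Behavioral model: each stage holds at most one item (None = empty)."""

    def __init__(self, num_stages: int = 3) -> None:
        self.stages: List[Optional[int]] = [None] * num_stages

    def ripple(self) -> None:
        """Move items toward the output until no stage can advance."""
        moved = True
        while moved:
            moved = False
            # Scan from the output end back toward the input end.
            for i in range(len(self.stages) - 2, -1, -1):
                if self.stages[i] is not None and self.stages[i + 1] is None:
                    self.stages[i + 1] = self.stages[i]
                    self.stages[i] = None
                    moved = True

    def put(self, item: int) -> bool:
        """Offer an item at data in; False means the first stage is full."""
        if self.stages[0] is not None:
            return False
        self.stages[0] = item
        self.ripple()
        return True

    def get(self) -> Optional[int]:
        """Take the item at data out, or None if the last stage is empty."""
        item = self.stages[-1]
        if item is not None:
            self.stages[-1] = None
            self.ripple()
        return item
```

In this sketch a refused `put` corresponds to overflow and a `get` that returns `None` corresponds to reading an empty FIFO, the two failure cases a phase buffer must avoid.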
FIG. 2B illustrates an exemplary sticky buffer 207. Sticky buffer 207 includes inverters 250 and 252 coupled back-to-back, forming a latch. Note that inverter 252 is smaller than inverter 250, so the output of inverter 252 can be easily overridden by an external input to inverter 250. Inverter 254 restores the output polarity of sticky buffer 207 to match the input polarity of sticky buffer 207.
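As a logic-level illustration only, the sticky buffer can be modeled as a node that keeps its last driven value. The class name and the `step` interface are hypothetical, and the sketch abstracts away all analog behavior (drive strengths, delays); an input of `None` stands for a closed pass-gate, in which case the weak feedback inverter holds the stored value.

```python
from typing import Optional


class StickyBuffer:
    """Logic-level model of inverters 250, 252 and 254 of FIG. 2B."""

    def __init__(self, initial: int = 0) -> None:
        # Value on the input node of inverter 250.
        self.node = initial

    def step(self, driven: Optional[int]) -> int:
        """Apply one input condition and return the buffer output.

        driven: 0 or 1 when the pass-gate is open, None when it is closed.
        """
        if driven is not None:
            # An external drive overrides the weak feedback inverter 252.
            self.node = driven
        inverted = 1 - self.node  # output of inverter 250
        # Inverter 252 feeds the inverse of `inverted` back to the input
        # node, which is what keeps self.node unchanged when undriven.
        return 1 - inverted  # inverter 254 restores the input polarity
```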
Exemplary prior art solutions are described by William J. Dally and John W. Poulton in Digital Systems Engineering, Cambridge University Press, 1998, pages 470-485. Note that on page 475, the authors describe the task of phase-buffering as “synchronizing a mesochronous signal.” The task is not really “synchronizing,” but the circuits used are very similar to those used when synchronization is carried out, and so the authors included them in the synchronizer design section. Notice that the FIFO designs shown in this text are ring-buffer FIFOs rather than flow-through FIFOs. Either style will work.
Determining the proper depth for the FIFO is critical. The FIFO must have sufficient stages to ensure that it will not overflow, causing loss of data, but must not have so many stages that data is needlessly held in the FIFO. Furthermore, if the FIFO has too few stages, it is possible to inadvertently empty it, thus causing erroneous data to be read by the receiver. Thus, a phase buffer circuit design that offers more phase buffering per stage will have the advantage that it will require fewer stages to achieve a specified amount of phase buffering.
Use of a FIFO as a phase buffer has several drawbacks. The primary issue is managing the control-to-data bundling constraint. Extra delay margin must be added into the forward control path of the FIFO to guarantee that the bundling constraint is satisfied over all operating conditions. This extra delay slows down each FIFO stage, reducing its effectiveness at phase buffering, so that more stages are required to achieve the specified amount of phase buffering.
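The effect of bundling margin on FIFO depth can be sketched as follows. The per-stage figure follows the relationship stated earlier (clock period minus stage delay); the assumption that per-stage buffering adds linearly across stages, the function name, and all numeric values are hypothetical and serve only to illustrate the trend.

```python
import math


def stages_needed(required_ps: float, clock_period_ps: float,
                  stage_delay_ps: float, margin_ps: float = 0.0) -> int:
    """Estimate the number of FIFO stages for a target phase buffering.

    Assumes (for illustration) that each stage contributes
    clock_period - (stage_delay + margin) and that contributions add up.
    """
    per_stage_ps = clock_period_ps - (stage_delay_ps + margin_ps)
    if per_stage_ps <= 0:
        raise ValueError("a stage must be faster than one clock period")
    return math.ceil(required_ps / per_stage_ps)


if __name__ == "__main__":
    # Hypothetical numbers: 1000 ps clock, 600 ps stage, 1000 ps target.
    print(stages_needed(1000.0, 1000.0, 600.0))                   # no margin
    print(stages_needed(1000.0, 1000.0, 600.0, margin_ps=150.0))  # with margin
```

With these example values, adding 150 ps of margin raises the estimate from 3 stages to 4.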
Another issue with the use of FIFOs occurs when the input data bits are spatially distributed, which occurs, for example, when the input data comes from input pins of a chip. This can result in running long wires to bring all the bits together to deliver the data to the FIFO, thus adding delay to each data bit proportional to the length of its wire. This additional delay subtracts from the phase buffering capability of the first stage of the FIFO. Alternatively, multiple 1-bit wide FIFOs can be used, one FIFO for each input data bit. This can reduce the wire lengths, and thus reduce additional delays. However, although flow-through FIFO stages, such as shown in FIG. 2A, are small, use of multiple 1-bit FIFOs also duplicates the control circuits.
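The wire-length penalty can likewise be sketched, under stated assumptions. Wire delay is taken as proportional to length, as described above; the function name, the delay-per-millimeter coefficient, and the example wire lengths are hypothetical, and the worst-case (longest) wire is assumed to set the penalty for a shared multi-bit FIFO.

```python
from typing import Sequence


def first_stage_buffering_ps(clock_period_ps: float, stage_delay_ps: float,
                             wire_lengths_mm: Sequence[float],
                             delay_ps_per_mm: float) -> float:
    """Phase buffering left in the first stage after input wire delay.

    The longest input wire is assumed to determine the added delay.
    """
    worst_wire_delay_ps = max(wire_lengths_mm) * delay_ps_per_mm
    return clock_period_ps - stage_delay_ps - worst_wire_delay_ps


if __name__ == "__main__":
    # One shared FIFO: bits routed from pins at different distances.
    shared = first_stage_buffering_ps(1000.0, 600.0, [0.5, 2.0, 4.0], 50.0)
    # One 1-bit FIFO per pin: every wire is short.
    per_bit = first_stage_buffering_ps(1000.0, 600.0, [0.2, 0.2, 0.2], 50.0)
    print(shared, per_bit)
```

The per-bit arrangement retains more first-stage buffering in this sketch, at the cost of duplicating the control circuits for every bit.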
Hence, what is needed is an apparatus and a method for phase-buffering of signals between a source and a destination within a computer system, which do not have the problems described above.