The present invention relates to CMOS input cells and, more specifically, to the implementation of a zero hold time CMOS input cell which utilizes a programmable delay line. This input cell is suitable for use in many high speed data bus applications.
An edge triggered D flip-flop (or D flop) is a well known device which captures the logic state of a data input signal on the rising (or falling) edge of a clock input signal. In practice, integrated circuits (ICs) commonly use D flops to capture input data from an external bus.
FIG. 1 shows a simplified circuit diagram which illustrates a portion of a conventional integrated circuit 100. As shown in FIG. 1, circuit 100 includes a D flop 110 which is located in the core of circuit 100, and an input data cell 112. Input data cell 112 includes an input data pin 116, which receives a data input signal DATA from an external data bus 118. Input data pin 116, in turn, supplies the DATA signal to a CMOS/TTL compatible data input buffer 114, whose output directly drives the D input of flop 110.
Similarly, circuit 100 also includes an input clock cell 120. Input clock cell 120 includes a clock input pin 124, which receives a clock input signal CLK from an external source. Clock input pin 124, in turn, supplies the CLK signal to a CMOS/TTL compatible clock input buffer 122, whose output directly drives the CLK input of flop 110.
As shown in FIG. 1, the DATA signal from external data bus 118 must be captured (i.e. latched) by flop 110. In order for this to occur, the specified minimum setup and hold times for flop 110 must be met. In general, these minimum setup and hold times can be positive, negative or zero. Because setup and hold times are signed numbers, they are, by convention, interpreted as follows. For a rising edge triggered flop, a positive setup time indicates that the data on the flop D pin must change state before the clock rises on the flop CLK pin. Conversely, a negative flop setup time allows the data on the flop D pin to change state after the clock rises on the flop CLK pin.
Similarly, for a rising edge triggered flop, a positive hold time indicates that the data on the flop D pin must change state after the clock rises on the flop CLK pin. Conversely, a negative flop hold time allows the data on the flop D pin to change state before the clock rises on the flop CLK pin.
For example, if the specified minimum setup time for flop 110 is +1 ns, flop 110 will capture the correct data if it is presented with a setup time of +1 ns, +2 ns or +3 ns. Flop 110 will not capture the correct data, however, if it is presented with a setup time of −1 ns, 0 ns or +0.5 ns.
Similarly, if the specified minimum hold time for flop 110 is −0.5 ns, flop 110 will capture the correct data if it is presented with a hold time of −0.5 ns, 0 ns or +1 ns. Flop 110 will not capture the correct data, however, if it is presented with a hold time of −3 ns, −2 ns or −1 ns.
From the foregoing examples, it can be seen that the specified minimum setup and hold times for flop 110 will be met if the following statement is true: the setup and hold times presented to flop 110 must be arithmetically greater than or equal to its specified minimum setup and hold times.
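The rule stated above can be illustrated with a minimal sketch. The function name `meets_minimum` is hypothetical and is used here only for illustration; the numeric values are the ones given in the preceding examples.

```python
def meets_minimum(presented_ns: float, minimum_ns: float) -> bool:
    """Return True when a presented setup (or hold) time is arithmetically
    greater than or equal to the specified minimum. Hypothetical helper."""
    return presented_ns >= minimum_ns


# Setup example from the text: specified minimum setup time of +1 ns.
setup_ok = [meets_minimum(t, 1.0) for t in (1.0, 2.0, 3.0)]
setup_bad = [meets_minimum(t, 1.0) for t in (-1.0, 0.0, 0.5)]

# Hold example from the text: specified minimum hold time of -0.5 ns.
hold_ok = [meets_minimum(t, -0.5) for t in (-0.5, 0.0, 1.0)]
hold_bad = [meets_minimum(t, -0.5) for t in (-3.0, -2.0, -1.0)]
```

Because both quantities are signed, a single comparison covers positive, zero and negative specifications alike.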
One of the problems associated with capturing data from a high speed synchronous data bus, such as the PCI bus, is that data can change state at exactly the same time that the clock rises (assuming a rising edge clock reference). Thus ICs which are connected to high speed synchronous data buses are often required to operate with zero hold time at their data bus input pins, relative to their clock input pin.
Referring to FIG. 1, in order to determine whether or not the minimum setup and hold requirements of flop 110 are being met, the following parameters must be examined: the relative timing of the input signals DATA and CLK, and the delays imposed by data input buffer 114 and clock input buffer 122. These parameters will be examined in the following paragraphs.
As shown in FIG. 1, the signal path to the D input of flop 110 goes through data input buffer 114, which has a relatively low fanout (only one in this PCI bus example). However, the signal path to the CLK input of flop 110 goes through clock input buffer 122, which has a relatively high fanout (49 in this PCI example). Due to this difference in fanout, the load capacitance on data input buffer 114 will be far less than the load capacitance on clock input buffer 122. This difference in load capacitance implies that the delay through data input buffer 114 will be far less than the delay through clock input buffer 122. (Note: In most high speed bus applications it is not possible to speed up the clock input buffer to the point where its delay is less than or equal to the delay through the data input buffer).
From the above discussion it can be seen that the delay from data input pin 116 to the D input of flop 110 will usually be less than the delay from clock input pin 124 to the CLK input of flop 110. Hence, when the clock and data signals have a zero hold time relationship at the chip input pins (i.e. on the bus), the hold time imposed on flop 110 can be highly negative, causing a hold time violation. This hold time violation can cause the wrong bus data to be captured, resulting in a system malfunction.
FIG. 2A shows a timing diagram which illustrates the hold time violation described in the preceding paragraph. In this example it is assumed that flop 110 in FIG. 1 has a specified minimum hold time of −0.5 ns. In accordance with the foregoing discussion, it is also assumed that the delay through data input buffer 114 in FIG. 1 is 1 ns, and that the delay through clock buffer 122 in FIG. 1 is 3 ns.
As shown by waveforms A and B in FIG. 2A, the input signals CLK and DATA both change state at exactly the same time (0 ns). Thus the correct data which must be captured by flop 110 in FIG. 1 is designated as ‘D1’ in FIG. 2A. However, because the delay through clock buffer 122 in FIG. 1 is 3 ns, the CLK pin of flop 110 will not go high until 3 ns, as shown by waveform C in FIG. 2A. Similarly, because the delay through data buffer 114 in FIG. 1 is only 1 ns, the D pin of flop 110 will change state at 1 ns, as shown by waveform D in FIG. 2A. Thus, when comparing waveforms C and D in FIG. 2A, it can be seen that the hold time presented to flop 110 is equal to −2 ns. Since the minimum hold time for flop 110 is −0.5 ns, flop 110 has a hold violation of 1.5 ns (absolute value). Thus flop 110 will not capture the correct data ‘D1’; it will instead capture the wrong data ‘D2’.
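The arithmetic of this example can be sketched as follows. The function names are hypothetical, and the model assumes only what the example states: the hold time seen at the flop equals the hold time at the pins plus the data path delay minus the clock path delay.

```python
def presented_hold_ns(data_delay_ns: float, clock_delay_ns: float,
                      pin_hold_ns: float = 0.0) -> float:
    """Hold time seen at the flop pins (hypothetical helper).
    Delaying the data adds hold time; delaying the clock removes it."""
    return pin_hold_ns + data_delay_ns - clock_delay_ns


def hold_violation_ns(presented_ns: float, minimum_ns: float) -> float:
    """Size of the hold violation, or 0.0 when the minimum is met."""
    return max(0.0, minimum_ns - presented_ns)


# Values from the FIG. 2A example: 1 ns data buffer, 3 ns clock buffer,
# zero hold time at the pins, and a flop minimum hold time of -0.5 ns.
hold = presented_hold_ns(data_delay_ns=1.0, clock_delay_ns=3.0)
violation = hold_violation_ns(hold, minimum_ns=-0.5)
```

With these inputs the presented hold time is −2 ns and the violation is 1.5 ns, matching the figures quoted above.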
The above hold time violation can be corrected by modifying the circuit shown in FIG. 1. Referring to FIG. 3, circuit 300 is similar to circuit 100 shown in FIG. 1, and, as a result, uses the same reference numerals to designate structures which are common to both circuits.
The circuit shown in FIG. 3 illustrates a portion of a conventional integrated circuit 300. As shown in FIG. 3, the hold time violation for flop 110 can be corrected by adding a delay circuit 310 to input data cell 112. Thus, as shown by the waveforms in FIG. 2B, if the added delay is equal to at least 1.5 ns, the hold time violation for flop 110 will disappear (i.e. the imposed hold time will increase from −2 ns to −0.5 ns).
Referring to FIG. 3, if the added delay 310 is too short, the zero hold time requirement will not be met. Conversely, if the added delay is too long, the flop hold time will be more than sufficient, but the flop setup time may be decreased to the point where a setup violation occurs. (This assumes that the clock period, tCLK, and the maximum logic chain delay, tLOGIC, do not change, i.e. tCLK = tSETUP + tHOLD + tLOGIC, thus tSETUP = tCLK − tLOGIC − tHOLD).
Referring to FIG. 3, if data input pin 116 and clock input pin 124 have a zero hold time relationship, the minimum delay required to prevent a hold time violation at flop 110 must satisfy EQ. 1:
tIB + t310 − tCLK ≥ tHOLD    EQ. 1
where tIB represents the propagation delay through data input buffer 114, t310 represents the minimum required propagation delay through delay circuit 310, tCLK represents the propagation delay through clock input buffer 122, and tHOLD represents the minimum hold time required by flop 110.
EQ. 1 can be solved for t310, the minimum required propagation delay through delay circuit 310, as follows:
t310 ≥ tHOLD + tCLK − tIB    EQ. 2
As shown in EQ. 2, for the special case where the required flop hold time tHOLD is equal to zero, the added data delay t310 must be greater than or equal to the clock buffer delay tCLK minus the data input buffer delay tIB.
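As a minimal sketch, the minimum added delay can be evaluated directly from the relationship above. The function name is hypothetical, and clamping the result at zero is an assumption added here: a negative result is taken to mean that no added delay is needed.

```python
def min_added_delay_ns(t_hold: float, t_clk: float, t_ib: float) -> float:
    """Minimum propagation delay required of delay circuit 310, in ns.
    t_hold: minimum hold time of the flop
    t_clk:  propagation delay through the clock input buffer
    t_ib:   propagation delay through the data input buffer
    The clamp at 0.0 is an assumption made for this sketch."""
    return max(0.0, t_hold + t_clk - t_ib)


# Values from the FIG. 2A example: tHOLD = -0.5 ns, tCLK = 3 ns, tIB = 1 ns.
example_delay = min_added_delay_ns(t_hold=-0.5, t_clk=3.0, t_ib=1.0)

# Special case tHOLD = 0: the result is tCLK - tIB.
zero_hold_delay = min_added_delay_ns(t_hold=0.0, t_clk=3.0, t_ib=1.0)
```

The first call reproduces the 1.5 ns figure from the FIG. 2B discussion; the second shows the zero hold time special case.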
Referring to FIG. 1, conventional integrated circuits, such as circuit 100, can include wide synchronous data busses 118 containing up to 128 data bits. These wide data buses must drive many input data buffers 114 which, in turn, must drive many input data flops 110. Since the input data flops are usually located in the core, different input data flops can be located at different wire distances from their associated input data buffers. These varying wire distances can cause differences in the load capacitances presented to each input data buffer, resulting in data delay differences to each input data flop. These data delay differences, in turn, can cause some input data flops to have inadequate hold time. In order to prevent this from happening, a data delay and a data latch can be employed, as shown in FIG. 4.
FIG. 4 illustrates a portion of a conventional integrated circuit 400. Circuit 400 is similar to circuit 300 and, as a result, utilizes the same reference numerals to designate the structures which are common to both circuits.
As shown in FIG. 4, circuit 400 differs from circuit 300 in that input data cell 112 of circuit 400 includes a delay latch 410 which is connected between delay circuit 310 and flop 110. Delay latch 410 differs from flop 110 in that it is level triggered rather than edge triggered. Thus latch 410 passes the logic state on its data input pin D to its data output pin Q, while the clock is low. Conversely, latch 410 holds (i.e. latches) the logic state on its data input pin D at its data output pin Q, when the clock goes high.
During normal operation, latch 410 simply retains the ‘old’ data, which was valid before the rising clock edge, before it is destroyed by the ‘new’ data, which is valid after the rising clock edge. Thus the data output Q of latch 410 is retained for an entire clock cycle.
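The level triggered behavior of the delay latch can be sketched with a small behavioral model. The class name and the `update` method are hypothetical; the model captures only the transparent-while-low, hold-while-high behavior described above and ignores all propagation delays.

```python
class TransparentLowLatch:
    """Behavioral model of a level triggered latch (hypothetical sketch):
    transparent while the clock is low, holding while the clock is high."""

    def __init__(self, q: int = 0) -> None:
        self.q = q

    def update(self, clk: int, d: int) -> int:
        if clk == 0:
            self.q = d  # clock low: D passes through to Q
        return self.q   # clock high: Q keeps the last value seen while low


latch = TransparentLowLatch()
trace = [
    latch.update(clk=0, d=1),  # transparent, Q follows the 'old' data
    latch.update(clk=1, d=0),  # clock high, 'new' data arrives, Q holds
    latch.update(clk=1, d=0),  # still holding
    latch.update(clk=0, d=0),  # transparent again, Q follows D
]
```

In the trace, the ‘old’ value 1 survives the arrival of the ‘new’ value 0 for as long as the clock is high.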
Referring to FIG. 4, the main advantage of including delay latch 410 inside input data cell 112 is that it allows the required data delay 310 to be minimized. This increases the maximum allowable operating frequency. Thus, when CLK and DATA have a zero hold time relationship at the IC clock/data pins, and delay latch 410 is included inside input data cell 112, delay circuit 310 only needs to compensate for the difference in clock/data delay through clock buffer 122 and input buffer 114. Delay circuit 310 does not have to compensate for data delay differences caused by varying wire lengths to different data flops 110 located in the core.
Referring to FIG. 4, in most applications the amount of delay provided by delay circuit 310 can be made the same for all data input cells 112. However, in very high speed bus applications (where the relative clock/data arrival times significantly vary from one data input cell 112 to the next), delay 310 may have to be adjusted on a cell-by-cell basis.
Circuits 100, 300, and 400 assume that the on-chip clock is being generated by a clock input buffer which is directly driven from the bus clock pin. However, for large complex chips, this ‘flat’ clock buffer approach is often impractical. Thus on-chip clocks are often generated by multi-stage, hierarchical clock trees. A simple example of a multi-stage, hierarchical clock tree 600 is shown in FIG. 5.
Hierarchical clock trees usually produce smaller clock skew (ideally zero), allowing a higher maximum operating frequency. However, although the hierarchical clock tree approach minimizes clock skew, it accomplishes this at the expense of increased clock latency (clock delay). As shown in FIG. 5, this additional latency occurs because the clock signal must pass through additional logic levels before it finally reaches the clock input of an internal data flop.
Since the hierarchical clock tree approach increases clock delay, it also requires a corresponding increase in data delay, so that the zero hold time constraint present at the IC clock/data pins can be met. As explained below, a major problem with prior art delay circuits is that they often cannot provide this extra data delay without introducing data errors.
FIG. 6 illustrates a conventional delay stage 700 which can be used to implement delay circuit 310. Delay stage 700 delays the data signal, as required, by utilizing RC (resistor/capacitor) values which slow down the rise/fall times of the data signal. Thus, when a moderate to large data delay is needed, the rise/fall times produced by the RC stage will be slow.
Since delay circuit 700 slows down the rise/fall times of the data signal, at least one non-inverting logic buffer is often required in order to ‘square up’ (i.e. speed up) the slow edge rates. As shown in FIG. 6, a non-inverting delay circuit is implemented by utilizing an RC delay stage followed by buffer inverters 710 and 720.
FIG. 7 shows a circuit diagram which illustrates another conventional delay stage 800 which can be utilized to implement delay circuit 310. As shown in FIG. 7, delay stage 800 utilizes a high impedance CMOS inverter stage 810 and a load capacitance C which is connected to stage 810.
Delay stage 800 is somewhat superior to delay stage 700 because the CMOS inverter stage 810 effectively creates timing resistors which are more closely correlated to the process/voltage/temperature (PVT) variations which can occur. Nevertheless, when a moderate to large data delay is required, the rise/fall times produced by CMOS inverter stage 810 will also be slow. As a result, one or more buffers are needed in order to square up these slow edge rates. Thus, as shown in FIG. 7, a single non-inverting delay circuit has been implemented by employing inverters 810 and 811.
As described above, the single RC delay stage employed in the prior art produces slow edge rates in order to provide the data delay which is required. Even though these slow edge rates are eventually squared up, the single RC stage nevertheless imposes a serious limitation on the amount of data delay which can be obtained. This limitation is discussed in the following paragraphs.
The data delay implementations discussed above suffer from the same critical limitation: the maximum delay which can be obtained is limited to only a small fraction of the minimum data period. (The minimum data period, or maximum data frequency, occurs when the data changes state as often as possible on the data bus).
The reason for the above timing limitation is that, at the highest data rate, the RC voltage waveform must have sufficient time to rise from 0V to a value close to VCC. Similarly, at the highest data rate, the RC voltage waveform must also have sufficient time to fall from VCC to a value close to 0V. If these two conditions are not met, the data delay will vary with the data rate. This delay variation will cause the provided setup and hold times to vary, resulting in circuit timing failures which are dependent upon the data rate.
For example, assuming VCC=5V, when the bus data changes state at a low data rate (i.e. infrequently), the delayed RC voltage waveform will have plenty of time to make rising/falling transitions between 0V and 5V. However, when the bus data changes state as often as possible, the RC voltage waveform may only be able to make transitions between 1.5V and 3.5V. Thus, when bus data transitions occur relatively infrequently, the data delay will be large. Conversely, when bus data transitions occur as often as possible, the data delay will be small. As stated in the preceding paragraphs, these data dependent delay variations can easily cause hold time and/or setup time violations to occur, resulting in circuit timing failures.
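The data dependent delay described above can be sketched with an ideal first-order RC model driven by a square wave. This is an illustration under stated assumptions, not a model of any particular circuit: the function names are hypothetical, the waveform is assumed to have reached its periodic steady state, and the half period of 0.847 time constants is chosen here only because it reproduces the 1.5V to 3.5V swing quoted in the text.

```python
import math


def steady_state_swing(vcc: float, half_period: float, tau: float):
    """Low and high voltage extremes of an ideal RC node driven by a
    square wave between 0 and vcc, once the waveform repeats exactly."""
    a = math.exp(-half_period / tau)
    return vcc * a / (1.0 + a), vcc / (1.0 + a)


def delay_to_midpoint(vcc: float, v_start: float, tau: float) -> float:
    """Time for the rising RC waveform, starting at v_start (below vcc/2),
    to reach a trip point of vcc/2."""
    return tau * math.log((vcc - v_start) / (vcc / 2.0))


VCC, TAU = 5.0, 1.0  # TAU is normalized; times are in units of tau

# Low data rate: each half period lasts 10 time constants (assumed value).
slow_low, slow_high = steady_state_swing(VCC, half_period=10.0, tau=TAU)
delay_slow = delay_to_midpoint(VCC, slow_low, TAU)

# High data rate: half period of 0.847 tau (assumed value).
fast_low, fast_high = steady_state_swing(VCC, half_period=0.847, tau=TAU)
delay_fast = delay_to_midpoint(VCC, fast_low, TAU)
```

Under these assumptions the delay to the VCC/2 trip point falls from about 0.69 time constants at the low data rate to about 0.34 time constants at the high data rate, which is the data dependent variation the text describes.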
In the paragraphs which follow, it will be shown that, using the prior art delay circuits discussed above, the data delay will be limited to only a small fraction of the minimum data period. This limitation is required in order to ensure that there is always sufficient time for the delayed data waveform to make full transitions between 0V and VCC.
FIG. 8 shows two data waveforms, waveform 910 and waveform 920. Waveform 910 is a delayed data waveform produced by an RC delay circuit similar to those discussed in the preceding paragraphs. Waveform 920 is a delayed (‘squared up’) version of waveform 910. As shown in FIG. 8, the delayed waveform 920 is retarded in time by an amount equal to td, where td is the required data delay time.
It is assumed that the delayed waveform 920 in FIG. 8 has been squared up by logic gates whose ‘trip points’ are centered around VCC/2. Thus the phase relationship between waveform 910 and waveform 920 will be as shown in FIG. 8.
Waveform 910 and waveform 920 both have the same data period. Thus, in most timing-critical applications, it is sufficient to assume that this data period contains 6 RC time constants: 3 for the rising portion of waveform 910, and 3 for the falling portion of waveform 910. The voltage V for the rising portion of the RC waveform 910 is defined by EQ. 1:
V = VCC(1 − e^(−t/τ))    EQ. 1
where VCC represents the power supply voltage, t represents time, and τ represents the RC time constant.
The delay time required for RC waveform 910 to go from zero volts to VCC/2 volts (or from VCC/2 volts to zero volts), is defined by EQ. 2:
VCC/2 = VCC(1 − e^(−td/τ))    EQ. 2
where td represents the delay time.
Solving EQ. 2 for τ yields τ = td/0.693. Since TMIN, which represents the minimum data period, is equal to 6τ, TMIN is defined by EQ. 3:
TMIN = 8.65td.    EQ. 3
EQ. 3 indicates that the minimum data period TMIN, and the required data delay td, are directly related by the simple equation: TMIN=8.65td. To put it another way, the required data delay td is limited to only 12% (1/8.65) of the minimum data period TMIN.
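The intermediate steps between the midpoint condition and the relationship TMIN = 8.65td can be written out as follows. The symbols are those used in the text; the only added fact is that the constant 0.693 is the natural logarithm of 2.

```latex
\begin{align*}
\frac{V_{CC}}{2} &= V_{CC}\left(1 - e^{-t_d/\tau}\right)
  && \text{midpoint condition} \\
e^{-t_d/\tau} &= \tfrac{1}{2}
  && \text{divide by } V_{CC} \text{ and rearrange} \\
\frac{t_d}{\tau} &= \ln 2 \approx 0.693
  && \text{take the natural logarithm} \\
\tau &= \frac{t_d}{0.693} \\
T_{MIN} &= 6\tau = \frac{6\,t_d}{0.693} \approx 8.65\,t_d
  && \text{six time constants per data period} \\
\frac{t_d}{T_{MIN}} &\approx \frac{1}{8.65} \approx 0.12
\end{align*}
```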
The above limitation is an extremely serious one for high speed data buses. For example, a 50 MHz data bus can change state as often as every 20 ns (TMIN=20 ns). Thus the maximum allowable data delay, td, is equal to only 2.3 ns. If there is a zero hold time constraint at the IC clock/data pins, and a hierarchical clock tree is being used, the maximum clock latency must not exceed 2.3 ns; otherwise a hold time violation will occur. In many IC applications the hierarchical clock tree latency can easily exceed 2.3 ns; thus the prior art circuitry cannot be used to generate an acceptable data delay. Of course, for a very fast 100 MHz data rate (TMIN=10 ns), the data delay problem gets even worse.
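The figures in this example can be reproduced with a short sketch. The function names are hypothetical. The conversion from data rate to minimum data period follows the text (50 MHz corresponds to TMIN = 20 ns), and the default of six RC time constants per data period is the assumption stated earlier.

```python
import math


def min_data_period_ns(data_rate_mhz: float) -> float:
    """Minimum data period TMIN in ns, using the text's convention that
    a 50 MHz data bus has TMIN = 20 ns."""
    return 1000.0 / data_rate_mhz


def max_rc_delay_ns(t_min_ns: float,
                    time_constants_per_period: float = 6.0) -> float:
    """Largest delay td obtainable from a single RC stage, given that the
    data period must contain the stated number of RC time constants and
    that the trip point is VCC/2 (td = tau * ln 2)."""
    tau = t_min_ns / time_constants_per_period
    return tau * math.log(2.0)


delay_50mhz = max_rc_delay_ns(min_data_period_ns(50.0))
delay_100mhz = max_rc_delay_ns(min_data_period_ns(100.0))
```

Under these assumptions the sketch gives roughly 2.3 ns at 50 MHz and roughly 1.2 ns at 100 MHz, so any clock latency beyond those values cannot be matched by a single RC stage.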
For ICs operating under a zero hold time constraint at their clock/data pins, the minimum data delay required to guarantee correct circuit operation is usually determined under fast PVT (process/voltage/temperature) conditions. Thus, if more than the minimum required data delay is provided at fast PVT, this additional (unneeded) data delay will be increased by approximately two to three times at slow PVT. This 2× to 3× increase in data delay can make it extremely difficult to provide adequate setup time under slow PVT conditions. Thus, in most high speed data applications, it is extremely important to provide only the minimum amount of data delay required to barely meet the zero hold time constraint at fast PVT.
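The cost of over-providing delay can be sketched numerically. The function name and the delay values of 1.5 ns and 2.0 ns are hypothetical and chosen only for illustration; the scale factors of 2 and 3 are the approximate range given in the text.

```python
def excess_delay_at_slow_pvt_ns(provided_fast_ns: float,
                                required_fast_ns: float,
                                slow_factor: float) -> float:
    """Unneeded data delay seen at slow PVT, assuming the whole delay
    scales by slow_factor between fast and slow conditions."""
    return (provided_fast_ns - required_fast_ns) * slow_factor


# Hypothetical case: 1.5 ns is required at fast PVT but 2.0 ns is provided.
excess_2x = excess_delay_at_slow_pvt_ns(2.0, 1.5, slow_factor=2.0)
excess_3x = excess_delay_at_slow_pvt_ns(2.0, 1.5, slow_factor=3.0)
```

In this illustration, 0.5 ns of surplus delay at fast PVT grows to between 1.0 ns and 1.5 ns at slow PVT, all of which is subtracted from the available setup time.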
In order to meet the zero hold time constraint imposed at the clock/data pins of an IC, RC data delay circuits are conventionally employed. These RC data delay circuits are used to delay the incoming data signals received from an external data bus.
As described in the preceding paragraphs, the maximum data delay that can be obtained from a conventional RC data delay circuit is severely limited. This limitation exists because the delayed data signal must be allowed to rise to a voltage level close to VCC, and must be allowed to fall to a voltage level close to ground. These ‘complete’ voltage excursions between VCC and ground are required under all conditions, even when the bus data changes state at the highest possible frequency. Failure to make ‘complete’ voltage excursions between VCC and ground will result in data delays which vary with the data rate, causing hold time violations to occur.
The present invention solves the aforementioned problem by utilizing a series of data delay stages which provide the equivalent of a programmable data delay line. This programmable delay line provides the total data delay required to operate an IC under a zero hold time constraint at its clock/data pins.
Since each stage in the programmable delay line provides only a fraction of the total data delay required, the rise/fall time of each delay stage can be very fast. This allows the output voltage waveform of each delay stage to make ‘complete’ excursions between ground and VCC, assuring a data delay which is independent of the data rate. Thus, by choosing the appropriate number of delay stages to be employed, any amount of data delay can be obtained at any data rate.
An input data cell, in accordance with the present invention, contains a data pad and a data input buffer which is connected to the data pad. The data input buffer drives a delay circuit which, in turn, drives an optional delay latch. The delay latch drives logic flip-flops located in the IC core.
In the present invention, the delay circuit contains a plurality of delay stages which have a corresponding plurality of outputs. In operation, the delay provided by the delay circuit is “programmed” by selecting one of the delay circuit outputs and connecting it to the D input of an optional delay latch.
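The idea of programming the delay by selecting one of the stage outputs can be sketched as follows. This is an illustration only, not the circuit of the invention: the function names, the uniform stage delay of 0.5 ns, the ten-stage line and the 4.0 ns requirement are all hypothetical values chosen for the example.

```python
import math


def stages_needed(required_delay_ns: float, stage_delay_ns: float) -> int:
    """Smallest number of identical stages whose total delay meets the
    requirement (assumes every stage contributes the same delay)."""
    return math.ceil(required_delay_ns / stage_delay_ns)


def select_tap(tap_delays_ns, required_delay_ns: float) -> int:
    """Index of the first stage output whose cumulative delay meets the
    requirement, so that no more delay than necessary is selected."""
    for index, delay in enumerate(tap_delays_ns):
        if delay >= required_delay_ns:
            return index
    raise ValueError("delay line is too short for the required delay")


# Hypothetical ten-stage line with 0.5 ns per stage: taps at 0.5 ... 5.0 ns.
STAGE_DELAY_NS = 0.5
taps = [STAGE_DELAY_NS * (i + 1) for i in range(10)]

count = stages_needed(4.0, STAGE_DELAY_NS)
tap_index = select_tap(taps, 4.0)
```

Here a 4.0 ns requirement is met by eight stages, that is, by the output at index 7. Each stage contributes only 0.5 ns, so each stage output can still make complete excursions at a data period far shorter than a single 4.0 ns RC stage would allow.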
A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description and accompanying drawings which set forth an illustrative embodiment in which the principles of the invention are utilized.