The present invention relates to a digital delay line structure with a synchronous single clock domain control.
In order to illustrate why a digital delay line is required in a device, an example of the traditional clocking methods is provided. FIG. 1 shows a simplified schematic of a device using clock tree synthesis. Data is input on pin 11 while clock signals are input on pin 13. The input signal is applied to pad 10 while the clock input is applied to pad 12. The set-up and hold delay is represented by circuit elements 14 which are applied to the D input of the flip-flop 18. The clock tree 16 has an output which is applied to the other input of the flip-flop 18. For devices using super-clock buffers, the clock tree may be replaced with the super clock buffers. Test logic such as boundary scan has not been considered in this analysis, but may be lumped into the modeled delays.
The lumped delays in FIG. 1 are analyzed to show the effect of the clock tree. As shown by the timing diagram of FIG. 2, the output propagation delay is determined by the input pad delay, the clock tree delay, the flop propagation delay and the output pad delay. For large designs where the clock tree delay is over 4 or 5 ns, the clock tree will dominate the output propagation delay. While custom clock trees, such as a separate clock tree to the output flip-flop or flip-flops, may mitigate the problem, the tree may still be big enough to significantly affect the output propagation delay.
Since the clock tree delays the clock to the flip-flop sampling the input data, the input data must also be delayed in order to achieve reasonable input setup and hold performance. The delay may be test or functional logic or may be delay chains formed using a string of buffers. In some cases, the propagation difference between a clock input pad and a data input pad requires the input data setup and hold specifications to vary significantly over process, voltage and temperature, as precise matching of the delays is impossible.
As devices increase in complexity, the clock tree increases in size and latency. While the input setup and hold specifications can be adjusted by increasing the delay on the data inputs, the output propagation delay increases. For extremely large devices with thousands of flip-flops, the clock tree delay may prevent reasonable output propagation delays for high-speed interfaces (for example, the SUNI-622 device manufactured by PMC-Sierra, Inc. of Burnaby, B.C., Canada has an interface with an output propagation close to the clock period). One solution is to use a custom clock tree with high-speed output flip-flops operating on a separate small clock tree. While this solution has been used (for example, in the SUNI-QJET also manufactured by PMC-Sierra, Inc.), devices with a large number of high-speed output flip-flops will still have problems with clock tree latency.
One solution is to use a digital delay locked loop (DLL) 24 as seen in FIG. 3. In this case the DLL 24 has a SYSCLK input coupled to an output of the clock input pad 12 and a REFCLK (reference clock) input taken from an output of clock tree 26, which is also applied to the clock input of flip-flop 18. The DLL generates an internal clock DLLCLK based on the incoming SYSCLK clock input. Since the REFCLK input is connected to the output of the clock tree, the DLLCLK clock output is adjusted until the SYSCLK input and the REFCLK input align. As shown in the timing diagram in FIG. 4, a rising edge from the clock tree 26 coincides with the rising edge of SYSCLK from the clock input pad 12. The output propagation specification now comprises the clock input pad delay, the flop delay, the output pad delay and the DLL clock uncertainty.
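By way of illustration only, the two output propagation budgets described for FIG. 2 and FIG. 4 may be compared with a short calculation. The function names and all delay values below are hypothetical and are not taken from any device; the sketch merely sums the terms named in the text.

```python
def output_delay_clock_tree(clock_pad_ns, clock_tree_ns, flop_ns, output_pad_ns):
    """Budget of FIG. 2: the clock tree delay appears directly in the sum."""
    return clock_pad_ns + clock_tree_ns + flop_ns + output_pad_ns


def output_delay_with_dll(clock_pad_ns, flop_ns, output_pad_ns, dll_uncertainty_ns):
    """Budget of FIG. 4: the clock tree term is replaced by the DLL clock uncertainty."""
    return clock_pad_ns + flop_ns + output_pad_ns + dll_uncertainty_ns


# Hypothetical delays in nanoseconds, chosen only to make the arithmetic visible.
without_dll = output_delay_clock_tree(1.0, 4.5, 0.5, 2.0)
with_dll = output_delay_with_dll(1.0, 0.5, 2.0, 0.25)
print(without_dll, with_dll)
```

With these assumed figures the clock tree accounts for more than half of the first budget, which is the situation the DLL is intended to remove.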
A digital delay locked loop architecture is shown in FIG. 5. In this case the SYSCLK input is coupled to an adjustable delay line 34 and to a phase detector 30. Phase detector 30 also has a REFCLK input. The output of the phase detector 30 is directed to a control state machine 32 which directs the amount of delay to be implemented by adjustable delay line 34 in response to the phase difference between SYSCLK and REFCLK. Since the output clock DLLCLK is the same frequency as the system clock SYSCLK, the DLLCLK may be a phase delayed version of the SYSCLK input. A variable delay line controlled by the phase detector produces the required delay to generate the DLLCLK. The control state machine performs many tasks such as filtering the phase detector information and producing status/error control signals for monitoring purposes.
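The loop of FIG. 5 may be sketched behaviourally as follows. This is a simplified model under stated assumptions, not the circuit itself: the delay line is assumed to move in uniform steps, the phase detector is reduced to the sign of a wrapped phase error, the control state machine moves one step per update with no filtering, and the delay line is assumed to start at mid-range. The names `phase_error_ps` and `lock` and all numeric values are hypothetical.

```python
def phase_error_ps(total_delay_ps, period_ps):
    """Phase of REFCLK relative to SYSCLK, wrapped to [-T/2, T/2).

    A positive value means the REFCLK edge arrives after the SYSCLK edge.
    """
    half = period_ps // 2
    return (total_delay_ps + half) % period_ps - half


def lock(period_ps, tree_delay_ps, step_ps, n_taps, max_updates=1000):
    """Step the delay line until REFCLK aligns with SYSCLK; return the tap."""
    tap = n_taps // 2  # assumption: start at mid-range so the loop can move either way
    for _ in range(max_updates):
        err = phase_error_ps(tap * step_ps + tree_delay_ps, period_ps)
        if err == 0:
            break
        tap += -1 if err > 0 else 1          # REFCLK late: remove delay; early: add delay
        tap = max(0, min(n_taps - 1, tap))   # stay within the delay line
    return tap


# Hypothetical numbers: 10 ns period, 4.2 ns clock tree, 100 ps steps, 128 taps.
print(lock(10_000, 4_200, 100, 128))
```

In this model the loop settles when the delay line plus the clock tree together span a whole number of clock periods, which corresponds to the alignment of SYSCLK and REFCLK described above.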
An adjustable delay line has been implemented in many ways. One way is shown in FIG. 6 in which a chain 33 of buffers 36, 38, 40, etc. forms the delay line with taps taken from the input and at the output of each of the buffers 36, 38, 40, etc. The buffers 36, 38, 40, etc. provide a series of phase delayed copies of the input clock. An output multiplexer 42 selects the desired phase delay from the buffer chain 33. While the chain 33 of buffers 36, 38, 40, etc. may be easily implemented, the multiplexer 42 is very hard to design as the multiplexer 42 must be able to switch between two clock phases without the output changing at a time that is synchronous with the input clock but with unknown phase delay from the input clock (hereinafter referred to as “glitching”). Most multiplexer implementations use either AND-OR tree logic or pass transistor logic. However, the phase selection must be changed with specific timing in order not to cause a glitch in the output. Usually, local control of the multiplexing function (e.g. a D flip-flop and control logic) is required for each buffer 36, 38, 40, etc. or group of buffers to ensure the output clock does not glitch.
Another common method of implementing a delay line is shown in FIG. 7. In this case series connected buffers 58 each have a PMOS FET 56 and an NMOS FET 60 in their power supply circuits. The current through the PMOS FETs 56 is controlled by a voltage VcntrlP on input 50 and the current through the NMOS FETs 60 by a related voltage VcontrlN on input line 54. In this case, the delay of the delay line is adjusted using control currents into the respective buffers 58. By limiting the current the buffers can draw from the power supply, the delay through the buffer is related to the control current and the capacitance on the buffer's output. While this implementation is very elegant, it requires some analog design for the current mirrors and control voltage generation. While it does not allow for delay jumping, the analog control voltages allow for very precise delay control.
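The sensitivity of the FIG. 6 arrangement to the timing of the phase selection may be illustrated with a discrete-time model. The model is a sketch under assumptions: time is counted in arbitrary units, the clock has a period of 8 units with 50% duty cycle, each buffer adds 1 unit of delay, and the multiplexer switches instantaneously from tap 0 to tap 3. None of these figures comes from the source.

```python
PERIOD = 8        # assumed clock period in arbitrary time units
BUFFER_DELAY = 1  # assumed delay of one buffer in the chain


def clk(t):
    """Ideal input clock: high for the first half of each period."""
    return 1 if t % PERIOD < PERIOD // 2 else 0


def tap(k, t):
    """Clock as seen at the output of the k-th buffer of the chain."""
    return clk(t - k * BUFFER_DELAY)


def mux_output(t, switch_time, old_tap, new_tap):
    """Multiplexer output when the selection changes at switch_time."""
    sel = old_tap if t < switch_time else new_tap
    return tap(sel, t)


def rising_edges(switch_time, old_tap=0, new_tap=3, t_end=32):
    """Times at which the multiplexer output goes from low to high."""
    wave = [mux_output(t, switch_time, old_tap, new_tap) for t in range(t_end)]
    return [t for t in range(1, t_end) if wave[t - 1] == 0 and wave[t] == 1]


print(rising_edges(11))  # both taps high at the switch: one stretched pulse
print(rising_edges(13))  # taps at different levels at the switch: extra edge
```

In this model, switching at time 11 (when both taps are high) yields rising edges at 8, 19 and 27, whereas switching at time 13 (old tap low, new tap high) yields an additional rising edge at 13 followed by a shortened pulse. This is the kind of disturbance that the local control logic mentioned above is intended to prevent.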
Many other implementations of delay lines exist, but such implementations tend to contain a combination of elements of FIG. 6 and FIG. 7. For example, an implementation may use the analog delay line in FIG. 7, but tune the total delay through the chain to be one clock period. This will produce N equally spaced clock phases, one for each delay stage, which can be selected using the multiplexer in FIG. 6. Another example may use multiple stages of the delay line in FIG. 6 to produce a delay line with coarse and fine adjustment control buses.
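The two combinations mentioned above reduce to simple arithmetic, sketched here with hypothetical names and values. The first function assumes the chain has been tuned so that N stages span exactly one clock period and lists the phases modulo that period; the second assumes independent coarse and fine step sizes.

```python
def equally_spaced_phases_ps(period_ps, n_stages):
    """Phases (modulo the period) of a chain tuned to span one clock period."""
    if period_ps % n_stages:
        raise ValueError("period must divide evenly for this integer sketch")
    step = period_ps // n_stages
    return [k * step for k in range(n_stages)]


def coarse_fine_delay_ps(coarse, fine, coarse_step_ps, fine_step_ps):
    """Total delay selected by separate coarse and fine control values."""
    return coarse * coarse_step_ps + fine * fine_step_ps


print(equally_spaced_phases_ps(8_000, 8))
print(coarse_fine_delay_ps(3, 5, 400, 50))
```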
Most digital approaches seen in publications have a basic structure: the input of the delay line with one clock phase, the output of the delay line with another phase and the control with a third phase. The problem with such an architecture is that three clock domains exist (input, output and control) with the same frequency, but with different phase offsets. In reality, only the input clock domain is important as it controls all logic. All other clock domains are derived (such as the output clock domain) or artificial (such as the control clock domain). There is a need for reduction of the number of clock domains to two.
Accordingly, it is an object of the invention to provide a delay line that is easily controlled using a digital control bus updated at a constant and known phase offset, preferably zero, to the controlling state machine clock and input clock.
It is a further object of the invention to provide a delay line which can be constructed using standard ASIC library elements. It is yet another object of the invention to provide a delay line which is not sensitive to the library cells' asymmetric drive strengths and takes advantage of digital ASIC design flows such as clock tree synthesis and digital place and route layout automation.
Another object of the invention is to provide a delay line which is relatively insensitive to the layout of the logic on the chip. Using clock tree synthesis to provide a global clock to all logic in the delay line would provide such insensitivity.
Finally it is a further object of the invention to provide a delay line implementation which allows multiple clock period movement to be performed (useful for data recovery type functions).
According to the invention there is provided a digital delay line, which includes a plurality of multiplexer delay elements, arranged in sequence with each of the plurality of multiplexer delay elements having an associated control input. A clock signal line is coupled to a clock input of each of the plurality of multiplexers and is operative to provide synchronous, phase aligned clock signals from a clock signal source to each of said clock inputs. A control input is coupled to each of the plurality of multiplexer delay elements and is operative to transmit to each of the plurality of multiplexer delay elements an associated control signal. In response to a first change in the control signal an associated delay element is added to a start of the delay line and in response to a second change the delay element is removed from a start of the delay line.
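The insertion and removal of delay elements at the start of the delay line may be pictured with the following behavioural sketch. It rests on assumptions not stated in the text above: element 0 is taken to be nearest the output, a control bit of 1 is taken to mean that the element passes on the output of the element upstream of it, and a control bit of 0 is taken to mean that the element takes the input clock directly and so forms the start of the line. The function names are hypothetical.

```python
def path_length(select):
    """Number of multiplexer elements between the input clock and the output."""
    for i, s in enumerate(select):
        if s == 0:          # assumed polarity: this element injects the input clock
            return i + 1
    raise ValueError("no element injects the input clock")


def add_element(select):
    """First change: extend the line by one element at its start."""
    i = select.index(0)
    if i + 1 >= len(select):
        raise ValueError("delay line is already at full length")
    return select[:i] + [1] + select[i + 1:]


def remove_element(select):
    """Second change: shorten the line by one element at its start."""
    i = select.index(0)
    if i == 0:
        raise ValueError("delay line is already at minimum length")
    return select[:i - 1] + [0] + select[i:]


line = [1, 1, 0, 0]              # clock enters at element 2, three elements in the path
longer = add_element(line)       # clock now enters at element 3
print(path_length(line), path_length(longer), path_length(remove_element(longer)))
```

Because elements are added or removed only at the point where the clock enters, the elements nearer the output keep their selection in this model, and only one control bit changes per step.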
A phase detector may be coupled to an input of a selected delay element in the delay line, and be operative to indicate when an input clock has the same phase as a delay input signal after having passed through the delay elements positioned prior to the selected delay element.
Each multiplexer may have two gates, each with two inputs, a signal input of the inputs to each of the input gates coupled to a signal source, an output of each of the input gates coupled to respective inputs of an output gate, and a logic circuit having an input coupled to a multiplexer select input and an output coupled to the signal inputs of the two input gates. The logic circuit may be operative to cause the multiplexer to select one of the signal inputs when the multiplexer select input is low and to select another of the inputs when the multiplexer select input is high.
Preferably, each of the gates is selected from the group consisting of a NAND gate and a NOR gate.
The logic circuit may include a multiplexer select NAND gate having a multiplexer select input and a fixed input held high, a mux NAND gate having one input coupled to an output of the multiplexer select NAND gate and also to an input of one of the two input NAND gates. An output of the mux NAND gate may be coupled to an input of another of the two input NAND gates.
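The NAND-only multiplexer described above may be checked at the logic level with a short sketch. The second input of the mux NAND gate is not specified in the text; it is assumed here to be held high, so that the gate acts as an inverter. The function names are hypothetical and gate delays are not modelled.

```python
def nand(a, b):
    """Two-input NAND on 0/1 values."""
    return 0 if (a and b) else 1


def nand_mux(in0, in1, sel):
    """Two-input multiplexer built only from two-input NAND gates."""
    sel_n = nand(sel, 1)    # multiplexer select NAND, fixed input held high: NOT sel
    sel_b = nand(sel_n, 1)  # mux NAND, second input assumed held high: sel again
    g0 = nand(in0, sel_n)   # input gate that passes in0 when sel is low
    g1 = nand(in1, sel_b)   # input gate that passes in1 when sel is high
    return nand(g0, g1)     # output gate


# Exhaustive truth table: the output follows in0 when sel is low, in1 when sel is high.
for sel in (0, 1):
    for in0 in (0, 1):
        for in1 in (0, 1):
            print(sel, in0, in1, nand_mux(in0, in1, sel))
```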
A clock tree may be coupled to a clock source and be operative to provide clock signals to the plurality of multiplexer delay elements.
A driver may be coupled to a multiplexer select input of the multiplexer select NAND gate of each of the multiplexers, and a control bus may be coupled to an input of the driver.
Advantageously, the driver may be a flip-flop circuit.
The phase detector may be coupled across clock and signal inputs to a last one of the delay elements in the digital delay line.
In another aspect of the invention there is provided a method of establishing a digital delay line, comprising forming a sequence of digital delay elements, coupling a control bus to each of the delay elements, control signals on the control bus being operative to control the insertion or deletion of associated delay elements from a start of the delay line. An input clock may be coupled to each of the digital delay elements by using an input clock bus. The delay elements may be inserted or deleted at the input side of the sequence of delay elements.
Preferably, the digital delay elements are multiplexers, each formed from a combination of logic gates.