This invention relates to deskewing of clock signals, and more particularly to deskewing of clock signals for off-chip devices.
FIG. 1 (Prior Art) is a simplified top-down diagram of a field programmable gate array (FPGA) integrated circuit 1. Integrated circuit 1 includes a ring of interface cells 2 (sometimes called “I/O cells”) and an inner core of configurable logic blocks (not shown). Each configurable logic block (CLB) contains one or more sequential logic elements. These sequential logic elements are represented in FIG. 1 as flip-flops 3. A clock signal that is received from an off-chip source is typically routed via a clock bus to numerous clock inputs of the flip-flops so that all the flip-flops are clocked together. To prevent a given clock edge from being received at one of the flip-flops before it is received at another (this is called “clock skew”), a structure called a “balanced clock tree” is employed.
FIG. 2 (Prior Art) is a simplified diagram illustrating a balanced clock tree 4. The distance a clock signal must travel from the clock source at point CS to any of the points A-H at the respective clock inputs of flip-flops 5-12 is identical. Assuming equal propagation speeds through the various branches of this balanced clock tree, the clock signal at point CS will reach points A-H at the same time. In the structure of FIG. 2 there will, however, be a propagation delay between the time a given clock edge arrives at point CS and the time when that clock edge arrives at points A-H. Where a clock signal edge is received onto an FPGA from an external source, such a propagation delay may introduce undesirable clock skew between the clock signal edge where it enters the FPGA and the clock signal edge at the clock inputs of the various flip-flops inside the integrated circuit. A circuit called a “delay-locked loop” (DLL) may be employed to reduce such clock skew.
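The two properties of a balanced clock tree noted above (zero skew among points A-H, but a nonzero delay from point CS) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function name `leaf_arrival_times`, the three-level binary topology, and the per-level segment delays in picoseconds are hypothetical and are not taken from FIG. 2.

```python
def leaf_arrival_times(segment_delays_ps):
    """Return the clock arrival time at every leaf of a binary clock tree.

    segment_delays_ps[k] is the (assumed) delay of each branch segment at
    level k. Every branch at a given level has the same delay, which is
    what makes the tree "balanced" in this model.
    """
    arrivals = [0]  # the clock edge is at point CS at time 0
    for delay in segment_delays_ps:
        # Each node fans out to two branches of equal delay.
        arrivals = [t + delay for t in arrivals for _ in range(2)]
    return arrivals


# Hypothetical segment delays for a three-level tree (eight leaves, like A-H).
arrivals = leaf_arrival_times([400, 200, 100])
skew_ps = max(arrivals) - min(arrivals)   # skew between leaves
insertion_delay_ps = arrivals[0]          # delay from CS to any leaf

print("leaves:", len(arrivals))
print("skew between leaves (ps):", skew_ps)
print("delay from CS to leaves (ps):", insertion_delay_ps)
```

In this model the skew between leaves is zero, yet every leaf lags point CS by the sum of the segment delays. That remaining delay is the quantity a DLL is used to cancel.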
FIG. 3 (Prior Art) is a simplified diagram of FPGA 1 that uses a “delay-locked loop” (DLL) 13 to eliminate such clock skew. An external clock signal CLKIN is received onto FPGA 1 via a clock input buffer 14 and is supplied to a reference signal input 15 of DLL 13. DLL 13 has a feedback signal input 16 which is coupled to the clock input of flip-flop 5 by a short connection 17. DLL 13 delays the clock signal output by the DLL on DLL output 18 such that the phase of the clock signal at feedback signal input 16 matches the phase of the clock signal at reference signal input 15. The connection 17 from the clock input of flip-flop 5 to the feedback signal input 16 is made to have the same delay as the delay through clock input buffer 14 to reference signal input 15. Accordingly, the phase of the clock signal at the clock inputs of flip-flops 5-12 matches the phase of the clock signal where CLKIN is received onto FPGA 1 at the input of clock input buffer 14. The clock signal at the clock inputs of flip-flops 5-12 is therefore said to be “deskewed” with respect to the external clock signal CLKIN. For additional background information on DLLs and/or their uses in FPGAs, see: 1) U.S. patent application Ser. No. 09/102,740, entitled “Delay Lock Loop With Clock Phase Shifter”, filed Jun. 22, 1998, by Hassoun et al.; 2) U.S. patent application Ser. No. 09/363,941, entitled “Programmable Logic Device With Delay-Locked Loop”, filed Jul. 29, 1999, by Schultz et al.; and 3) U.S. Pat. No. 5,646,564 (the content of these three documents is incorporated herein by reference).
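The locking behavior described for FIG. 3 can be sketched numerically. The following Python is a minimal behavioral model, not the circuit of DLL 13: the function name `lock_dll`, the 1 ps adjustment step, and all delay values are assumptions chosen for illustration. The model treats the DLL as a variable delay that is increased until a clock edge at the feedback input coincides with an edge at the reference input.

```python
def lock_dll(period_ps, tree_delay_ps, feedback_delay_ps):
    """Increase the DLL delay in 1 ps steps until the feedback edge lines up
    with a reference edge (phase difference of 0 modulo the clock period).

    All arguments are non-negative integers in picoseconds (assumed values).
    """
    dll_delay_ps = 0
    while (dll_delay_ps + tree_delay_ps + feedback_delay_ps) % period_ps != 0:
        dll_delay_ps += 1
    return dll_delay_ps


# Hypothetical numbers: 10 ns clock, 0.8 ns input buffer, 2.3 ns clock tree.
PERIOD_PS = 10_000
INPUT_BUFFER_PS = 800
TREE_PS = 2_300
FEEDBACK_PS = INPUT_BUFFER_PS  # connection 17 is matched to the input buffer

dll_delay_ps = lock_dll(PERIOD_PS, TREE_PS, FEEDBACK_PS)

# Phase of the clock at the flip-flop clock inputs relative to CLKIN at the pin.
flip_flop_phase_ps = (INPUT_BUFFER_PS + dll_delay_ps + TREE_PS) % PERIOD_PS

print("DLL delay at lock (ps):", dll_delay_ps)
print("flip-flop phase relative to CLKIN (ps):", flip_flop_phase_ps)
```

Because the feedback connection is given the same delay as the input buffer, aligning the two DLL inputs also aligns the flip-flop clock inputs with CLKIN at the pin, modulo one clock period.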
An FPGA may be used to drive another integrated circuit in a synchronous fashion. FIG. 4 (Prior Art) is a diagram of an implementation wherein FPGA 1 is configured to realize RAM control circuitry 20 for interfacing with an external Random Access Memory (RAM) integrated circuit 21. FIG. 5 (Prior Art) is a waveform diagram representative of signals associated with the reading of information from RAM 21.
RAM control circuitry 20 is synchronous logic realized using flip-flops inside FPGA 1. The internal clock signal that clocks these flip-flops is deskewed with respect to the external clock signal CLKIN using DLL 13 as described above in connection with FIG. 3. RAM control circuitry 20 also supplies the clock signal CLK to external RAM 21 via an interface cell 23 and an external clock line 24. To read data from a given memory location, RAM control circuitry 20 outputs the address ADDR of the memory location to be read via interface cells 25 and external address bus lines 26. (The single interface cell 25 in FIG. 4 represents a plurality of interface cells that drive the address bus lines 26.) RAM control circuitry 20 also outputs a control signal CONTROL via interface cell 27 and line 28. Control signal CONTROL indicates that the operation is a read operation as opposed to a write operation.
RAM 21 examines the address ADDR and the control signal CONTROL on a rising edge 29 of the clock signal CLK. If the operation is a read operation, RAM 21 supplies the requested data back to the FPGA 1 via data bus lines 30. The RAM 21 therefore requires that the control signal CONTROL be valid at RAM 21 a given setup time before the rising edge 29 of the clock signal and remain stable a given hold time after the rising edge 29.
Because RAM control circuitry 20 is synchronous logic, clock edge 31 triggers the output of the control signal CONTROL. There is delay associated with producing and conducting this control signal to RAM 21. That delay results in control signal CONTROL arriving at RAM 21 a given time later at time 32. Similarly, clock edge 33 causes the RAM control circuitry 20 to remove the control signal CONTROL. It is removed a given time after clock edge 33 at time 34. As seen in FIG. 5, increasing the propagation delay of the clock signal between FPGA 1 and RAM 21 serves to delay the clock signal CLK AT RAM. Delaying the clock signal CLK AT RAM results in a decreased hold time 35. If this hold time 35 is too short, then the hold time required by the RAM 21 will be violated.
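The relationship between clock delay and hold time can be made concrete with a small calculation. The sketch below is an assumption-laden illustration, not a reading of FIG. 5: the function name `hold_time_ps`, the required hold time, and every timing value are hypothetical. It computes hold time as the interval from the clock edge arriving at the RAM to the moment CONTROL is removed.

```python
def hold_time_ps(control_removed_at_ps, clock_edge_sent_at_ps, clock_delay_to_ram_ps):
    """Hold time seen by the RAM: time from the sampling clock edge at the
    RAM until the control signal is removed (all times in picoseconds)."""
    ram_edge_at_ps = clock_edge_sent_at_ps + clock_delay_to_ram_ps
    return control_removed_at_ps - ram_edge_at_ps


REQUIRED_HOLD_PS = 800  # hypothetical RAM hold requirement

results = {}
for clock_delay_ps in (0, 500, 1000):
    # Hypothetical: the clock edge leaves at 10 000 ps and CONTROL is
    # removed at 11 500 ps, regardless of the clock's propagation delay.
    margin = hold_time_ps(11_500, 10_000, clock_delay_ps)
    results[clock_delay_ps] = margin
    status = "OK" if margin >= REQUIRED_HOLD_PS else "VIOLATION"
    print(f"clock delay {clock_delay_ps} ps -> hold time {margin} ps ({status})")
```

Each additional picosecond of clock propagation delay removes one picosecond of hold time, so with these assumed numbers a 1000 ps clock delay violates the 800 ps requirement.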
FIG. 6 (Prior Art) is a diagram of one conventional solution wherein a second DLL 36 deskews the clock signal CLK at point 38 on RAM 21. The connection from point 37 to point 38 and the connection from point 37 to point 39 are fashioned to have the same propagation delays. DLL 36 therefore delays the clock signal CLK such that the phase of the clock signal CLK at point 38 matches the phase of the clock signal at point 39. Because the propagation delays through the two input buffers leading into DLL 36 are the same, the phase of the clock signal CLK at point 38 matches the phase of the external clock signal CLKIN where it enters FPGA 1 at point 40.
The bottom waveform CLK AT RAM (WITH DLL) in FIG. 5 illustrates the clock signal CLK at point 38. Note that the phase of this clock signal CLK AT RAM in the bottom waveform matches the phase of the external clock signal CLKIN in the top waveform. Because the clock skew between clock signals at points 40 and 38 is eliminated, the hold time 41 between the rising edge 42 of the clock signal at RAM 21 and the control signal CONTROL is increased.
It may also be desired that such an FPGA interface with more than one external device in synchronous fashion. FIG. 7 (Prior Art) is a diagram of one conventional technique. The second DLL 36 deskews the clock signal CLK at the various external devices 56-59 with respect to the external clock signal CLKIN at point 40 as in the example of FIG. 6. The circuit of FIG. 7, however, employs a balanced clock tree so that the clock signal CLK from point 37 reaches the clock inputs 44-47 of all the RAM chips 56-59 at the same time. As in the example of FIG. 6, the propagation delay from point 37 to point 44 is made to match the propagation delay from point 37 to point 39. Because the delays through the two clock input buffers leading into DLL 36 are the same, the clock signal CLK at all the clock inputs 44-47 is deskewed with respect to the external clock signal CLKIN at point 40. For additional background information on board level deskewing of a clock signal supplied to multiple external devices, see: Xilinx Application Note XAPP132, version 1.4, entitled “Using The Virtex Delay-Locked Loop”, pages 1-9 (Oct. 11, 1999).
There are, however, drawbacks associated with the structure of FIG. 7. First, consider a situation in which output buffer 48 is an output buffer whose size and current drive capability are fixed at the time of FPGA manufacture. The current drive capability of such a buffer may, for example, be selected for driving a particular standard load. If, for example, such an output buffer 48 is sized to drive a much larger load than it is actually driving in a particular implementation, then the output buffer 48 may drive the clock signal CLK with such a high edge rate that undesirable ringing results. If, on the other hand, this output buffer 48 is sized to drive a much smaller load than it is actually driving in a particular implementation, then output buffer 48 may not be able to drive the clock signal CLK with acceptably rapid edge rates. The resulting slow edge rates may cause increased power consumption and other problems.
Second, using the structure of FIG. 7 involves the undesirable task of designing a balanced clock tree. Where FPGA 1 and the external devices 56-59 being driven are disposed on a multi-layer printed circuit board involving many crossing lines and multiple different trace widths and varying feedthrough via characteristics, design of a suitable balanced clock tree can be a time-consuming and complex task. Traces may have to be made to snake around in order to increase propagation delay, thereby wasting space on the printed circuit board. The serpentine shape 50 of the trace in FIG. 7 illustrates such wasted space.
FIG. 8 (Prior Art) illustrates one conventional design that addresses the problem of output buffer 48 being overloaded. In the example of FIG. 8, an external clock driver integrated circuit 55 is employed. Such a clock driver chip typically presents only an ordinary load on output buffer 48 but has multiple output drivers for driving many clock inputs.
The solution of FIG. 8, however, involves problems. Although the clock driver chip 55 reduces loading on output buffer 48, providing the additional clock driver chip entails the usual costs and complexities associated with adding an additional component to a design. These include increased cost, increased board area, reduced reliability, and increased power consumption. Furthermore, the data buses from external RAM devices 56-59 back to FPGA 1 may have to cross clock traces. Such crossing is represented in FIG. 8 where a data bus 60 between RAM 58 and FPGA 1 crosses clock traces 51 and 52. To prevent undesirable crosstalk and coupling problems, it is desirable that the data buses not cross the clock traces.
Accordingly, a solution is desired wherein: a single FPGA design is adapted to drive different external clock loads in different board level implementations; board level implementations of the FPGA do not involve designing complex balanced clock trees; external clock driver chips are not required; and/or clock lines and data lines leading to external chips do not cross one another.
An integrated circuit (for example, a field programmable gate array) receives an external clock signal and generates therefrom a clock signal that is supplied to a plurality of external devices. These external devices may be devices that are coupled to the integrated circuit via synchronous communication.
A delay-locked loop (DLL), a balanced clock tree, and a plurality of interface cells on the integrated circuit function together to supply the clock signal to the plurality of external devices such that the clock signal at each of the external devices is deskewed with respect to the external clock signal. The DLL has a reference signal input, a feedback signal input, and an output. The reference signal input is coupled to receive the external clock signal from a source external to the integrated circuit. The output of the DLL is coupled to an input node of the balanced clock tree. Each output node of the balanced clock tree is coupled to a corresponding one of the interface cells so that all of the interface cells output the clock signal in phase with one another. Each external device receives the clock signal from a corresponding one of the interface cells via a separate external connection. Each of these external connections has an equal propagation delay. One of the interface cells supplies the clock signal back to the feedback signal input of the DLL via an external connection. This external connection has the same propagation delay as each of the external connections to the various external devices. Matching of the propagation delays of the various external connections may be accomplished by simply making the external connections all of the same length. Board level design is simplified because no balanced clock tree is needed to route the clock signal from the integrated circuit to the external devices. The interface cells used to supply the clock signal to the various external devices can be separated from one another by intervening interface cells so that the intervening interface cells can be used to communicate data between the integrated circuit and the external devices. This spacing of the interface cells allows clock lines and data lines to be extended to the external devices without having the clock lines cross the data lines.
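The effect of matching the external connections can be checked with a simple model. The Python below is a sketch under stated assumptions, not the claimed circuit: the function name `device_phases_ps` and all delay values are hypothetical, input buffer delays on the DLL inputs are assumed to cancel and are not modeled, and the DLL is represented only by the delay it settles at once the fed-back clock is aligned with the external clock.

```python
def device_phases_ps(period_ps, on_chip_delay_ps, feedback_trace_ps, device_traces_ps):
    """Phase of the clock at each external device relative to the external
    clock, once the DLL has locked on the fed-back connection.

    on_chip_delay_ps models the on-chip clock tree plus one interface cell;
    all values are integers in picoseconds (assumed for illustration).
    """
    # At lock, DLL delay + on-chip delay + feedback trace is a whole number
    # of clock periods.
    dll_delay_ps = (-(on_chip_delay_ps + feedback_trace_ps)) % period_ps
    return [
        (dll_delay_ps + on_chip_delay_ps + trace_ps) % period_ps
        for trace_ps in device_traces_ps
    ]


# Four devices whose connections match the feedback connection (600 ps each).
matched = device_phases_ps(10_000, 3_000, 600, [600, 600, 600, 600])

# Same board, but the third connection is 150 ps longer than the others.
mismatched = device_phases_ps(10_000, 3_000, 600, [600, 600, 750, 600])

print("matched connections:   ", matched)
print("mismatched connections:", mismatched)
```

With equal-delay connections every device sees the clock in phase with the external clock. In this model any length mismatch appears directly as skew at the affected device, which is why equal-length connections suffice and no board-level balanced tree is required.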
Other structures and methods are disclosed in the detailed description below. This summary does not purport to define the invention. The invention is defined by the claims.