There is a constant impetus in the electronics sector to provide electronic devices with more functionality at greater speed. Greater functionality generally requires increased complexity in the form of an increased number of signals and subsystems. For an electronic system to operate, all of these myriad subsystems must be functioning in synchronicity to accomplish the system's overall objectives. The required accuracy of this synchronization increases proportionally with the speed of operation, and current consumer-available computer processing units operate using clocks that have cycle times as small as 0.3 nanoseconds. Finding methods and circuits for aligning the multitude of electronic signals flowing between ever growing numbers of electronic subsystems with increasing accuracy is a formidable challenge for the electronics industry.
In modern electronic systems, the clock signal is generally the fastest and most widely distributed signal in the system. Full distribution of the clock is necessary because it provides information about what the entire system is doing so that each subsystem can execute its assigned task at the right time. Therefore, the arrival of the clock at each subsystem must be carefully synchronized because skew between the clock's arrival times will limit the maximum operational speed of the system, and may cause race conditions where one subsystem is no longer acting in tandem with the rest of the system.
A simple electronic clock signal will consist of a periodic signal transitioning between low and high states on a clock edge twice every clock period. Circuits can be triggered to execute tasks on the rising or falling edges of the clock, and double data-rate (DDR) circuits can execute tasks on both of these edges. Regardless of which portion of the clock is used to trigger actions by the circuit, it is important that the trigger portion of the clock signal be delivered in such a way that all subsystems receive it simultaneously.
The problem of routing a clock signal through a circuit such that all subsystems are acting in synchronization can be solved through the use of special physical layout approaches such as a t-branch topology. In a t-branch topology, each branch of the clock is split into two sub-branches until there are as many sub-branches as subsystems requiring a clock signal. The physical layout is done such that every signal needs to travel through the same number of branch splits and hence the same overall distance. Thereby, the clock signal reaches every subsystem at the same time. The problem with this type of approach is that each t-branch degrades the integrity of the clock signal and creates a corresponding increase in clock skew. This problem can only be remedied through the use of expensive termination circuits.
The problem of multiple terminations and the resultant degradation of the clock signal is well known in the art and is often alleviated through the use of fly-by topology. Fly-by topology is a layout pattern that relies on sending signals that are required by multiple subsystems on a single path. Since this topology allows for higher signal integrity without the use of expensive termination circuits, it is the preferred topology in high speed applications. For example, when the Joint Electron Device Engineering Council (JEDEC) released its double data rate three (DDR3) memory system specification as an improvement over double data rate two (DDR2), the method for sending the system clock from the controller to the memory modules was changed from a branching to a fly-by approach. This is because DDR3 is meant for higher frequency operation, and fly-by preserves signal integrity at the level required for high frequency operation.
Although the fly-by topology approach eliminates problems caused by signal line branching, another problem is created. In FIG. 1, control circuit 100 sends a first signal required by both subsystems 101 and 102 along fly-by configured line 103. Control circuit 100 also sends signals individually to subsystems 101 and 102 along directly connected lines 111 and 112. The problem at issue is caused by the fact that the distance from control circuit 100 to subsystem 101 along line 103 is much shorter than the distance from control circuit 100 to subsystem 102 along line 103, whereas the lengths of data lines 111 and 112 are nearly the same. Therefore, if the signals sent along lines 111, 112, and 103 are aligned at control circuit 100, the arrival of the fly-by configured signal and the direct coupled signals at operative areas 121 and 122 will not be synchronized.
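The arrival mismatch described for FIG. 1 can be illustrated with a minimal numeric sketch. The trace lengths and the propagation delay per millimeter below are hypothetical values chosen only for illustration; none of them appear in the figures.

```python
# Hypothetical propagation delay; real boards vary with material and geometry.
ASSUMED_PS_PER_MM = 7.0


def arrival_skew_ps(flyby_len_mm, direct_len_mm, ps_per_mm=ASSUMED_PS_PER_MM):
    """Return fly-by arrival time minus direct-line arrival time, in picoseconds."""
    return (flyby_len_mm - direct_len_mm) * ps_per_mm


# Assumed distances from control circuit 100 to each subsystem.
flyby_lengths_mm = {"101": 40.0, "102": 90.0}   # along fly-by line 103
direct_lengths_mm = {"101": 40.0, "102": 42.0}  # along lines 111 and 112

# Skew seen at each operative area when all signals leave the controller aligned.
skews = {
    name: arrival_skew_ps(flyby_lengths_mm[name], direct_lengths_mm[name])
    for name in flyby_lengths_mm
}

for name, skew in skews.items():
    print(f"subsystem {name}: fly-by signal arrives {skew} ps after the direct signal")
```

Under these assumed lengths, the nearer subsystem sees no skew while the farther one sees the fly-by signal arrive later than its direct signal, which is the mismatch that must be corrected.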
In the specific context of a DDR3 memory system, the fly-by topology problem manifests itself through the relationship of the system clock and the DDR3 data strobe signals (DQS). During a write operation, each individual byte in a DDR3 memory module needs to receive a DQS signal aligned with the data it is receiving on its data bus. In addition, each memory module needs to receive a clock signal that is synchronized with the command/address bus through a fly-by topology. If the clock and DQS signals are not aligned, data could be written to the wrong address. In a DDR3 system, the memory controller would be control circuit 100, an individual byte on the DDR3 memory module could be represented by subsystem 101, the clock signal line could be represented by line 103, and the DQS signal line for byte 101 could be represented by line 111. In order to ensure that a race condition does not occur in which desired data is written to the wrong address, the clock and each individual DQS signal must be aligned at the memory module boundary, which is represented in FIG. 1 by operative areas 121 and 122.
The process of aligning the DQS and system clock signals is called write leveling. The industry standard approach to write leveling in DDR3 is described in JEDEC's JESD79 specification. Under the JEDEC approach, the clock is aligned with the data strobe signal by adding a certain amount of delay to each of the individual data strobe signals before they are sent out by the memory controller. These delays are individually calibrated so that the data signals arrive at the operative point of each subsystem at the same time as the clock. The JEDEC write leveling approach can be best described with reference to FIG. 2 and FIG. 3.
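The goal of the per-strobe calibration can be sketched as simple arithmetic: each strobe is delayed by the difference between the clock's arrival time and the strobe's own arrival time at the same byte. The function name, lane names, and arrival times below are hypothetical. A real controller does not know these flight times in advance and instead finds the delays by calibration.

```python
def write_leveling_delays(clock_arrival_ps, strobe_arrival_ps):
    """Return the delay, per byte lane, that aligns each strobe with the clock.

    Both arguments map a lane name to an arrival time in picoseconds,
    measured from the moment the signals leave the controller aligned.
    """
    delays = {}
    for lane, clock_time in clock_arrival_ps.items():
        delay = clock_time - strobe_arrival_ps[lane]
        if delay < 0:
            # Adding delay to the strobe cannot fix a strobe that is already late.
            raise ValueError(f"strobe for {lane} arrives after the clock")
        delays[lane] = delay
    return delays


# Assumed arrival times: the fly-by clock reaches each byte progressively
# later, while the directly routed strobes arrive at nearly the same time.
clock_arrival_ps = {"byte0": 300, "byte1": 450, "byte2": 600}
strobe_arrival_ps = {"byte0": 280, "byte1": 290, "byte2": 285}

delays = write_leveling_delays(clock_arrival_ps, strobe_arrival_ps)
print(delays)
```

The bytes farther along the fly-by line need larger strobe delays, which is why each delay is calibrated individually.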
FIG. 2 shows a single subsystem 210 and the accompanying portion of the control circuit 200 for calibrating an individual signal with the clock in a manner consistent with the JEDEC approach. In FIG. 2, subsystem control circuit 200 takes in the target signal on node 201, and contains variable delay element 202. The subsystem control circuit 200 is connected to subsystem 210 by three interconnects 220, 221, and 222. The relative lengths of interconnects 220 and 221 are representative of the increased distance associated with the signal paths, and the commensurate increase in time it takes for a signal to propagate through interconnect 220 as compared to interconnect 221. Subsystem edge 212 sufficiently defines the operative node of subsystem 210 because signal propagation times within the subsystem are negligible. Sampling flop 211 samples the target signal on node VTR while being clocked by the delayed target signal on node VDTR, and outputs a value on node VFB. Sampling flop 211 has the basic characteristics of a simple DQ flop such that on the rising edge of a pulse sent to its clock input, it will output the value on its input node VTR on its output node VFB. Variable delay element 202 applies a delay to the signal received from node 201 to produce a delayed signal on node VDOUT.
FIG. 3 displays a timing diagram that illustrates the calibration algorithm applied to the system in FIG. 2 using the JEDEC approach. The voltage signal on node VDTR is shown on axis 301, the voltage signal on node VTR is shown on axis 302, and the voltage on node VFB is shown on axis 303. The x-axis of all three axes is in units of time and they are all aligned with y-intercepts of time equal to zero. The double hash marks on each of the x-axes indicate a break in the uniform scale of the time value. The hash marks obscure a full system period, wherein another signal is prepared and output by variable delay element 202. The system period hash marks also divide the figure into three portions. Portions 304, 305, and 306 each display the voltages as they behave in a first, second, and third consecutive system periods respectively.
Before any delay is added by variable delay element 202, the target signal 201 (e.g., a system clock signal) will pass through the delay element 202 unhindered and arrive at node VDTR before the target signal (system clock signal) arrives at node VTR. This is shown through comparison of the first rising edge on axes 301 and 302 respectively in portion 304. These first pulses are not aligned in the sense that they have a non-zero phase difference. Having a non-zero phase difference means that although they have the same period and pulse characteristic, their re-initializations do not occur at the same point in time. Since the signal sampled on node VTR has not yet been re-initialized, when the rising edge of the pulse on node VDTR triggers a sampling of the voltage on node VTR, a low value is sampled. Therefore, the voltage on node VFB on axis 303 remains low.
In the next phase of the JEDEC approach, variable delay element 202 will begin to incrementally increase the delay applied to the target signal received from node 201. Portion 305 shows the changes to the relevant signals after the delay element has been incremented. A single step of this process can be seen with reference to the pulses in portion 305 on axis 301. The second of these two pulses is the target signal arriving at node VDTR after the variable delay produced by variable delay element 202 has been incremented one step. The first pulse is shown using a dotted line and is not actually present in the system at that time. The pulse is represented only to show when the delayed target signal would have arrived if no change to the delay element had been made. The difference between the two pulses measured in time is marked on axis 301 with the indication tstep. Although the rising edges of the solid line pulse of axis 301 and the pulse of axis 302 are closer in portion 305 than they are in portion 304, the voltage on node VTR is still low on the rising edge of the pulse at node VDTR so the voltage on node VFB on axis 303 remains low.
The waveforms in portion 306 of FIG. 3 illustrate the system reaching a lock condition. In portion 306, the pulse arriving at node VDTR has been delayed by the variable delay element 202 by a phase distance of two delay steps. The dotted line pulses in portion 306 on axis 301 are displayed purely to show when the signal would have arrived if delay element 202 had applied either one step delay, or zero delay. In portion 306 of the axes, the signal on node VTR (axis 302) is high when the rising edge of the pulse is received at node VDTR (axis 301), as shown by reference line 307. As such, the sampling flop will output a signal that transitions from low to high on node VFB, which is illustrated in portion 306 on axis 303. Once the pulse signal on node VFB (axis 303) is received by delay element 202, the control circuit 200 will fix the delay applied and the calibration according to the JEDEC approach will be complete.
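The incremental search walked through in portions 304, 305, and 306 can be sketched as a small simulation. It assumes, as described for FIG. 2, that sampling flop 211 captures the level on node VTR at the rising edge on node VDTR. The function names, the arrival times, the step size, and the pulse width are hypothetical values chosen so that lock occurs on the third system period.

```python
def sample_vfb(t_vdtr_ps, t_vtr_ps, pulse_width_ps):
    """Model sampling flop 211: True if VTR is high at the VDTR rising edge."""
    return t_vtr_ps <= t_vdtr_ps < t_vtr_ps + pulse_width_ps


def calibrate(t_vtr_ps, t_vdtr_undelayed_ps, t_step_ps, pulse_width_ps, max_steps=64):
    """Increase the delay one step per system period until VFB goes high.

    Returns the number of delay steps applied at lock.
    """
    for steps in range(max_steps + 1):
        # Rising-edge arrival at VDTR with the current delay setting.
        t_vdtr_ps = t_vdtr_undelayed_ps + steps * t_step_ps
        if sample_vfb(t_vdtr_ps, t_vtr_ps, pulse_width_ps):
            return steps  # lock: the control circuit fixes the delay here
    raise RuntimeError("no lock found within max_steps")


# Assumed timing: the undelayed pulse reaches VDTR 220 ps before VTR rises.
lock_steps = calibrate(
    t_vtr_ps=500,
    t_vdtr_undelayed_ps=280,
    t_step_ps=120,
    pulse_width_ps=400,
)
print(lock_steps)  # prints 2
```

With these assumed values the flop samples low with zero and one step of delay and samples high with two steps, mirroring the three portions of FIG. 3. The resolution of the final alignment is limited by the step size, since the search stops at the first setting that samples high.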
There is a great need for signal alignment systems in complex high speed electronic systems. When circumstances require that the signals cannot be sent along paths of equal length, the possibility that the signals will be skewed upon arrival at their operative points increases dramatically. The approach taken by JEDEC to ameliorate this problem in the DDR3 specification is to add a delay to the faster signal at the point the signals take divergent paths.