1. Field of the Invention
This invention relates generally to a timing domain crossing method and apparatus and more particularly to a timing domain crossing method and apparatus for use in enhanced data rate systems such as double data rate systems.
2. Description of Related Art
In modern high frequency semiconductor and system design, timing margins for passing signals between devices are increasingly constrained by the higher clock frequencies and increased data bandwidth requirements. Historically, devices generated output signals with a propagation delay from a system input clock. As clock periods shrink, these delays from clock input to output signal consume larger and larger percentages of a clock period until, in some systems, the output signal delay becomes larger than the clock period. This situation presents difficulties in sampling the output signal at a receiving device on a definite clock cycle. As a result, many systems have turned to source synchronous clocking in attempts to increase timing margins and sampling windows at the data destination. In source synchronous clocking, the device generating the data also sends a clock signal along with the data signals to reduce time delay between clock and data. Additionally, by using similar output buffers for clock and data on the sending device, along with similar input buffers for clock and data on the receiving device, skew between clock and data may be reduced.
Some systems take advantage of this increased timing margin to send new data on both rising and falling edges of the clock rather than the more traditional rising edge only or falling edge only communication. This “double data rate” (DDR) communication increases data bandwidth while maintaining a lower clock frequency because two packets of data are sent for each clock period rather than the more traditional one data packet per clock period.
Source synchronous clocking and DDR communication are used alone or in combination in a variety of communication busses. DDR Dynamic Random Access Memory (DRAM) is an example of a system that uses both source synchronous clocking and DDR communication.
FIG. 1 illustrates a logic diagram of an implementation that may be used to sample DDR data at a receiver and present the data to other internal circuitry timed to an internal clock. In the FIG. 1 implementation, data 10 is sampled by falling edge flip-flop 20 clocked by a write strobe 12, and data 10 is also sampled by rising edge flip-flop 30 clocked by the write strobe 12. To re-time the data sampled by rising edge flip-flop 30, falling edge flip-flop 35, clocked by the write strobe 12, samples the output of the rising edge flip-flop 30. Many communication systems generate bursts of data. The burst size may be related to internal bus widths on devices. As a result, the FIG. 1 implementation also includes delay registers for portions of the sampled data to widen the burst data from a burst of four serial data packets to a bus that is four data packets wide, so that all four data packets can be presented to subsequent circuitry on the same clock cycle. Delay register 40 samples the signal QPf to create a delayed data signal QPn. Similarly, delay register 50 samples the signal QN to create a delayed data signal QNn.
A timing generator 60 uses a clock signal 16 to sample a command signal 14, which may be a bus of signals, to determine whether data will be presented on the data bus in a write cycle. Conventionally, the command signal 14, indicating a write cycle on the data bus, may occur multiple clock cycles before data is presented on data 10. The timing generator 60 determines this time delay to generate a re-timing strobe RTS when all four data packets are available at the appropriate active edge of the clock signal 16. Re-timing registers (72, 74, 76, and 78), clocked by the re-timing strobe RTS, sample these four data packets for parallel presentation to subsequent circuitry (not shown).
FIG. 2 illustrates timing of various signals to illustrate operation of the FIG. 1 implementation. Referring to FIG. 1 and FIG. 2, a command is generated at the second clock cycle, which is followed by a burst of four data packets (R0, F0, R1, and F1) and a corresponding write strobe 12 to be used for sampling the data packets. Rising edge flip-flop 30 samples the R0 data packet on the first rising edge of the write strobe 12 and samples the R1 data packet on the second rising edge of the write strobe 12. The R0 and R1 data packets are subsequently re-timed to the falling edge of the write strobe 12 by falling edge flip-flop 35. Falling edge flip-flop 20 samples the F0 data packet on the second falling edge of the write strobe 12 and samples the F1 data packet on the third falling edge of the write strobe 12. The delay register 40 and delay register 50 respectively sample the R0 and F0 data packets at the third falling edge of the write strobe 12. As a result, all four data packets are available in parallel after the third falling edge of the write strobe 12.
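The capture sequence described for FIG. 1 and FIG. 2 can be sketched as a small behavioral model. This is an illustrative sketch only, not a gate-level description: the function name capture_burst and the edge-list encoding are hypothetical, the delay registers 40 and 50 are assumed to be clocked on the falling edge of the write strobe, and the first falling edge is assumed to precede the first data packet (inferred from the edge numbering in the description above).

```python
def capture_burst(edges):
    """Model the write-strobe capture path of the FIG. 1 implementation.

    edges: list of (edge, data) tuples, where edge is "rise" or "fall"
    and data is the value on the data line at that strobe edge.
    Returns the register outputs after the last edge.
    """
    q = {"QP": None, "QPf": None, "QN": None, "QPn": None, "QNn": None}
    for edge, data in edges:
        if edge == "rise":
            # Rising edge flip-flop 30 samples the data line.
            q["QP"] = data
        elif edge == "fall":
            # All falling-edge registers update together from pre-edge values
            # (assumption: delay registers 40 and 50 share this edge):
            #   delay register 40: QPn <- QPf
            #   delay register 50: QNn <- QN
            #   flip-flop 35:      QPf <- QP
            #   flip-flop 20:      QN  <- data
            q["QPn"], q["QNn"], q["QPf"], q["QN"] = q["QPf"], q["QN"], q["QP"], data
        else:
            raise ValueError(f"unknown edge type: {edge!r}")
    return q


# Burst of four packets; the leading falling edge carries no data (assumed).
burst = [("fall", None), ("rise", "R0"), ("fall", "F0"),
         ("rise", "R1"), ("fall", "F1")]
outputs = capture_burst(burst)
parallel_word = (outputs["QPn"], outputs["QNn"], outputs["QPf"], outputs["QN"])
print(parallel_word)  # ('R0', 'F0', 'R1', 'F1')
```

Under these assumptions, the four packets appear in burst order on QPn, QNn, QPf, and QN after the third falling edge, which is the point at which the re-timing registers could sample them in parallel.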
The timing generator 60 uses delays or counters to determine when all four data packets are available in parallel to generate a re-timing strobe RTS. In this case, the re-timing strobe RTS is generated at the rising edge of the sixth clock cycle. The re-timing strobe RTS is used by the re-timing registers (72, 74, 76, and 78) to synchronize the four data packets to the clock signal for presentation to other circuitry (not shown).
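A counter-based version of the timing generator might behave as in the following sketch. The function name rts_cycles and the four-cycle latency are assumptions chosen to match the FIG. 2 description (command at the second clock cycle, re-timing strobe at the sixth); the text does not specify how the counter is actually built.

```python
def rts_cycles(commands, latency=4):
    """Return the clock cycles at which the re-timing strobe RTS is asserted.

    commands: iterable of booleans, one per clock cycle starting at cycle 1;
              True means a write command is sampled on that cycle.
    latency:  assumed number of clock cycles from write command to RTS.
    """
    rts = []
    pending = []  # one down-counter per outstanding write command
    for cycle, is_write in enumerate(commands, start=1):
        pending = [n - 1 for n in pending]
        if 0 in pending:
            rts.append(cycle)  # a counter expired: strobe on this cycle
        pending = [n for n in pending if n > 0]
        if is_write:
            pending.append(latency)
    return rts


# Write command sampled at the second clock cycle, eight cycles simulated.
print(rts_cycles([False, True, False, False, False, False, False, False]))  # [6]
```

The fixed latency is the weak point of this approach: the strobe is placed relative to the clock signal, while the captured data moves with the write strobe.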
While the source synchronous timing and the double data rates may enable reliable high bandwidth, they may create problems at the receiving end if the data needs to be re-synchronized to the system clock. DDR DRAM timing may be used as an example to illustrate these problems. As shown in FIG. 3, the write strobe 12, which is generated from the system clock by a transmitting device, may have a delay relative to the system clock as small as tDQSSmin and as large as tDQSSmax, resulting in an uncertainty window tSQ. Depending on system implementation, device parameters, and operating conditions, the write strobe 12 may occur anywhere within the uncertainty window tSQ. Because the write strobe 12 has an uncertainty window, all the signals derived from flip-flops using the write strobe 12 contain this same uncertainty, as shown in FIG. 3 for the signals Data, QP, QPf, QN, QPn, and QNn.
This uncertainty results in a reduction in the amount of time that the data packets are valid for sampling by the re-timing strobe RTS. This valid data window (tV) may be defined as the clock period (tCK) minus the uncertainty window (tSQ). In other words, tV=tCK−tSQ. For many DDR DRAM devices, tDQSSmax and tDQSSmin are expressed as percentages of a clock cycle. For example, plus or minus 25% may be used in many applications, resulting in an uncertainty window tSQ of 50% of a clock cycle. Therefore, the valid data window can be expressed as tV=tCK−0.5 tCK=0.5 tCK. For a 100 MHz clock (tCK=10 nSec), this leaves a generous 5 nSec. For a 1 GHz clock (tCK=1 nSec), however, it leaves a valid data window of only 0.5 nSec.
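The arithmetic above can be reproduced with a short calculation. The function name valid_window_ps and its parameter names are hypothetical; the plus or minus 25% defaults are the example values from the text, not figures from any particular device specification.

```python
def valid_window_ps(clock_mhz, dqss_min_frac=-0.25, dqss_max_frac=0.25):
    """Valid data window tV = tCK - tSQ, in picoseconds.

    dqss_min_frac and dqss_max_frac express tDQSSmin and tDQSSmax as
    fractions of the clock period (example values, assumed).
    """
    t_ck = 1_000_000 / clock_mhz                    # clock period in ps
    t_sq = (dqss_max_frac - dqss_min_frac) * t_ck   # uncertainty window
    return t_ck - t_sq


print(valid_window_ps(100))   # 5000.0 ps, i.e. 5 nSec
print(valid_window_ps(1000))  # 500.0 ps, i.e. 0.5 nSec
```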
In addition, other factors may reduce this timing margin even further. Signals within a device may exhibit jitter and skew related to Process, Voltage, and Temperature (PVT) variations. These jitter/skew variations introduce an additional uncertainty tJS. Including the tJS uncertainty, the valid data window becomes: tV=tCK−tSQ−tJS. The magnitude of tJS may depend on the process technology. With current process technologies, tJS may be several hundred pSec regardless of operating frequency, or sometimes may increase as operating frequency increases. Assuming 500 pSec for tJS, an uncertainty window tSQ of 50% of a clock cycle, and a 1000 pSec clock cycle, the valid data window becomes: tV=1000−50% (1000)−500=0, leaving no timing margin to reliably sample the data packets. Timing margin therefore decreases rapidly as the clock period decreases, because tSQ scales with the clock period while there is not necessarily a corresponding decrease in the tJS uncertainty.
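The effect of a fixed tJS on a shrinking clock period can be illustrated numerically. The function names below are hypothetical, and the defaults (tSQ of 50% of a clock cycle, tJS of 500 pSec) are the example values assumed in the text.

```python
def valid_window_with_jitter_ps(t_ck_ps, t_sq_frac=0.5, t_js_ps=500):
    """Valid data window tV = tCK - tSQ - tJS, in picoseconds."""
    return t_ck_ps - t_sq_frac * t_ck_ps - t_js_ps


def break_even_period_ps(t_sq_frac=0.5, t_js_ps=500):
    """Clock period at which tV reaches zero: tCK = tJS / (1 - tSQ fraction)."""
    return t_js_ps / (1 - t_sq_frac)


# Margin at a few clock periods, tJS held constant.
for t_ck in (10_000, 2_000, 1_000):
    print(t_ck, valid_window_with_jitter_ps(t_ck))
# 10000 4500.0
# 2000 500.0
# 1000 0.0

print(break_even_period_ps())  # 1000.0
```

With these example values, the margin falls from 4500 pSec at a 10 nSec clock period to zero at 1 nSec, because only the tSQ term scales with the clock period.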
There is a need for an apparatus and method to sample source synchronous data and DDR data, which enables larger timing margins and reliability in transferring data from the source synchronous timing domain to an internal clock timing domain.