Double Data Rate, or “DDR”, memories are extremely popular due to their performance and density; however, they present challenges to designers. To reduce the amount of real estate on the memory chips, much of the burden of controlling the devices has been offloaded to circuits known as DDR memory controllers. These controller circuits may reside on processor, ASSP, or ASIC semiconductor devices, or alternatively on semiconductor devices dedicated solely to controlling DDR memories. Given the high clock rates and fast edge speeds of today's systems, timing becomes difficult to manage, and timing skews often vary greatly from one system implementation to another, especially in systems with larger amounts of memory and a wider overall memory bus.
In general, the industry has responded by moving towards memory controllers that calibrate themselves during a power-on initialization sequence in order to adapt to a given system implementation. The DDR3 standard supports this approach by including a special register on DDR3 memories, called the “Multi-Purpose Register”, which supplies a known data pattern for the calibration test performed during power-on initialization. The circuitry that memory controllers use to receive data from DDR memories normally resides in the Phy (physical interface) portion of the controller, which incorporates features that let the controller adapt to system timing irregularities; this adaptation is sometimes calibrated during a power-on initialization test sequence.
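As an illustration of what such a power-on calibration might do, the sketch below sweeps a read-delay setting across all of its taps, reads back a known pattern at each tap, and then centers the delay in the widest window of passing taps. The callable `read_at_tap`, the `expected` pattern, and `num_taps` are hypothetical names chosen for this example; they are not part of the DDR3 standard or of any particular controller.

```python
from typing import Callable, Optional, Tuple


def find_passing_window(passes: list[bool]) -> Optional[Tuple[int, int]]:
    """Return (first, last) tap of the longest contiguous run of passing taps."""
    best: Optional[Tuple[int, int]] = None
    start: Optional[int] = None
    for tap, ok in enumerate(passes + [False]):  # trailing False closes a final run
        if ok and start is None:
            start = tap
        elif not ok and start is not None:
            if best is None or (tap - 1 - start) > (best[1] - best[0]):
                best = (start, tap - 1)
            start = None
    return best


def calibrate_read_delay(read_at_tap: Callable[[int], int],
                         expected: int, num_taps: int) -> Optional[int]:
    """Hypothetical read calibration: sweep every tap, keep the center of the passing window."""
    passes = [read_at_tap(tap) == expected for tap in range(num_taps)]
    window = find_passing_window(passes)
    if window is None:
        return None  # no tap read the pattern back correctly
    return (window[0] + window[1]) // 2


# Simulated memory that returns the pattern correctly only for taps 10..20
print(calibrate_read_delay(lambda tap: 0x55 if 10 <= tap <= 20 else 0x00, 0x55, 32))  # 15
```

Centering in the widest passing window, rather than taking the first passing tap, leaves roughly equal margin on both sides of the data eye.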
FIG. 1 shows a typical prior art DDR memory controller in which an asynchronous FIFO 101 is used to move data from the clock domain of the Phy 102 to the Core clock domain 103. Incoming read data dq0 is clocked into input registers 105 and 106, each clocked on the opposite phase of a delayed version of the dqs clock 107, the delay being provided by delay element 108. Asynchronous FIFO 101 typically consists of at least eight stages of flip-flops, requiring at least 16 flip-flops in total per dq data bit. Notice also that an additional circuit 109 for delaying and gating dqs has been added before the Write Clock input of FIFO 101. This is because dqs is prone to glitches. The data (dq) and strobe (dqs) signals on a typical DDR memory bus are bidirectional, so dqs may float during the transition between writes and reads and is susceptible to glitches during those periods. For this reason, prior art DDR controller designs that use asynchronous FIFOs typically add gating element 109 to reduce the likelihood of errors caused by glitches on dqs. After passing through the entire asynchronous FIFO 101, read data is transferred to the core domain according to Core_Clk 110. Additional circuitry is typically added to FIFO 101 to handle potential metastable conditions arising from the unpredictable relationship between Core_Clk and dqs.
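The internals of FIFO 101 are not detailed here. As a hedged behavioral sketch of one common dual-clock FIFO construction (not necessarily the circuit of FIG. 1), the model below keeps the write pointer in the dqs (Phy) domain and the read pointer in the Core_Clk domain, and lets each pointer cross into the other domain only in Gray-coded form. Because consecutive Gray codes differ in a single bit, a pointer sampled mid-transition in hardware resolves to either its old or its new value. The class and method names are hypothetical, and `synchronize()` collapses the synchronizer flip-flops into one instantaneous step.

```python
from typing import Optional


def to_gray(n: int) -> int:
    """Convert a binary count to reflected Gray code."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Convert a reflected Gray code back to a binary count."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


class AsyncFifoModel:
    """Hypothetical dual-clock FIFO model; depth must be a power of two."""

    def __init__(self, depth: int = 8) -> None:
        self.depth = depth
        self.mem = [0] * depth
        self.wr = 0               # binary write pointer (dqs domain), one extra wrap bit
        self.rd = 0               # binary read pointer (Core_Clk domain), one extra wrap bit
        self.wr_gray_in_core = 0  # write pointer as last seen by the core domain
        self.rd_gray_in_phy = 0   # read pointer as last seen by the Phy domain

    def _used(self, wr_bin: int, rd_bin: int) -> int:
        return (wr_bin - rd_bin) % (2 * self.depth)

    def write(self, value: int) -> bool:
        """Called on a (gated) dqs edge; refuses the write if the FIFO looks full."""
        if self._used(self.wr, from_gray(self.rd_gray_in_phy)) == self.depth:
            return False
        self.mem[self.wr % self.depth] = value
        self.wr = (self.wr + 1) % (2 * self.depth)
        return True

    def read(self) -> Optional[int]:
        """Called on a Core_Clk edge; returns None if the FIFO looks empty."""
        if self._used(from_gray(self.wr_gray_in_core), self.rd) == 0:
            return None
        value = self.mem[self.rd % self.depth]
        self.rd = (self.rd + 1) % (2 * self.depth)
        return value

    def synchronize(self) -> None:
        """Stand-in for the synchronizers: only Gray-coded pointers cross domains."""
        self.wr_gray_in_core = to_gray(self.wr)
        self.rd_gray_in_phy = to_gray(self.rd)
```

Data written on dqs edges stays invisible to the core side until the write pointer has been synchronized, which is one source of the latency an asynchronous FIFO adds.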
FIG. 2 shows another prior art circuit for implementing a DDR memory controller, in particular a style used by the FPGA manufacturer Altera Corp. Portions of two byte lanes are shown in FIG. 2: the first byte lane is represented by data bit dq0 201 and its corresponding dqs strobe 202, and the second byte lane by dqs strobe 203 and data bit dq0 204. In general, the data and strobe signals connecting a DDR memory and a DDR memory controller are organized so that each byte (eight bits) of data has its own dqs strobe signal. Each such grouping is referred to as a byte lane.
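The byte-lane organization amounts to a simple index mapping. In the sketch below, bus-wide bit 8 maps to lane 1, position 0, which is consistent with FIG. 2 labeling the first bit of each lane as dq0. The function names and the strobe labels dqs0, dqs1, and so on are illustrative assumptions only.

```python
BITS_PER_LANE = 8  # one dqs strobe per byte of data


def lane_of(dq_index: int) -> tuple[int, int]:
    """Map a bus-wide dq index to (byte lane, bit position within that lane)."""
    return divmod(dq_index, BITS_PER_LANE)


def strobe_for(dq_index: int) -> str:
    """Hypothetical name of the dqs strobe that captures a given dq bit."""
    lane, _ = lane_of(dq_index)
    return f"dqs{lane}"


def lanes_for_bus(bus_width: int) -> list[list[int]]:
    """Group the dq bits of a bus into byte lanes."""
    if bus_width % BITS_PER_LANE:
        raise ValueError("bus width must be a whole number of bytes")
    return [list(range(lane * BITS_PER_LANE, (lane + 1) * BITS_PER_LANE))
            for lane in range(bus_width // BITS_PER_LANE)]


print(lane_of(8), strobe_for(13))  # (1, 0) dqs1
```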
Following the data path starting with dq data bit 201 and dqs strobe 202, these signals pass through programmable delay elements 205 and 206 respectively before being stored in capture registers 207 and 208. They then pass through a series of registers 209, 210, and 211, which are clocked by signals from tapped delay line 213. These registers form what is called a levelization FIFO, which attempts to align the data bits within a byte lane relative to other byte lanes. Tapped delay line 213 is driven by a PLL resynchronization clock generator 214, which also drives the final-stage registers 212 of the levelization FIFO and is made available to the core circuitry of the controller. PLL resynchronization clock generator 214 is phase and frequency synchronized with dqs. Notice that at this point, data stored in final-stage registers 212 has not yet been captured by the core clock of the memory controller. Also notice that the circuit of FIG. 2 uses an individual delay element for each data bit, such as dq0 201 and dq0 204.
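The goal of levelization can be pictured as giving each byte lane just enough extra register stages that all lanes emerge together, on the cycle of the latest-arriving lane. The sketch below assumes, hypothetically, that each lane's capture time is already known in whole cycles of the resynchronization clock; it is a conceptual model of the alignment goal, not of the tapped-delay-line clocking used in FIG. 2.

```python
from collections import deque
from typing import Optional


def levelization_stages(arrival_cycles: list[int]) -> list[int]:
    """Extra register stages per lane so every lane lines up with the latest one."""
    latest = max(arrival_cycles)
    return [latest - a for a in arrival_cycles]


class RegisterChain:
    """A chain of `stages` registers clocked together (zero stages passes data straight through)."""

    def __init__(self, stages: int) -> None:
        self.regs: deque = deque([None] * stages)

    def clock(self, value: Optional[str]) -> Optional[str]:
        if not self.regs:
            return value
        self.regs.append(value)
        return self.regs.popleft()


def emergence_cycles(arrival_cycles: list[int], total_cycles: int = 16) -> list[Optional[int]]:
    """Send one word per lane through its chain and report the cycle it leaves the chain."""
    chains = [RegisterChain(s) for s in levelization_stages(arrival_cycles)]
    emerged: list[Optional[int]] = [None] * len(chains)
    for cycle in range(total_cycles):
        for lane, chain in enumerate(chains):
            word = f"lane{lane}" if cycle == arrival_cycles[lane] else None
            out = chain.clock(word)
            if out is not None and emerged[lane] is None:
                emerged[lane] = cycle
    return emerged


print(levelization_stages([2, 4, 3]), emergence_cycles([2, 4, 3]))  # [2, 0, 1] [4, 4, 4]
```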
When fully populated byte lanes are considered, the additional delay elements needed to provide an individual programmable delay on every incoming data bit can consume a large amount of silicon real estate on the device containing the DDR memory controller circuit. This situation is shown in FIG. 3, where a single dqs strobe 301 requires a single programmable delay 302, while each of the eight data bits 303 of the byte lane drives its own programmable delay element 304.
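To put the FIG. 3 overhead in numbers, the count below assumes, as in FIG. 3, one programmable delay per dqs strobe plus one per data bit, and compares it with a hypothetical arrangement that delays only the strobes. The function name is illustrative, and the silicon area of a single delay element is process-specific and is not modeled.

```python
def delay_element_count(bus_width: int, bits_per_lane: int = 8,
                        per_bit_delays: bool = True) -> int:
    """Count programmable delay elements in the read data path.

    Each byte lane has one delay for its dqs strobe; with per_bit_delays=True
    (the FIG. 3 arrangement) every data bit in the lane gets its own delay too.
    """
    if bus_width % bits_per_lane:
        raise ValueError("bus width must be a whole number of byte lanes")
    lanes = bus_width // bits_per_lane
    per_lane = 1 + (bits_per_lane if per_bit_delays else 0)
    return lanes * per_lane


print(delay_element_count(64), delay_element_count(64, per_bit_delays=False))  # 72 8
```

For any bus built from eight-bit lanes, per-bit delays need nine times as many delay elements as strobe-only delays.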
FIG. 4 illustrates some of the timing relationships in a prior art DDR memory controller that uses delay elements within the Phy for individual read data bits.
FIG. 4a shows a simplified diagram in which a single data bit is programmably delayed by element 401 and the dqs strobe is delayed by element 402. Data from input dq is typically captured on both the rising and falling edges of dqs, as shown in FIGS. 1 and 2; for simplicity, however, the diagrams of FIGS. 3-12 show the schematic and timing only for the dq bits captured on the rising edge of dqs. By controlling both delays, the output of capture register 403 can be delayed by any amount within the range of the delay elements before it is passed into the core clock domain and clocked into register 404 by the Core_Clk signal 405. In FIG. 4b, the dqs_delayed signal 406 is placed near the center of the valid window for dq 407, and after being captured in register 403, the data enters the core domain at clock edge 408 as shown. In this scenario, the latency to move the data into the core domain is relatively low simply because of the natural relationship between the core clock and dqs. This relationship, however, depends heavily on system topology and delays, and could in fact have almost any phase relationship.
A different phase relationship is shown in FIG. 4c. Here, a first edge 409 of Core_Clk happens to occur just before the leading edge 410 of dqs_delayed. As a result, each data bit is not captured in the core clock domain until leading edge 411 of Core_Clk, as shown, and is thus delayed by an amount of time 412 before being transferred into the core domain. Thus, while the ability to delay both dq and dqs can achieve synchronization with the core clock, it may introduce a significant amount of latency in the process.
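The latency difference between FIGS. 4b and 4c can be quantified with a simple timing model. The sketch below assumes ideal, jitter-free clocks, times in integer picoseconds, and a single hypothetical setup requirement `t_setup_ps` for register 404; it ignores the clock-to-output delay of register 403. The function name and all numeric values are illustrative.

```python
def core_transfer_latency_ps(t_capture_ps: int, core_period_ps: int,
                             core_phase_ps: int, t_setup_ps: int = 0) -> int:
    """Time from capture in register 403 until a Core_Clk rising edge can take the data.

    Core_Clk rising edges are assumed at core_phase_ps + k * core_period_ps for integer k.
    """
    earliest_ps = t_capture_ps + t_setup_ps
    # Smallest k with core_phase_ps + k * core_period_ps >= earliest_ps (ceiling division)
    k = -((core_phase_ps - earliest_ps) // core_period_ps)
    next_edge_ps = core_phase_ps + k * core_period_ps
    return next_edge_ps - t_capture_ps


# FIG. 4b-like phase: a Core_Clk edge falls 500 ps after capture
print(core_transfer_latency_ps(1000, 2500, 1500, 100))  # 500
# FIG. 4c-like phase: a Core_Clk edge falls 100 ps before capture, so the next one is used
print(core_transfer_latency_ps(1000, 2500, 900, 100))   # 2400
```

Whatever the phase, the result lies between `t_setup_ps` and just under `t_setup_ps` plus one Core_Clk period, so the worst case approaches a full core clock period of added latency.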
A DDR memory controller circuit and method are therefore needed that reliably capture and process memory data during read cycles while requiring a small gate count, and therefore a small amount of silicon real estate. The controller should also offer high yield both for memory controller devices and for memory system implementations using those devices.