As computers and computer processors increase in performance, memory access performance becomes a significant factor in overall system performance. If the interface that communicates data between a memory device and a memory controller or other application device operates more slowly than a processor can consume data, the interface can limit the data processing capacity of the entire computer.
For dynamic random access memory (DRAM) devices, which are commonly used as the main working memory of a computer, various interconnect technologies have been developed over the years. One such technology is used for synchronous DRAMs, or SDRAMs, which use a source-synchronous interface: the source of data during a memory transfer provides a clock signal, often referred to as a data strobe signal (DQS), that the target uses to capture the data as it is transferred over a data line. Data on a data line is typically latched on the rising or falling edge of the DQS signal, so that the value present on the data line when the strobe transitions from low to high, or vice versa, is latched into a data latch in the target.
Double data rate (DDR) memory elements contain multiple buses. A command and address bus is formed by a number of signals, such as column-address strobe (CAS), row-address strobe (RAS), write enable (WE), clock enable (CKE), chip select (CS), address (ADDR), and bank address (BA) signals, and differential clock signals (CK and CKn). The data bus contains the data signals (DQ), the data mask (DM), and the source-synchronous strobes (DQS and DQSN). DDR3 memory elements operate with the differential strobes DQS and DQSN, which enable source-synchronous data capture at twice the clock frequency. Data is registered on the rising edge of DQS and on the rising edge of DQSN, that is, on both edges of the differential strobe.
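To illustrate double-data-rate capture, the following minimal Python sketch treats DQ and DQS as waveforms sampled on a common time grid (a hypothetical representation, not part of any DDR3 specification) and latches one data word at every DQS transition. Because DQSN is the complement of DQS, a falling edge of DQS is a rising edge of DQSN, so latching on both DQS transitions models registering on the rising edges of both strobes and yields two words per strobe cycle.

```python
from typing import List

def ddr_capture(dq: List[int], dqs: List[int]) -> List[int]:
    """Latch DQ on every DQS transition (hypothetical sampled-waveform model).

    A 0->1 step of dqs is a rising edge of DQS; a 1->0 step is a rising edge
    of DQSN (the complement), so both transitions capture a data word.
    """
    if len(dq) != len(dqs):
        raise ValueError("dq and dqs must have the same length")
    captured = []
    for i in range(1, len(dqs)):
        if dqs[i] != dqs[i - 1]:  # edge of DQS or of DQSN
            captured.append(dq[i])
    return captured
```

For example, ddr_capture([1, 1, 0, 0, 1, 1, 1, 1], [0, 1, 1, 0, 0, 1, 1, 0]) captures four words, [1, 0, 1, 1], from two strobe cycles.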
DDR3 data is transferred in bursts for both read and write operations, with each memory access sending or receiving a burst of 8 data words (or 4 words in burst-chop mode). For read operations, the DRAM device transmits data bursts edge-aligned with the strobe. For write operations, the DRAM element receives data bursts with a strobe delayed by 90 degrees, so that each strobe edge falls near the center of a data word. The strobe is a bidirectional signal used to capture data. After the data is captured in the source-synchronous strobe domain, it must be transferred into a local clock domain.
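The difference between read and write alignment reduces to simple arithmetic: each data word lasts half a clock period, and a 90 degree delay of the strobe equals a quarter of the clock period, which places each write strobe edge in the middle of its data word. The sketch below, assuming an ideal strobe with no jitter or skew, lists the strobe edge times for one burst; the function name and the picosecond units are illustrative choices.

```python
from typing import List

def strobe_edges_ps(burst_length: int, tck_ps: float, write: bool) -> List[float]:
    """DQS edge times in ps for one burst, relative to the start of the first word.

    Each data word lasts half a clock period. Read strobes are edge aligned
    (offset 0); write strobes are delayed by 90 degrees of the clock, i.e. a
    quarter period, so every edge lands in the middle of its data word.
    """
    if burst_length <= 0:
        raise ValueError("burst_length must be positive")
    word_ps = tck_ps / 2                        # one data word per strobe edge
    offset_ps = tck_ps / 4 if write else 0.0    # 90-degree write phase shift
    return [k * word_ps + offset_ps for k in range(burst_length)]
```

For example, with DDR3-1600 (tCK = 1250 ps), strobe_edges_ps(4, 1250, write=True) returns [312.5, 937.5, 1562.5, 2187.5], while the same burst as a read returns [0.0, 625.0, 1250.0, 1875.0].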
For dual in-line memory modules (DIMMs), the DDR3 memory specification includes what is commonly known as a “fly-by” topology for the clock, address, and control connections that are shared among all of the DRAM devices on the DIMM. The balanced tree arrangement used in DDR2 memory provides clock, address, and control signals of approximately the same length to each memory element in a module at the expense of signal integrity; the fly-by topology instead promotes signal integrity and consequently requires clock, address, and control connections of different lengths for each device within the module. Since DDR3 SDRAM devices require a specific timing relationship between the data, strobe, and clock, the DDR3 specification supports an independent timing calibration for each source-synchronous group. During this calibration, commonly known as “write leveling,” a free-running clock is propagated from the ASIC or host to the data input of a dedicated internal calibration register in a target DRAM, and a strobe pulse is propagated to the clock input of the same register. The output of the register indicates the phase alignment between the free-running clock and the source-synchronous strobe at the DRAM, and it is returned to the memory controller in the host on one or more of the data signal lines. The calibration is repeated with different delay values until an optimum write-leveling delay is determined. The host then uses this delay to time data transfers to the target.
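The write-leveling search can be sketched as a delay sweep. In this minimal Python model, which is an assumption for illustration rather than the DDR3 register interface, dram_sample_ck stands in for the DRAM's calibration register: it reports the level of CK at the moment the delayed strobe arrives, given a hypothetical fly-by clock skew relative to the strobe path. One common choice, assumed here, is to increment the strobe delay in fixed steps and stop at the first 0-to-1 change in the feedback, which marks alignment of the strobe with a rising clock edge at the DRAM.

```python
from typing import Callable, Optional

def dram_sample_ck(delay_ps: int, ck_skew_ps: int, tck_ps: int) -> int:
    """Hypothetical write-leveling register: level of CK (high for the first
    half of each period) when the strobe, launched delay_ps after the host
    clock edge, arrives. CK reaches this DRAM ck_skew_ps later than the
    strobe path would because of the fly-by routing (assumed model)."""
    phase = (delay_ps - ck_skew_ps) % tck_ps
    return 1 if phase < tck_ps // 2 else 0

def write_level(sample: Callable[[int], int], tck_ps: int, step_ps: int) -> Optional[int]:
    """Sweep the strobe delay across one clock period and return the first
    delay at which the feedback changes from 0 to 1 (strobe aligned with a
    rising CK edge at the DRAM), or None if no transition is seen."""
    prev = sample(0)
    for delay in range(step_ps, tck_ps + 1, step_ps):
        cur = sample(delay)
        if prev == 0 and cur == 1:
            return delay
        prev = cur
    return None
```

With a 1250 ps clock, a 400 ps clock skew, and 25 ps steps, write_level(lambda d: dram_sample_ck(d, 400, 1250), 1250, 25) returns 400.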
Write operations to a DDR3 memory present challenges for an interface designer attempting to achieve low latency while maintaining signal integrity. Because the relative timing between the ASIC or core clock and the DDR3 memory device is determined during the write-leveling calibration procedure, the designer should accommodate a write-leveling delay that can range from no delay at all to a full clock period. Output data from the memory controller is typically transferred synchronously in the core clock domain, but must be transferred to the delayed, or leveled, domain for synchronous output to the DRAM device. A first-in, first-out (FIFO) circuit can reliably handle this domain transfer, but such a circuit may not be optimal in terms of complexity, area, and latency.
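For reference, a conventional dual-clock FIFO of the kind mentioned above is usually built with Gray-coded read and write pointers, because a Gray-coded pointer changes by only one bit per increment and can therefore be passed safely through a synchronizer into the other clock domain. The Python class below is a single-threaded behavioral sketch of that pointer logic under stated simplifications: the synchronizer flip-flops and the two clocks are not modeled, and the class and method names are hypothetical. Its full and empty tests compare Gray-coded pointers as the hardware would, and the extra pointer bit distinguishes a full FIFO from an empty one.

```python
from typing import Any, List

def bin_to_gray(n: int) -> int:
    """Binary-reflected Gray code: successive values differ in one bit."""
    return n ^ (n >> 1)

class DualClockFifo:
    """Behavioral model of a dual-clock FIFO with Gray-coded pointers.

    Hypothetical sketch: the write side would run in the core clock domain
    and the read side in the leveled domain; synchronizers are not modeled.
    """

    def __init__(self, addr_bits: int = 2) -> None:
        self.addr_bits = addr_bits
        self.depth = 1 << addr_bits
        self.ptr_mask = (1 << (addr_bits + 1)) - 1  # extra bit tracks wrap-around
        self.mem: List[Any] = [None] * self.depth
        self.wptr = 0  # binary write pointer (core clock domain)
        self.rptr = 0  # binary read pointer (leveled clock domain)

    def empty(self) -> bool:
        # Empty when the Gray-coded pointers are identical.
        return bin_to_gray(self.wptr) == bin_to_gray(self.rptr)

    def full(self) -> bool:
        # Full when the Gray pointers differ only in their two most
        # significant bits (the writer is one lap ahead of the reader).
        top_two = 0b11 << (self.addr_bits - 1)
        return bin_to_gray(self.wptr) == bin_to_gray(self.rptr) ^ top_two

    def push(self, word: Any) -> bool:
        if self.full():
            return False
        self.mem[self.wptr & (self.depth - 1)] = word
        self.wptr = (self.wptr + 1) & self.ptr_mask
        return True

    def pop(self) -> Any:
        if self.empty():
            raise IndexError("pop from empty FIFO")
        word = self.mem[self.rptr & (self.depth - 1)]
        self.rptr = (self.rptr + 1) & self.ptr_mask
        return word
```

A four-entry instance (addr_bits = 2) accepts four words, refuses a fifth until one is popped, and returns the words in order. Each pointer increment and read adds cycles of latency, which is part of the cost the paragraph above refers to.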
During a read operation, a host or receiving device issues a read command and communicates a clock signal to the source DRAM. After an internal DRAM delay, the DRAM returns a data signal and a strobe clock signal to the host, which uses the strobe to capture the data in the source-synchronous strobe domain; the data must then be transferred into the local clock domain. The DRAM transmits a preamble on the strobe signals at the beginning of each read data burst. The preamble drives the positive-true and negative-true strobe signals to a defined differential state so that the differential strobe receiver outputs are valid before the first strobe edge. The host may use the preamble period as a window in which to gate, or “unpark,” the strobe receiver outputs.
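The read timing described above amounts to a round-trip sum. The sketch below assumes a simple additive model that is not taken from this document: clock flight time to the DRAM, a CAS latency expressed in clock cycles, and strobe flight time back to the host. It also splits a delay into whole core-clock cycles plus a sub-cycle remainder, the kind of coarse and fine setting a programmable gating delay might use. All function names and parameter values are hypothetical.

```python
from typing import Tuple

def read_strobe_arrival_ps(ck_flight_ps: int, cas_latency: int,
                           tck_ps: int, dqs_flight_ps: int) -> int:
    """First strobe edge at the host, relative to issuing the read command
    (assumed additive model: clock out, DRAM latency, strobe back)."""
    return ck_flight_ps + cas_latency * tck_ps + dqs_flight_ps

def split_delay(delay_ps: int, tck_ps: int) -> Tuple[int, int]:
    """Split a delay into whole core-clock cycles and a sub-cycle remainder."""
    return divmod(delay_ps, tck_ps)
```

With a 500 ps clock flight, a CAS latency of 11 cycles at tCK = 1250 ps, and a 450 ps strobe flight, the first strobe edge arrives 14700 ps after the read command, which splits into 11 cycles plus 950 ps; a one-cycle preamble would begin at 10 cycles plus 950 ps.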
Before a preamble arrives at the strobe pins, however, the differential inputs of the strobe receivers are driven to the termination voltage, VTT, which is ½ VDD for DDR3 signals. When both inputs of a differential receiver are at the same voltage, the receiver output depends on its input offset voltage, which is typically set by random device mismatch and is therefore undefined; the output may also toggle randomly because of noise on the bus. Because the outputs of the differential strobe receivers clock registers, counters, and other logic elements, those outputs must be gated until the data stream reaches the strobe pad inputs. The input strobe signals must therefore be reliably gated or switched to ensure proper operation of the logic elements on the strobe pad and the data pads.
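The parking problem can be reproduced with a simple comparator model. In the sketch below, a differential receiver outputs 1 when the input difference plus an offset and Gaussian noise is positive; the offset and noise magnitudes are assumed values, and VTT is taken as 0.75 V (½ of a 1.5 V VDD). With both inputs at VTT the ungated output toggles at random, while the gated (parked) output is held at 0 until the gate opens.

```python
import random

VDD = 1.5        # assumed DDR3 supply voltage, volts
VTT = VDD / 2    # termination voltage: both strobe inputs park here

def diff_receiver(v_p: float, v_n: float, offset_v: float, noise_v: float,
                  rng: random.Random) -> int:
    """Comparator model: 1 if (v_p - v_n) + offset + Gaussian noise > 0."""
    return 1 if (v_p - v_n) + offset_v + rng.gauss(0.0, noise_v) > 0 else 0

def gated_receiver(v_p: float, v_n: float, gate_open: bool, offset_v: float,
                   noise_v: float, rng: random.Random) -> int:
    """Receiver output forced to 0 ("parked") while the gate is closed."""
    out = diff_receiver(v_p, v_n, offset_v, noise_v, rng)
    return out if gate_open else 0

rng = random.Random(1)
# Before the preamble: both inputs at VTT, so only offset and noise decide.
ungated = [diff_receiver(VTT, VTT, 0.001, 0.01, rng) for _ in range(1000)]
parked = [gated_receiver(VTT, VTT, False, 0.001, 0.01, rng) for _ in range(1000)]
# During the preamble: DQS low and DQSN high give a solid 0 once unparked.
preamble = [gated_receiver(VTT - 0.2, VTT + 0.2, True, 0.001, 0.01, rng)
            for _ in range(1000)]
```

Here ungated contains both 0s and 1s, while parked and preamble contain only 0s, which is why the gate can safely open anywhere inside the preamble.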
If the strobe preamble is used as a window in which to unpark the differential strobe receiver outputs, a programmable delay operating in the core clock domain may be used to align the parking control signal with the strobe preamble. A read gate training operation can be performed by issuing a read command with a particular parking control delay and sampling the incoming strobe signal(s) with a register clocked by the parking control signal. If the strobe is sampled during the preamble, the sampling register stores a logic “0”; if it is sampled after the first strobe edge, the register contains a logic “1”. The procedure is repeated with various delays, and the optimum delay is computed to place the parking control in the center of the preamble. The strobe preamble is ideally a full clock cycle long, but the effective gating window may be substantially shorter because of signal propagation times through the differential receivers, logic gates, and strobe distribution paths. As long as the relative timing between the DRAM and the host remains within the preamble window, read operations can be performed reliably.
U.S. Pat. No. 7,170,818 illustrates and describes a circuit that samples and forwards incoming strobe signals in accordance with one of eight phase-separated clock signals to reliably gate the incoming strobes. The eight clock signals are separated by equal intervals of 45 degrees to cover a full clock cycle. However, once the training operation is complete and the programmable timing is set, significant drift in the timing relationship can cause a synchronization failure between the host and the DRAM. Timing variation or drift due to temperature and voltage changes that is tolerable at lower data rates may become catastrophic at higher data rates. The prior art described above does not address timing variation and drift between clock domains over time.
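The training procedure can be sketched as follows. The strobe model is an assumption for illustration: before the preamble the receiver inputs sit at VTT and the sample is random, during a one-cycle preamble the sample is 0, and after the first strobe edge it is 1, as described above. The hypothetical train_read_gate routine sweeps the parking control delay, finds the 0-to-1 transition after which every remaining sample is 1 (so random samples taken before the preamble are ignored), and backs off half a clock period to center the gate in the preamble, which assumes the full-cycle preamble holds.

```python
import random
from typing import Callable, List, Optional

def strobe_sampler(preamble_start_ps: int, tck_ps: int,
                   rng: random.Random) -> Callable[[int], int]:
    """Hypothetical model of sampling the strobe with the parking control
    delayed by d ps: random before the preamble (inputs at VTT), 0 during a
    one-cycle preamble, 1 once the first strobe edge has passed."""
    first_edge_ps = preamble_start_ps + tck_ps

    def sample(d: int) -> int:
        if d < preamble_start_ps:
            return rng.randint(0, 1)   # undefined receiver output
        return 0 if d < first_edge_ps else 1

    return sample

def train_read_gate(sample: Callable[[int], int], delays: List[int],
                    tck_ps: int) -> Optional[int]:
    """Return a parking control delay centered in a one-cycle preamble."""
    results = [sample(d) for d in delays]
    for i in range(1, len(results)):
        # First strobe edge: a 0 followed only by 1s from here on.
        if results[i - 1] == 0 and all(r == 1 for r in results[i:]):
            return delays[i] - tck_ps // 2
    return None
```

For a preamble starting at 13450 ps with tCK = 1250 ps, sweeping from 12000 ps to 16000 ps in 50 ps steps returns 14075 ps, half a cycle after the start of the preamble. The sweep step sets the resolution of the result.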
Under these varying conditions, that is, when uncertainty or variation in the round-trip timing delay approaches one-half of a clock period, the prior-art circuit described above would not be adequate for reliably communicating with a DDR memory element.
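A simplified margin estimate shows why half a clock period is the limiting case. Assume the gate is centered in an effective window of width W and is placed using N equally spaced phases (N = 8 gives the 45 degree spacing described above), with the nearest phase chosen. Then the gate stays inside the window only while the accumulated drift is smaller than W/2 minus the worst-case placement error of half a phase step. Even with an ideal full-cycle window (W = tCK), the tolerance therefore falls below tCK/2. This is an illustrative model with hypothetical function names, not an analysis taken from the cited patent.

```python
def drift_margin_ps(tck_ps: float, window_fraction: float, phase_steps: int) -> float:
    """Largest tolerable drift (ps) for a gate centered in the effective window.

    window_fraction: effective gating window as a fraction of tCK (1.0 is an
    ideal full-cycle preamble). phase_steps: number of equally spaced phases
    used to place the gate; worst-case placement error is half a step.
    """
    window_ps = window_fraction * tck_ps
    quantization_ps = tck_ps / phase_steps / 2
    return window_ps / 2 - quantization_ps

def gate_survives(drift_ps: float, tck_ps: float, window_fraction: float,
                  phase_steps: int) -> bool:
    """True if a drift of drift_ps (either sign) keeps the gate in the window."""
    return abs(drift_ps) < drift_margin_ps(tck_ps, window_fraction, phase_steps)
```

At DDR3-1600 (tCK = 1250 ps) with eight phases, the ideal margin is about ±547 ps; if the effective window shrinks to half a cycle, it drops to about ±234 ps.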