Computer systems generally include a memory subsystem that contains memory devices where instructions and data are held for use by a processor of the computer system. Because the processor is typically capable of operating at a higher rate than the memory subsystem, the operational speed of the memory subsystem has a significant impact on the performance of the computer system.
In the past, the memory devices making up the memory subsystem, such as Dynamic Random Access Memory (“DRAM”) devices, were typically asynchronous devices, i.e., the memory devices stored or output data in response to control signals from a processor rather than in step with a clock. However, asynchronous operation results in a delay between the time that a control signal, e.g., a read command and address value, is received by the memory device and the time that the device responds, e.g., when the data becomes available at the output of the memory device. This delay typically lasts for several operational cycles of the processor, during which the processor is usually unable to perform useful work, so those cycles are wasted.
To avoid wasting operational cycles while waiting for a response from memory, synchronous memory devices, such as synchronous DRAM (“SDRAM”) devices, have been developed. SDRAM devices exploit the fact that most memory accesses are sequential and are designed to fetch data words in a burst as quickly as possible. SDRAM devices typically operate by outputting a sequence, or “burst”, of several words or bytes of data in response to a single control signal from the processor. For example, a 5-1-1-1 burst cycle consists of a sequence of four data-word transfers in which only the address of the first word is supplied to the memory device via the address bus. The notation 5-1-1-1 gives the number of clock cycles required for each word of the burst: the first word is available at the output of the SDRAM device five clock cycles after the command signal is input, and another word is output by the memory device on each subsequent rising edge of the clock, so the four-word burst completes eight clock cycles after the command.
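The 5-1-1-1 notation can be read as a list of per-word cycle counts whose running sum gives the cycle at which each word appears. The Python sketch below is illustrative only; the helper names `parse_burst` and `burst_word_cycles` are hypothetical and not part of any SDRAM interface.

```python
from itertools import accumulate


def parse_burst(notation: str) -> list[int]:
    """Turn burst notation such as "5-1-1-1" into per-word cycle counts."""
    return [int(field) for field in notation.split("-")]


def burst_word_cycles(pattern: list[int]) -> list[int]:
    """Clock cycle, counted from the command, at which each burst word is output."""
    return list(accumulate(pattern))


pattern = parse_burst("5-1-1-1")
print(burst_word_cycles(pattern))  # [5, 6, 7, 8]
print(sum(pattern))                # 8 cycles for the whole four-word burst
```

For comparison, a hypothetical device that needed the full five-cycle access for every word (5-5-5-5) would take `sum(parse_burst("5-5-5-5"))`, or 20 cycles, for the same four words.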
Another approach that has been developed to improve memory performance is called double data rate (“DDR”) and is used in DDR DRAM memory devices. In a DDR DRAM device, data during a burst is output on both the rising and falling edges of the clock, which effectively doubles the data bandwidth of the memory subsystem at a given clock frequency.
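The doubling follows directly from peak bandwidth being the product of clock rate, transfers per clock, and bus width. A minimal sketch, assuming a hypothetical 133 MHz clock and 64-bit data bus (megabytes here are 10^6 bytes, and command overhead is ignored):

```python
def peak_bandwidth_mb_per_s(clock_mhz: int, bus_width_bits: int, double_data_rate: bool) -> int:
    """Peak transfer rate in MB/s (1 MB = 10**6 bytes), ignoring command overhead."""
    transfers_per_clock = 2 if double_data_rate else 1  # DDR: rising and falling edges
    return clock_mhz * transfers_per_clock * bus_width_bits // 8


print(peak_bandwidth_mb_per_s(133, 64, double_data_rate=False))  # 1064
print(peak_bandwidth_mb_per_s(133, 64, double_data_rate=True))   # 2128
```

The clock frequency is the same in both calls; only the number of transfers per clock doubles.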
FIG. 1 is a functional block diagram of a memory architecture 10 that illustrates an example of a DDR memory controller 20 according to the conventional art. Memory controller 20 contains a clock generation circuit 22 that generates a clock signal CLK0. The CLK0 signal drives even clock domain registers 24 and 25 and odd clock domain registers 26 and 27. The CLK0 signal is also coupled to a DDR DRAM device 90 and arrives at the clock input (“CLK”) of DDR DRAM device 90 after a propagation delay time interval tPD, as represented in FIG. 1 by block 92.
DDR DRAM device 90, in turn, drives a data output signal at output DQ after a clock-to-output delay interval tDQC. This signal experiences another propagation delay tPD, represented by block 94, and arrives at memory controller 20 as a delayed data signal DQ1. After a clock-to-output delay interval tDQCK, DDR DRAM device 90 also outputs a data output synchronization signal DQS, also known as a data strobe signal, which is likewise delayed by a propagation delay time tPD, as represented by block 96, and is received by controller 20 as a delayed version of the DQS signal called DQS1. It should be noted that the propagation delays represented by blocks 92, 94 and 96 are not necessarily equal.
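The arrival times of DQ1 and DQS1 at controller 20 are each the sum of the delays along their paths, measured from a CLK0 edge. A sketch assuming illustrative picosecond values (the numbers and the `ReadPathTiming` class are hypothetical), with each path given its own propagation delay because blocks 92, 94 and 96 need not be equal:

```python
from dataclasses import dataclass


@dataclass
class ReadPathTiming:
    """Delays along the FIG. 1 read path, in picoseconds (illustrative values)."""
    t_pd_clk: int  # block 92: CLK0 at controller -> CLK input of DRAM 90
    t_dqc: int     # DRAM CLK edge -> data on DQ
    t_pd_dq: int   # block 94: DQ pin -> DQ1 at controller
    t_dqck: int    # DRAM CLK edge -> strobe on DQS
    t_pd_dqs: int  # block 96: DQS pin -> DQS1 at controller

    def dq1_arrival(self, clk0_edge: int) -> int:
        return clk0_edge + self.t_pd_clk + self.t_dqc + self.t_pd_dq

    def dqs1_arrival(self, clk0_edge: int) -> int:
        return clk0_edge + self.t_pd_clk + self.t_dqck + self.t_pd_dqs


timing = ReadPathTiming(t_pd_clk=600, t_dqc=400, t_pd_dq=650, t_dqck=400, t_pd_dqs=600)
print(timing.dq1_arrival(0))   # 1650
print(timing.dqs1_arrival(0))  # 1600
```

Both arrival times, and the 50 ps skew between them, are offsets from the CLK0 edge that launched the read; they shift whenever any of the path delays changes.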
The DQ1 and DQS1 signals are received by a DQS domain circuit 70 of controller 20. The DQ1 signal is input to edge-triggered data samplers 74 and 76. The DQS1 signal passes through delay circuit 72, which delays it by an interval t1 to produce a delayed signal DQS2. A rising edge of the DQS2 signal triggers edge-triggered data sampler 74, and a falling edge of the DQS2 signal triggers edge-triggered data sampler 76, so that samplers 74 and 76 latch the even and odd data words, respectively, of the DQ1 signal.
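Because DQS is edge-aligned with DQ, the t1 delay is what moves each strobe edge into the middle of a data word. The sketch below models DQ1 as a sequence of words, each driven for one unit interval `ui`, and samples it at the delayed DQS2 edges, routing rising-edge samples to the even list (sampler 74) and falling-edge samples to the odd list (sampler 76). The idealized waveform, the function names, and the picosecond values are assumptions for illustration.

```python
def dq_at(t: int, words: list[str], t0: int, ui: int):
    """Word on DQ1 at time t; word i is driven during [t0 + i*ui, t0 + (i+1)*ui)."""
    index = (t - t0) // ui
    return words[index] if 0 <= index < len(words) else None


def sample_with_dqs(words: list[str], t0: int, ui: int, t1: int):
    """Latch DQ1 on each DQS2 edge; DQS1 edges coincide with DQ1 word boundaries."""
    even, odd = [], []  # sampler 74 (rising edges), sampler 76 (falling edges)
    for i in range(len(words)):
        edge = t0 + i * ui + t1  # DQS2 edge i; rising when i is even
        value = dq_at(edge, words, t0, ui)
        if i % 2 == 0:
            even.append(value)
        else:
            odd.append(value)
    return even, odd


burst = ["W0", "W1", "W2", "W3"]
print(sample_with_dqs(burst, t0=1650, ui=1250, t1=625))   # (['W0', 'W2'], ['W1', 'W3'])
print(sample_with_dqs(burst, t0=1650, ui=1250, t1=1250))  # (['W1', 'W3'], ['W2', None])
```

With `t1` equal to half a unit interval each edge lands mid-word; with `t1` equal to a full unit interval every sample is shifted by one word and the last one falls off the end of the burst.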
After a data valid time interval tV, edge-triggered data sampler 74 outputs data signal DQ2 to even clock domain register 24, which is clocked on a rising edge of the CLK0 signal generated by clock generation circuit 22. Likewise, after data valid interval tV, edge-triggered data sampler 76 outputs a delayed data signal DQ3 to odd clock domain register 26, which is clocked on a falling edge of the CLK0 signal.
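Moving DQ2 into even clock domain register 24 is a clock-domain crossing: DQ2 becomes valid at a time set by DQS2 plus tV, but register 24 captures it on the next CLK0 rising edge. A minimal sketch of the resulting margins, assuming (hypothetically) that DQ2 stays stable for one full CLK0 period until sampler 74 is retriggered, and ignoring the register's own setup and hold requirements; the function name and numbers are illustrative:

```python
def clk0_capture(data_valid_time: int, clk_period: int, first_edge: int = 0):
    """Return (edge, setup_margin, hold_margin) for the first CLK0 rising edge
    at or after data_valid_time. Times are in picoseconds (illustrative)."""
    elapsed = data_valid_time - first_edge
    n = -(-elapsed // clk_period)                      # ceiling division
    edge = first_edge + n * clk_period                 # CLK0 rising edges: first_edge + n * period
    setup_margin = edge - data_valid_time              # time DQ2 has been valid before the edge
    hold_margin = data_valid_time + clk_period - edge  # time DQ2 stays valid after the edge
    return edge, setup_margin, hold_margin


# Hypothetical: DQS2 rising edge at 2275 ps, tV = 200 ps, 400 MHz CLK0 (2500 ps period).
print(clk0_capture(2275 + 200, 2500))  # (2500, 25, 2475)
print(clk0_capture(2525, 2500))        # (5000, 2475, 25)
```

In the first call the data becomes valid only 25 ps before the capturing edge; arriving 50 ps later pushes capture to the following CLK0 edge and leaves only 25 ps of hold margin, which is why the phase of DQ2 relative to CLK0 matters.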
DDR memory controller 20 illustrated in FIG. 1 works well in low-speed memory systems operating at 133 MHz or slower. In high-speed source-synchronous memory systems, i.e., systems that use a strobe or clock signal generated by the address/data signal source to latch or clock the address/data signal at the receiving agent, data read from a specific memory device, and any accompanying DQS signals, arrive at the memory controller with some timing offset relative to a system clock. Because the bit time shrinks as the clock rate rises, these small timing offsets consume a growing share of each bit time and become performance-limiting factors in high-speed memory systems.
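To see why a fixed offset that is tolerable at 133 MHz becomes limiting at higher rates, compare it with the DDR bit time, which is half a clock period. The 500 ps offset and 400 MHz clock below are hypothetical values chosen for illustration.

```python
def offset_fraction_of_bit(offset_ps: float, clock_mhz: float) -> float:
    """Share of one DDR bit time (half a clock period) consumed by a timing offset."""
    clock_period_ps = 1_000_000 / clock_mhz
    bit_time_ps = clock_period_ps / 2  # DDR transfers on both clock edges
    return offset_ps / bit_time_ps


for mhz in (133, 400):
    print(mhz, round(offset_fraction_of_bit(500, mhz), 3))
# 133 0.133
# 400 0.4
```

The same 500 ps offset grows from about 13% to 40% of the bit time, leaving far less of the data eye for setup, hold, and jitter.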
Memory controllers therefore commonly employ delay-locked loops (“DLLs”) to correct phase offsets in DRAM devices or point-to-point DDR memory systems, since such systems have well-managed (fixed or known) timing offsets. However, as is known in the art, DDR memory devices are also used in multi-rank memory systems, in which multiple memory devices share the same memory channel. During a read operation of the conventional device shown in FIG. 1, a DDR memory device transmits N data (DQ) signals (typically 8 or 32) back to controller 20, accompanied by a strobe (DQS) signal that is edge-aligned with the data signals. Although these DQ/DQS signals emerge from a DDR DRAM device closely aligned with an incoming global clock (CLK), by the time they reach the controller they have some arbitrary phase relationship relative to the CLK. This arbitrary phase relationship creates problems in a multi-rank DDR system: during a read operation, the DQ/DQS signals arrive at the controller at different phases relative to the CLK, depending on which device transmitted the data. Therefore, it is very difficult to use a single clock signal, e.g., a clock derived from the CLK by a DLL, to sample all data regardless of the data's origin.
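The multi-rank problem can be framed as an interval intersection: each rank's data is valid for some eye width starting at its own arrival offset from the CLK, and a single fixed-phase sampling clock works only if one phase falls inside every rank's eye. A sketch under simplifying assumptions (hypothetical offsets in picoseconds, a common eye width, and no wrap-around into the neighboring bit); the function name is illustrative:

```python
def common_sampling_window(rank_offsets_ps: list[int], eye_width_ps: int):
    """Return (start, end) of the sampling phases valid for every rank, or None.

    Each rank's eye is [offset, offset + eye_width_ps), measured from the same CLK edge.
    """
    start = max(rank_offsets_ps)
    end = min(offset + eye_width_ps for offset in rank_offsets_ps)
    return (start, end) if start < end else None


print(common_sampling_window([100, 300], eye_width_ps=800))   # (300, 900)
print(common_sampling_window([100, 1000], eye_width_ps=800))  # None
```

When the ranks' arrival offsets differ by at least the eye width, no single phase, and therefore no single DLL-derived clock at a fixed phase, can sample data from all of them.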
Therefore, it is desirable to provide an improved circuit topology and method for data acquisition in multi-rank memory systems.