Computer systems generally include a memory subsystem that contains memory devices where instructions and data are held for use by a processor of the computer system. Because the processor is typically capable of operating at a higher rate than the memory subsystem, the operational speed of the memory subsystem has a significant impact on the performance of the computer system.
In the past, the memory devices making up the memory subsystem, such as Dynamic Random Access Memory (DRAM), were typically asynchronous devices, i.e. the memory devices stored or output data in response to control signals from the processor. However, asynchronous operation results in a delay between the time that a control signal, e.g. a read command and address value, is received by the memory device and the time that the device responds, e.g. the data becomes available at the output of the memory device. This delay between the reception of a control signal and the device response typically lasts for several operational cycles of the processor. During the delay, the processor is typically unable to perform useful functions and the operational cycles are consequently wasted.
To avoid wasting operational cycles while waiting for a response from memory, synchronous memory devices, such as synchronous DRAM (SDRAM), have been developed. SDRAM exploits the fact that most memory accesses are sequential and is designed to fetch data words in a burst as fast as possible. SDRAM typically operates by outputting a sequence, or “burst”, of several words or bytes of data in response to a single control signal from the processor. For example, a burst cycle, such as 5-1-1-1, consists of a sequence of four data word transfers where only the address of the first word is supplied via the address bus input to the memory device. The 5-1-1-1 refers to the number of clock cycles required for each word of the burst. In this example, the first word is available at the output of the memory device five clock cycles after the input cycle of the command signal, and another word is output by the memory device at each subsequent clock cycle to complete the burst.
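The burst notation can be made concrete with a short calculation. The following is a minimal sketch, not part of any device specification; the function name `burst_arrival_cycles` is hypothetical, and it assumes cycles are counted from the input cycle of the command signal.

```python
def burst_arrival_cycles(pattern):
    """Return the clock cycle, counted from the command, at which each
    word of a burst becomes available.

    `pattern` is a burst description such as "5-1-1-1", where each number
    is the count of clock cycles required for the corresponding word.
    """
    arrivals = []
    cycle = 0
    for cycles_for_word in (int(n) for n in pattern.split("-")):
        cycle += cycles_for_word  # accumulate the per-word cycle counts
        arrivals.append(cycle)
    return arrivals


arrivals = burst_arrival_cycles("5-1-1-1")
print(arrivals)      # [5, 6, 7, 8]
print(arrivals[-1])  # 8 cycles for the whole four-word burst
```

Under this reading, the four words of a 5-1-1-1 burst complete in eight clock cycles, whereas four separate accesses that each took five cycles would take twenty.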
An SDRAM device typically employs a memory controller through which the processor accesses the DRAM memory cells. When the memory controller receives a data request from the processor, it accesses the rows and columns of the DRAM memory array to retrieve the data and must wait for the data to become available from the DRAM memory array before sending it to the processor. With SDRAM, a burst counter in the controller typically allows the column part of the memory address to be incremented very rapidly, which considerably speeds up retrieval of information in sequential reads. The controller synchronizes the timing of the memory system to the processor's system clock in order to supply the data words to the processor as fast as the processor can take them. Note that for synchronous memory schemes to function properly, the data words from the DRAM cells must be available and valid at, typically, the rising edge of each clock cycle.
Another approach that has been developed to improve memory performance is called double data-rate (DDR), such as is available in DDR DRAM devices. In a DDR DRAM, data during a burst is output on both the rising and falling edges of each clock cycle, which effectively doubles the data transfer rate of the memory subsystem without increasing the clock frequency.
However, in DDR, a data word must be available and valid from the data cells of the memory at both the rising and falling edges of the clock signal driving the memory system. As a result, the performance of the memory subsystem becomes very sensitive to the round-trip delay between the controller and the memory.
FIG. 1 is a functional block diagram of a memory architecture 10 that illustrates an example of a DDR memory controller 20 according to the conventional art. Memory controller 20 contains a clock generation circuit 22 that generates a clock zero signal CLK0. The CLK0 signal drives an even clock zero domain register 24 and an odd clock zero domain register 26. The CLK0 signal is also output to a DDR DRAM device 90 and arrives at the clock input (CLK) of the DDR DRAM device after a propagation delay time interval tPD, as represented in FIG. 1 by block 92.
DRAM device 90, in turn, generates a data output signal at output DQ after a clock to output delay interval tDQCK. The data output signal experiences another propagation delay tPD, represented by block 94, which results in a delayed data signal DQ1 arriving at the memory controller 20. After a clock to output delay interval tDQSCK, DRAM device 90 also outputs a data output synchronize signal DQS that is also delayed by propagation delay interval tPD, as represented by block 96, which results in a delayed version of the DQS signal, called DQS1, that is input to the controller 20.
The DQ1 and DQS1 signals are received by a DQS domain circuit 70 of the controller 20. The DQ1 signal is input to sample and hold registers 74 and 76. The DQS1 signal enters t1 delay circuit 72, which results in delayed signal DQS2. The rising edge of the DQS2 signal drives sample and hold register 74 and a falling edge of the DQS2 signal drives sample and hold register 76, which latch even and odd data words, respectively, of the DQ1 signal.
After a data valid time interval tV, sample and hold register 74 generates data signal DQ2, which is input to even clock zero domain register 24, which is clocked on the rising edge of the CLK0 signal generated by clock generation circuit 22. Also, after data valid interval tV, sample and hold register 76 outputs a delayed data signal DQ3 to odd clock zero domain register 26, which is clocked on the falling edge of the CLK0 signal.
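The division of a burst between the two sample and hold registers can be illustrated with a purely behavioral sketch. It models no timing at all, and it assumes that the first word of the burst coincides with a rising edge of the delayed strobe; the function name `capture_ddr_burst` is hypothetical.

```python
def capture_ddr_burst(dq_words):
    """Split the words of a DDR burst into the even and odd streams.

    Words latched on rising strobe edges go to the even stream (the role
    of sample and hold register 74); words latched on falling edges go to
    the odd stream (the role of sample and hold register 76).
    Assumption: word 0 of the burst arrives on a rising edge.
    """
    even_stream = []
    odd_stream = []
    for index, word in enumerate(dq_words):
        if index % 2 == 0:
            even_stream.append(word)  # rising edge of the delayed strobe
        else:
            odd_stream.append(word)   # falling edge of the delayed strobe
    return even_stream, odd_stream


even, odd = capture_ddr_burst(["W0", "W1", "W2", "W3"])
print(even)  # ['W0', 'W2']
print(odd)   # ['W1', 'W3']
```

Each stream then changes only once per clock cycle, which is why each can be transferred into a register clocked by a single edge of the CLK0 signal.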
In the conventional device shown in FIG. 1, the data from the DQ output of the DDR DRAM device 90 must typically be available and valid at the input of the even clock zero domain register 24 within a single clock cycle interval tCC in order for the memory controller to make the data available at the appropriate time for the processor to read the data word.
FIG. 2 is a timing diagram illustrating an example of the function of the controller of FIG. 1 and illustrating the effect of the delay in the circuit of FIG. 1 on the setup time tS for the even and odd clock zero domain registers 24 and 26. Measured from a rising edge of the CLK0 signal generated by clock generation circuit 22, the CLK0 signal is received at clock input CLK of DRAM device 90 after a first propagation delay interval tPD, represented in FIG. 1 as delay 92. A delay tDQSCK elapses from the time that the delayed CLK0 signal is received at the CLK input of DRAM device 90 to the time that the DQS signal is output. The DQS signal is then delayed by another propagation delay interval, represented in FIG. 1 as delay 96, that results in the DQS1 signal that is received by DQS domain circuit 70. The DQS1 signal, in turn, is delayed by time interval t1 by delay element 72, which results in the DQS2 signal. The delay element 72 introduces delay t1 so that the DQ1 signal meets the set-up time requirements for registers 74 and 76.
The set-up time for registers 74 and 76 can be derived from the formula
$$t_{S,\mathrm{min}} \le t_{1,\mathrm{min}} + (t_{DQSCK,\mathrm{min}} - t_{DQCK,\mathrm{max}})$$
which, inserting typical values, produces 0.2 ns <= t1,min − 0.5 ns, which, in turn, yields 0.7 ns <= t1,min. The hold time for registers 74 and 76 can be derived from the formula
$$t_{H,\mathrm{min}} \le t_{CH,\mathrm{min}} + (t_{DQCK,\mathrm{min}} - t_{DQSCK,\mathrm{max}}) - t_{1,\mathrm{max}}$$
where tCH,min is the minimum clock high time, which is typically one third of the clock cycle tCC. Inserting typical values, this formula produces 0.2 ns <= 2.5 ns − 0.5 ns − t1,max, which, in turn, yields t1,max <= 1.8 ns.
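The two inequalities above bound the delay t1 from below and from above, and the arithmetic can be checked with a few lines of code. This is a sketch only: the function and parameter names are hypothetical, and the two skew arguments stand for the bracketed differences in the set-up and hold formulas, each taken as −0.5 ns as in the typical values quoted in the text.

```python
def t1_window(t_s_min, t_h_min, t_ch_min, setup_skew, hold_skew):
    """Return (t1_min, t1_max) in nanoseconds.

    setup_skew = tDQSCK,min - tDQCK,max   (bracketed term, set-up formula)
    hold_skew  = tDQCK,min  - tDQSCK,max  (bracketed term, hold formula)
    """
    # Set-up: tS,min <= t1,min + setup_skew  =>  t1,min >= tS,min - setup_skew
    t1_min = t_s_min - setup_skew
    # Hold: tH,min <= tCH,min + hold_skew - t1,max
    #   =>  t1,max <= tCH,min + hold_skew - tH,min
    t1_max = t_ch_min + hold_skew - t_h_min
    return t1_min, t1_max


# Typical values quoted in the text, all in nanoseconds.
low, high = t1_window(t_s_min=0.2, t_h_min=0.2, t_ch_min=2.5,
                      setup_skew=-0.5, hold_skew=-0.5)
print(f"{low:.1f} ns <= t1 <= {high:.1f} ns")  # 0.7 ns <= t1 <= 1.8 ns
```

The lower bound comes from the set-up requirement and the upper bound from the hold requirement, so the delay element has a window of about 1.1 ns with these values.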
The interval from the rising edge of the DQS2 signal to the time that the data signal DQ2 is valid at the output of sample and hold register 74 of FIG. 1 is represented by the delay tV. Subtracting the sequence of time delays from the total time available in a single clock cycle period tCC for setup of the even and odd data output of controller circuit 20 gives the maximum round trip propagation delay time that can be tolerated for the even and odd clock zero domain registers 24 and 26, as shown in the following equation (1).
$$t_{S,\mathrm{min}} \le t_{CC,\mathrm{min}} - t_{PD,\mathrm{max}} - t_{DQSCK,\mathrm{max}} - t_{PD,\mathrm{max}} - t_{1,\mathrm{max}} - t_{V,\mathrm{max}} \qquad (1)$$
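Plugging in typical numbers for a clock cycle period of 7.5 nanoseconds (ns) yields:
$$0.2\,\mathrm{ns} \le 7.5\,\mathrm{ns} - t_{PD,\mathrm{max}} - 0.75\,\mathrm{ns} - t_{PD,\mathrm{max}} - 1.8\,\mathrm{ns} - 0.25\,\mathrm{ns}$$
and
$$t_{PD,\mathrm{max}} \le 1.75\,\mathrm{ns}.$$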
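Equation (1) can also be rearranged to isolate the propagation delay. The form below is a sketch that assumes the two propagation delay terms of equation (1), corresponding to blocks 92 and 96 of FIG. 1, have the same maximum value and can therefore be combined:

$$t_{PD,\mathrm{max}} \le \frac{t_{CC,\mathrm{min}} - t_{S,\mathrm{min}} - t_{DQSCK,\mathrm{max}} - t_{1,\mathrm{max}} - t_{V,\mathrm{max}}}{2}$$

Written this way, every fixed delay in the loop, as well as the set-up time itself, is subtracted from the clock period before the remainder is split between the outbound and return paths.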
The controller circuit 20 of FIG. 1 also has limitations on the minimal propagation delay due to the minimum hold time required by the even and odd clock zero domain registers 24 and 26. Equation (2) below illustrates the time requirements introduced by the hold time required in order to latch even and odd words of the DQ signal in registers 74 and 76.
$$t_{H,\mathrm{min}} \le t_{PD,\mathrm{min}} + t_{DQSCK,\mathrm{min}} + t_{PD,\mathrm{min}} + t_{1,\mathrm{min}} + t_{V,\mathrm{min}} \qquad (2)$$
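Plugging in typical values for these time intervals yields:
$$0.2\,\mathrm{ns} \le t_{PD,\mathrm{min}} + 0.75\,\mathrm{ns} + t_{PD,\mathrm{min}} + 0.7\,\mathrm{ns} + 0\,\mathrm{ns}$$
which reduces to:
$$0.25\,\mathrm{ns} \le t_{PD,\mathrm{min}}$$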
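Equation (2) can be rearranged in the same manner to isolate the minimum propagation delay. As a sketch, again assuming that the two propagation delay terms of equation (2) are equal and can be combined:

$$t_{PD,\mathrm{min}} \ge \frac{t_{H,\mathrm{min}} - t_{DQSCK,\mathrm{min}} - t_{1,\mathrm{min}} - t_{V,\mathrm{min}}}{2}$$

In this form, the hold requirement sets a floor on the propagation delay, and any increase in the minimum strobe, delay element, or data valid delays lowers that floor.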
Thus, the propagation delay must be in the range of 0.25 ns <= tPD <= 1.75 ns in order for the memory system to operate correctly. As memory cores, such as that in DRAM device 90, become larger and therefore require longer access times, and as clock frequencies become faster, resulting in shorter clock cycles and therefore less time available for set-up, this constraint can become a significant problem for memory system design.
Therefore, the need remains for improved ways for handling propagation delay in high performance memory systems.