The present invention relates to digital storage devices, and more specifically, to dynamic random access memory.
Improvements in fabrication technology have resulted in dynamic random access memories (DRAMs) with increased density, faster performance, and higher operating frequencies. Because overall memory bandwidth requirements are rising and the number of DRAMs in a system is falling, the ability to quickly transport data to and from each DRAM has become increasingly important.
ASYNCHRONOUS DRAMS
In conventional memory systems, the communication between a memory controller and DRAMs is performed through asynchronous communications. For example, the memory controller uses control signals to indicate to the DRAM when requests for data transactions are sent. The data transfers themselves are also performed asynchronously. To meet increased speed requirements, various enhanced asynchronous memory systems have been developed. One such system is the Extended Data Out (EDO) DRAM memory system.
FIG. 1 is a block diagram illustrating a typical EDO DRAM system 100. In the EDO DRAM system 100, data transfers are performed asynchronously in response to control signals and addresses sent from pin buffers 116 of a memory controller to pin buffers 118 of the EDO DRAM over a plurality of lines 120, 122, 124, 134 and 136. Specifically, lines 122 carry an address that is stored in latches 112 and 114. Line 120 carries a row address strobe ({overscore (RAS)}) that controls when the address stored in latch 112 is sent to row decoder 106. Line 134 carries an output enable signal that controls data output of the DRAM. Line 136 carries a write enable signal that controls timing chains 108 and the direction of data flow on the bi-directional data bus 126.
Upon receiving an address, row decoder 106 loads data that corresponds to the address from a memory array 110 in memory core 102 into a sense amplifier array 130. Line 124 carries a column address strobe ({overscore (CAS)}) that controls when the address stored in latch 114 is sent to column decoder 104. For a read operation, the column decoder 104 causes the data that is stored in the columns of the sense amplifier array 130 that correspond to the address received by column decoder 104 to be transferred through column I/O circuits 132. The data passes through the column I/O circuits 132 to the memory controller over a data bus 126.
Alternatively, an EDO DRAM may use address transition detect circuitry, rather than the {overscore (CAS)} signal, to initiate the retrieval of data from the memory core. Address transition detect circuitry monitors the address bus to detect transitions in the data that is being sent on the address bus. When a transition is detected, the EDO DRAM restarts the timing chains, causing data corresponding to a new address to fall out of the column I/O circuits 132.
The communication between the EDO DRAM and the memory controller is asynchronous. Thus, the EDO DRAM is not driven by an external clock. Rather, timing chains 108 that are activated by the {overscore (RAS)} and {overscore (CAS)} control signals are used to control the timing of the data transfer. Because the core 102 is not driven unless activated by the {overscore (RAS)} and {overscore (CAS)} control signals, the core 102 does not consume energy unless a data transfer operation is taking place. Therefore, the EDO DRAM consumes less power than alternative architectures in which the interface is clocked even when no memory operation is being performed.
FIG. 2 is a timing diagram for a read operation in EDO system 100. At time T0 the memory controller places on lines 122 an address that indicates the bank and row from which data is to be read. At time T1 the {overscore (RAS)} signal goes LOW, causing the address to be sent from latch 112 to row decoder 106. In response, row decoder 106 causes the appropriate row of data to be transferred from memory array 110 to sense amplifier array 130.
At time T2 the memory controller places on lines 122 the address of the column from which data is to be read. At time T3 the {overscore (CAS)} signal goes LOW causing the address to be sent from latch 114 to column decoder 104. In response, column decoder 104 sends through column I/O circuits 132 data from the selected column of the row stored in sense amplifier array 130. Assuming that {overscore (WE)} is HIGH and {overscore (OE)} is LOW, the data will appear on data bus 126. The data on the data bus 126 takes some time to stabilize. To ensure an accurate reading, the memory controller does not read the data from the data bus until time T4.
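The read sequence of FIG. 2 can be sketched as a toy behavioral model. The following Python is only an illustration under simplifying assumptions: the class name `ToyEdoDram`, its method names, and the array contents are hypothetical, the two latches are written directly rather than sharing one set of address lines, and no analog timing (tRAC, tCAC, bus settling) is modeled.

```python
class ToyEdoDram:
    """Toy model of the RAS/CAS read sequence; not timing-accurate."""

    def __init__(self, rows, cols):
        # Memory array: each cell holds a value derived from its coordinates.
        self.array = [[r * cols + c for c in range(cols)] for r in range(rows)]
        self.row_latch = None    # stands in for latch 112
        self.col_latch = None    # stands in for latch 114
        self.sense_amps = None   # stands in for sense amplifier array 130

    def put_address(self, value, is_row):
        # T0 / T2: the controller places an address on the address lines.
        if is_row:
            self.row_latch = value
        else:
            self.col_latch = value

    def ras_low(self):
        # T1: RAS falls; the row decoder loads the addressed row
        # into the sense amplifiers.
        self.sense_amps = list(self.array[self.row_latch])

    def cas_low(self, we_high=True, oe_low=True):
        # T3: CAS falls; the column decoder selects one column of the open row.
        if self.sense_amps is None:
            raise RuntimeError("no row sensed; RAS must fall before CAS")
        if not (we_high and oe_low):
            return None  # outputs are not driven onto the data bus
        return self.sense_amps[self.col_latch]


dram = ToyEdoDram(rows=4, cols=8)
dram.put_address(2, is_row=True)    # T0: row address
dram.ras_low()                      # T1: RAS goes LOW
dram.put_address(5, is_row=False)   # T2: column address
data = dram.cas_low()               # T3: CAS goes LOW; controller samples at T4
print(data)                         # value stored at row 2, column 5
```

The model makes the ordering constraint explicit: a column access is meaningful only after a row has been transferred to the sense amplifiers.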
The delay between the time at which the {overscore (RAS)} signal goes LOW to initiate a read operation and the time at which the data may be read from the data bus 126 is identified as tRAC. The delay between the time at which the {overscore (CAS)} signal goes LOW for a read operation and the time at which the data may be read from the data bus 126 is identified as tCAC. The delay between the time at which the column address is placed on the address bus and the time at which the data may be read from the data bus 126 is identified as tCAA. In a typical EDO DRAM, exemplary times are tCAC=15 ns and tCAA=30 ns.
In one variation, the memory controller is allowed to have column address flow through. The memory controller therefore has until T3 (the fall of {overscore (CAS)}), rather than until T2 (the transmission of the column address), to decide whether to perform a given transaction. In the exemplary times above, the memory controller would have 15 ns more time to decide whether to perform a given transaction.
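The 15 ns figure follows from the two exemplary access times, on the assumption that the data becomes readable at the same instant T4 in both cases. A minimal Python sketch of that arithmetic, with a hypothetical function name:

```python
T_CAC_NS = 15  # exemplary delay from CAS going LOW to readable data
T_CAA_NS = 30  # exemplary delay from column address to readable data


def extra_decision_time_ns(t_caa_ns, t_cac_ns):
    # Both delays end at the same point (T4), so the gap between the
    # column address (T2) and the fall of CAS (T3) is their difference.
    return t_caa_ns - t_cac_ns


slack_ns = extra_decision_time_ns(T_CAA_NS, T_CAC_NS)
print(slack_ns)  # 15
```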
DRAMs built with an asynchronous RAS/CAS interface have difficulty meeting the high memory bandwidth demands of many current computer systems. As a result, synchronous interface standards have been proposed. These alternative interface standards include Synchronous DRAMs (SDRAMs). In contrast to the asynchronous interface of EDO DRAMs, SDRAM systems use a clock to synchronize the communication between the memory controller and the SDRAMs. Timing communication with a clock allows data to be placed on the DRAM output with more precise timing. In addition, the clock signal can be used for internal pipelining. These characteristics of synchronous communication result in higher possible transfer rates.
FIG. 3 is a block diagram illustrating a conventional SDRAM system 300. In system 300, the memory controller includes a plurality of clocked buffers 304 and the SDRAM includes a plurality of clocked buffers 306. Data from control line 310 and an address bus 312 are received by a finite state machine 308 in the SDRAM. The output of the finite state machine 308 and the address data are sent to memory array 302 to initiate a data transfer operation.
FIG. 4 is a timing diagram that illustrates the signals generated in system 300 during a read operation. At time T0 the memory controller places a read request on line 310 and an address on bus 312. At time T1 the SDRAM reads the information on line 310 and bus 312. Between T1 and T2 the SDRAM retrieves the data located at the specified address from memory array 302. At time T2 the SDRAM places data from the specified address on data bus 314. At time T3 the memory controller reads the data off the data bus 314.
Because system 300 is synchronous, various issues arise that do not arise in asynchronous systems. Specifically, the synchronous system has numerous pipeline stages. Unbalanced pipeline stages waste computational time. For example, if a shorter pipeline stage is fed by a longer pipeline stage, there will be some period of time in which the shorter pipeline stage remains idle after finishing its operation and before receiving the next set of data from the preceding pipeline stage. Similarly, if a short pipeline stage feeds a longer pipeline stage, the shorter pipeline stage must wait until the longer pipeline stage has completed before feeding the longer pipeline stage with new input.
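The cost of unbalanced stages can be illustrated with a short calculation. The sketch below assumes a pipeline in which every stage advances on a common clock whose period equals the slowest stage's delay; the stage delays and function name are hypothetical and are not taken from any particular SDRAM.

```python
def idle_time_per_cycle(stage_delays_ns):
    # With a common clock, the cycle time is set by the slowest stage.
    cycle_ns = max(stage_delays_ns)
    # Each faster stage sits idle for the remainder of every cycle.
    return [cycle_ns - d for d in stage_delays_ns]


stages_ns = [4, 10, 6]                       # hypothetical stage delays
idle_ns = idle_time_per_cycle(stages_ns)     # idle time of each stage
wasted_ns = sum(idle_ns)                     # total idle time per cycle
utilization = sum(stages_ns) / (len(stages_ns) * max(stages_ns))

print(idle_ns, wasted_ns, round(utilization, 2))
```

With these illustrative numbers the three stages do useful work for only two thirds of the available time; a perfectly balanced pipeline would have zero idle time.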
Each stage in the pipeline must allow for the setup, clock transition, and clock-to-output time of the flip-flop that is dividing the stages. Typically the execution time of each step is not substantially larger than the sum of these overheads, so the latency is significantly increased by them. Further, the memory controller may be running from a clock of a different frequency and/or phase from the DRAM subsystem clock. Crossing the boundaries between these clocks requires a time proportional to the clock frequencies. In addition, the architecture must take into account jitter that occurs when various data queues are clocked.
In general, the synchronous nature of the SDRAM architecture gives SDRAMs higher transfer rates than EDO DRAMs. However, the higher rates are achieved at the expense of increased latency and power consumption. Specifically, the time required to clock control and address data through various pipeline stages increases the delay between when an address for a read operation is transmitted and when the data from the specified address is actually supplied by the SDRAM.
The increased overhead (OV) that results from the use of synchronous transfer rather than an asynchronous transfer can be expressed by the formula OV = SD + (TDC − D1) + (TDC − D3) + (TDC − (D2 MOD TDC)), where SD is the synchronization delay, TDC is the time period of the DRAM clock, D1 is the delay due to controller-to-DRAM time of flight, D2 is the time to perform a CAS operation, D3 is the delay due to DRAM-to-controller time of flight, and (D2 MOD TDC) is the remainder of (D2/TDC). SD is typically equal to (TDC + TCC), where TCC is the duration of the controller clock cycle. In a system in which the external clock is at 66 MHz and the DRAM subsystem clock is at 83 MHz, typical values may be: TDC is 12 ns, TCC is 15 ns, D1 is 6 ns, D2 is 35 ns, and D3 is 6 ns. Thus, a typical OV would be (15 + 12) + (12 − 6) + (12 − 6) + (12 − 11) = 40 ns.
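The overhead formula can be checked numerically. The following Python is a minimal sketch of the expression as stated, using the typical values given above; the function and parameter names are hypothetical, and the sketch assumes all inputs are whole nanoseconds with D1 and D3 no larger than TDC.

```python
def sync_overhead_ns(tdc, tcc, d1, d2, d3):
    # SD: synchronization delay, typically one DRAM clock period
    # plus one controller clock period.
    sd = tdc + tcc
    # Each remaining term is the time lost waiting for the next DRAM
    # clock edge after a flight or CAS delay completes.
    return sd + (tdc - d1) + (tdc - d3) + (tdc - (d2 % tdc))


ov_ns = sync_overhead_ns(tdc=12, tcc=15, d1=6, d2=35, d3=6)
print(ov_ns)  # 40
```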
Further, systems that use SDRAMs typically consume more power than the systems that use EDO DRAMs because, when the clock is enabled, the SDRAM interface is clocked whether or not a data transfer operation is actually being performed. For example, under typical conditions SDRAMs in an idle state consume approximately two to ten times more energy than EDO DRAMs in an idle state. When the clock is disabled, the clock must be enabled before a data transfer operation can be performed. More specifically, the clock must be enabled before any address or control information can be sampled by the SDRAM. The time used to enable the clock signal further increases the delay between the time that data is desired and the time that the requested data is available.
One object of the invention is to provide a memory system with an improved balance between request-to-data latency, power consumption and bandwidth.
According to one aspect of the invention, a memory interface is provided that maintains the high bandwidth of synchronous systems, while reducing the latency and power requirements of these systems. This is accomplished by using an asynchronous interface for the address and control information, and using a synchronous interface for fast data transport.
According to one aspect of the invention, a controller transmits control signals requesting a data transfer to a memory device. The memory device asynchronously receives the control signals and synchronously performs the requested data transfer.
The memory device has a first mode in which data transfer circuits within the memory device are not driven by an internal clock signal. The memory device has a second mode in which data transfer circuits within the memory device are driven by the internal clock signal.
The memory device asynchronously receives the control signals. If the memory device is in the first mode, the memory device may assume the second mode in response to one or more of the control signals from a memory controller. While in the second mode, the memory device transfers data with the data transfer circuits while the data transfer circuits are being driven by the internal clock signal. The memory device is also able to asynchronously perform data transfers while the memory device is in the first mode.
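The two modes described above can be pictured as a small state machine. The Python below is only a behavioral sketch under stated assumptions: the class `ToyMemoryDevice`, the signal names `"ENTER_SYNC"` and `"EXIT_SYNC"`, and the existence of an explicit exit signal are all hypothetical and do not describe the actual control encoding of the memory device.

```python
from enum import Enum


class Mode(Enum):
    FIRST = 1    # data transfer circuits not driven by the internal clock
    SECOND = 2   # data transfer circuits driven by the internal clock


class ToyMemoryDevice:
    def __init__(self):
        self.mode = Mode.FIRST
        self.cells = {}

    def on_control(self, signal):
        # Control signals are received asynchronously, so this handler
        # works in either mode without reference to a clock.
        if signal == "ENTER_SYNC" and self.mode is Mode.FIRST:
            self.mode = Mode.SECOND
        elif signal == "EXIT_SYNC":  # hypothetical way back to the first mode
            self.mode = Mode.FIRST

    def read(self, addr):
        # Data can be transferred in both modes; only the way the
        # transfer circuits are driven differs.
        how = "clocked" if self.mode is Mode.SECOND else "asynchronous"
        return self.cells.get(addr, 0), how


dev = ToyMemoryDevice()
dev.cells[3] = 0xAB
first_mode_read = dev.read(3)     # transfer while in the first mode
dev.on_control("ENTER_SYNC")      # control signal switches the mode
second_mode_read = dev.read(3)    # transfer while in the second mode
print(first_mode_read, second_mode_read)
```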
The internal clock signal is generated from an external clock signal that may selectively pass through a delay lock loop within the memory device. The memory device may support higher clock frequencies when the external clock signal passes through the delay lock loop to drive the data transfer circuits during a data transfer. Energy may be saved by circumventing the delay lock loop and using an external clock signal with a relatively slower frequency to drive the data transfer circuits during a data transfer.