This invention relates to high-speed memory systems and devices, and in particular to high-speed memory devices that accommodate pipelined memory access operations.
FIG. 1 shows an example of a prior art asynchronous memory device 10. Memory device 10 is an asynchronous DRAM (dynamic random access memory) having a memory array 12 that is addressable by the combination of a row address and a column address. The row and column addresses are typically provided during different bus cycles on a common address bus ADDR. A RAS signal indicates a bus cycle in which the row address is supplied, and the CAS signal indicates a bus cycle in which the column address is supplied. Memory results are provided in response to individual column addresses, that is, in response to CAS bus cycles.
The memory device shown in FIG. 1 includes address registers 14 and 15 that hold the row and column addresses during memory access. The RAS and CAS signals, respectively, load the row and column addresses from the address bus into registers 14 and 15.
The CAS signal also loads a command or instruction (write or read) into a command register 16. A command decode block 17 interprets the current memory instruction and enables an appropriate driver 18 or 19, depending on whether the memory operation is a write operation or a read operation.
FIG. 2 shows the CAS timing of a read operation in the memory device of FIG. 1. The rising edge of CAS loads the column address into register 15, loads the read command into register 16, and starts the column access. Actual memory access requires a time tCAC from the leading edge of the CAS signal. The assertion of CAS also turns on the data output driver 18 after a delay of tON. Initially, invalid data (cross-hatched) is driven on the DATA bus. Valid data is driven after the time tCAC and until a time tOFF after CAS is de-asserted.
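The read timing described above can be illustrated with a short sketch. It is not part of the prior art device; the class name, function name, and numeric values are hypothetical, and the sketch assumes that tON is shorter than tCAC so that an interval of invalid data precedes the valid data.

```python
from dataclasses import dataclass


@dataclass
class AsyncReadTiming:
    """Timing parameters named in the text (units are arbitrary, e.g. ns)."""
    t_on: float   # delay from CAS assertion until the output driver turns on
    t_cac: float  # column access time, measured from the leading CAS edge
    t_off: float  # delay from CAS de-assertion until the driver turns off


def data_bus_windows(timing, cas_assert, cas_deassert):
    """Return the intervals during which the DATA bus carries invalid and valid data.

    Assumption: t_on < t_cac, so the driver turns on before the data is valid.
    """
    driver_on = cas_assert + timing.t_on
    valid_from = cas_assert + timing.t_cac
    driver_off = cas_deassert + timing.t_off
    return {
        "invalid": (driver_on, valid_from),
        "valid": (valid_from, driver_off),
    }


# Hypothetical values chosen only for illustration.
windows = data_bus_windows(AsyncReadTiming(t_on=5, t_cac=20, t_off=5),
                           cas_assert=0, cas_deassert=30)
print(windows)
```

Every interval in this sketch is derived from the two CAS edges and the internal delays of the device alone; no other external timing signal is involved.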
This access is asynchronous since read data appears on the DATA bus after a time that is determined by the DRAM and not by timing signals supplied externally (other than the initial CAS edge that loads the address). The advantage of this approach is simplicity: it is relatively easy to use this memory device. The disadvantage is performance: the number of read operations per unit of time is relatively limited since accessing the memory array and transporting the resulting data on the DATA bus must be done sequentially before the next access can begin.
FIG. 3 shows pertinent elements of a synchronous DRAM 20, a prior art device having an architecture that facilitates higher access speeds relative to the asynchronous DRAM described above. DRAM 20 has one or more banks of memory arrays 21. It has row and column address registers 22 and 23 that receive row and column addresses from a common address bus ADDR. DRAM 20 also has a command register 24 that receives and stores commands or instructions from a command or control bus OP. This device allows more complex memory access operations than the device of FIG. 1, and therefore allows more commands through its OP bus.
Instead of RAS and CAS signals, this device uses a single CLK signal, in conjunction with the OP bus, to load row and column addresses into registers 22 and 23. The command register 24 is loaded by the CLK signal as well.
Another difference from the circuit of FIG. 1 is that DRAM 20 has registers 25 and 26 in the path of the read and write data (between the DATA bus and the memory arrays 21). These registers are also loaded by the CLK signal. A command decode block 27 generates signals that enable drivers 28 and 29 for the read and write data.
The inclusion of two or more independent banks of memory arrays permits more than one memory access to take place at a time. In other words, a second memory access operation can be initiated even before obtaining results of an earlier operation. Registers 25 and 26, in the path of the read and write data, are necessary for this type of overlapped operation. Such overlapped operation is typically referred to as "pipelined" operation or "pipelined" memory access.
FIG. 4 shows the timing of a column read access for synchronous DRAM 20. On the first rising edge of CLK the column address is loaded from the ADDR bus into column address register 23, and a command is loaded from the OP bus into command register 24. Accessing the appropriate memory array and obtaining memory data requires a time tCAC, which is slightly less than the period of the clock signal CLK. At the next rising edge of CLK, the read data is loaded from the memory array into read data register 25. This CLK edge also turns on the data output driver 28 after a delay of tON. The third rising edge of CLK turns off the data output drivers after a time tOFF.
This operation is synchronous, in that data output is timed and enabled relative to an externally supplied clock signal. The row and column address registers 22 and 23 form a first pipeline stage, in which addresses are obtained for accessing memory. The read data register 25 forms a second pipeline stage, which is capable of holding memory results even as another memory access operation is initiated in the first pipeline stage. As a result of this technique, the two steps of memory access and data transport are done sequentially in the two pipeline stages of the DRAM. A second memory access could be started after the second CLK edge, overlapping the two operations.
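The effect of overlapping accesses in the two pipeline stages can be sketched as a schedule of CLK edges. The function name is hypothetical, and the sketch assumes that, in overlapped operation, a new column address may be loaded on every rising CLK edge, with the read data register loaded one edge later, as in FIG. 4.

```python
def read_schedule(num_reads, overlapped):
    """Return (address_edge, data_edge) pairs for a series of column reads.

    address_edge: rising CLK edge that loads the column address and command
                  (first pipeline stage).
    data_edge:    next rising CLK edge, which loads the read data register
                  (second pipeline stage).

    Assumption: with overlap, a new read starts on every edge; without
    overlap, a new read starts only after the previous data has been loaded.
    """
    stride = 1 if overlapped else 2
    schedule = []
    for i in range(num_reads):
        address_edge = 1 + i * stride
        schedule.append((address_edge, address_edge + 1))
    return schedule


print(read_schedule(3, overlapped=True))   # [(1, 2), (2, 3), (3, 4)]
print(read_schedule(3, overlapped=False))  # [(1, 2), (3, 4), (5, 6)]
```

Under these assumptions, three overlapped reads have all loaded their data by the fourth CLK edge, while three non-overlapped reads require six edges.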
There are two benefits to this technique. First, it permits sequential transactions to be overlapped, increasing the number of read transactions per unit of time. Second, it resynchronizes the transport of the read data: the signals that enable and disable the drivers are timed by the subsequent CLK edges.
As the signaling bandwidth of memory buses is increased, more pipeline stages can be added to the DRAM so that individual data slots are very small. Modern memory designs utilize a high degree of pipelining to support very high transfer rates.
Although pipelining has been essential to achieving high memory access rates, the technology does have disadvantages. High latency is one disadvantage, resulting from the need to quantize internal delays to the externally-supplied clock period. A disproportionally high power requirement is another disadvantage. Power is a concern because a free-running clock dissipates power even when no useful work is being done. Some devices utilize low-power modes in which the clock is gated off, but this creates further latency problems. Furthermore, the power needed while restarting the clock threatens to erase whatever savings might have otherwise been gained by disabling the clock.
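The latency penalty of quantizing an internal delay to the clock period can be illustrated numerically. The function name and the values below are hypothetical; the sketch assumes that a result becomes usable only at the first clock edge at or after the internal delay has elapsed, and it uses integer picoseconds to avoid floating-point rounding.

```python
def quantized_latency(internal_delay_ps, clock_period_ps):
    """Round an internal delay up to a whole number of clock periods.

    Assumption: a result is captured at the first clock edge that occurs
    at or after the internal delay has elapsed.
    """
    if clock_period_ps <= 0:
        raise ValueError("clock period must be positive")
    periods = -(-internal_delay_ps // clock_period_ps)  # ceiling division
    return periods * clock_period_ps


delay_ps = 12_000   # hypothetical internal delay of 12 ns
period_ps = 10_000  # hypothetical clock period of 10 ns
latency_ps = quantized_latency(delay_ps, period_ps)
print(latency_ps, latency_ps - delay_ps)  # 20000 8000
```

In this example a 12 ns internal delay is stretched to 20 ns, so 8 ns of the observed latency is attributable to quantization rather than to the memory access itself.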