The performance of computer systems, especially personal computers, has improved dramatically owing to rapid advances in computer architecture design and, in particular, in the performance of computer memory.
Computer processors and memories, however, have not developed at the same pace over the years. Memories generally cannot respond quickly enough to keep up with processors. Different approaches have been taken to reduce the gap in speed between processors and memories. One such approach is the concept of a memory hierarchy. A memory hierarchy comprises a number of memory levels of differing sizes and speeds. Small amounts of fast cache memory, usually static random access memory (SRAM), are utilized in or near the processor for data that is frequently accessed, such as program instructions. The cache memory reduces the need to access main memory by temporarily storing this frequently accessed data. More space-efficient, but slower, dynamic random access memory (DRAM) can then be utilized downstream from the cache memory. This approach has been augmented by combining some form of cache memory and main memory in a single memory device.
Another approach is to improve the internal response time of the memory itself. At one time, the most common version of DRAM was Fast Page Mode (FPM) DRAM. The capabilities of FPM DRAM lag far behind today's processor speeds. Extended Data-Out (EDO) DRAM was an improvement on FPM DRAM, improving page read cycle times. The primary difference between EDO DRAM and FPM DRAM is that EDO DRAM does not turn off the output drivers when CAS# (column address strobe complement) goes HIGH, and data is valid on the falling edge of CAS# such that the edge can be used to strobe data.
Synchronous DRAM (SDRAM) was a further improvement to dynamic memory devices. SDRAM added a clocked synchronous interface, multiple internal bank arrays and programmable burst inputs and outputs. Double Data Rate (DDR) DRAM allowed data clocking on both clock edges and added a return clock. Despite these advances, SDRAM and DDR are still less than optimal to support current computer processors.
One effort to increase the capabilities of DRAM is SyncLink Dynamic Random Access Memory (SLDRAM). SLDRAM is designed to be a general-purpose, high-performance DRAM, and the protocol is targeted to be formalized as an open standard by IEEE (Institute of Electrical and Electronics Engineers, Inc.). As of the date of filing, the latest revision of the proposed IEEE standard is draft 0.99 of IEEE P1596.7-199X, Draft Standard for a High-Speed Memory Interface (SyncLink), dated Oct. 14, 1996.
FIGS. 1A and 1B combined are a functional block diagram of an existing memory device 10 incorporating the features previously described. The memory device 10 is depicted as a 144M SLDRAM (i.e., an SLDRAM having 144 × 2^20 bits of memory), although the discussion is generally applicable to other sizes, configurations and types of memory. For additional background on SLDRAM of the type depicted in FIGS. 1A and 1B, please refer to the SLDRAM, Inc. document CORP400.P65, SLD4M18DR400, 4 MEG × 18 SLDRAM, revision Jul. 9, 1998, which is incorporated herein by reference.
The memory device 10 includes bank memory arrays 22 which contain memory cells organized in rows and columns for storing data. Bank memory arrays 22 are depicted as eight bank memory arrays, bank0 through bank7. In memory device 10, each bank memory array 22 is organized internally as 2048 rows by 128 columns by 72 bits. Those skilled in the art will recognize that different choices for the number of banks, rows and columns, and the bit width, are possible without altering the fundamental operation of the memory devices described herein.
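The stated organization can be checked against the 144M capacity with simple arithmetic. The following Python sketch is illustrative only; the constant and function names are chosen here for the example and are not taken from the device documentation.

```python
# Organization of memory device 10 as described in the text.
BANKS = 8
ROWS_PER_BANK = 2048
COLUMNS_PER_ROW = 128
BITS_PER_COLUMN = 72


def total_bits(banks=BANKS, rows=ROWS_PER_BANK,
               columns=COLUMNS_PER_ROW, width=BITS_PER_COLUMN):
    """Return the total number of storage bits for the given organization."""
    return banks * rows * columns * width


capacity = total_bits()
# Express the capacity as a multiple of 2^20 bits ("M").
print(capacity, capacity // 2**20)  # prints: 150994944 144
```

Changing any one of the four parameters shows how alternative bank, row, column, or bit-width choices scale the capacity.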
An external differential command clock (CCLK and CCLK#) signal is provided to clock dividers and delays 20 to generate clock signals ICLK (internal command clock), RCLK (read clock), WCLK (write clock) and other internal clock signals. Command input signals are effectively sampled at each crossing of internally delayed versions of CCLK/CCLK#.
A FLAG signal is supplied to command and address capture 24 to indicate that a valid request packet is available on pins CA0-CA9. Pins CA0-CA9 supply the address and command bits and may collectively be referred to as the command link. Command decoder and sequencer 26 acts to place the control logic in a particular command operation sequence according to the request packet received at command and address capture 24. Command decoder and sequencer 26 controls the various circuitry of memory device 10 based on decoded commands, such as during controlled reads from or writes to bank memory arrays 22. During write transfer operations, data is supplied to memory device 10 via input/output pins DQ0-DQ17. During read transfer operations, data is clocked out of memory device 10 via input/output pins DQ0-DQ17. The DQ pins can collectively (when looking external of the device) or individually (when looking internal to the device) be referred to as data links. For a read access, differential data clocks (DCLK0/DCLK0# and DCLK1/DCLK1#) are clocked out of memory device 10 via input/output pins DCLK0, DCLK0#, DCLK1 and DCLK1#. For a write access, differential data clocks (DCLK0/DCLK0# and DCLK1/DCLK1#) are driven externally, e.g., by a memory controller (not shown), and provided to memory device 10 via input/output pins DCLK0, DCLK0#, DCLK1 and DCLK1#.
Power-up and initialization functions of the memory device 10 are conducted in the conventional manner. Moreover, refresh functions of the memory device 10 are provided in the known manner employing a refresh counter 38 to refresh the memory arrays.
During a bank access command, address sequencer 28 generates a value representing the address of the selected bank memory array 22, as indicated by bank address bits on input pins CA0-CA9, and latches it in bank address register 44. Address sequencer 28 further generates a value representing a row address of the selected bank memory array 22, as indicated by row address bits on input pins CA0-CA9, and latches it in a row address register 42. Address sequencer 28 still further generates a value representing a column address, as indicated by column address bits on input pins CA0-CA9, and latches it in column select 62.
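The bank, row and column counts of memory device 10 imply address fields of 3, 11 and 7 bits, respectively. The Python sketch below splits a flat index into those three fields. The flat index and the bank/row/column field ordering are assumptions made purely for illustration; the actual request packet format on CA0-CA9 is defined by the SLDRAM documentation and is not reproduced here.

```python
from typing import NamedTuple

BANK_BITS = 3     # 8 banks
ROW_BITS = 11     # 2048 rows per bank
COLUMN_BITS = 7   # 128 columns per row


class Address(NamedTuple):
    bank: int
    row: int
    column: int


def split_address(flat: int) -> Address:
    """Split a hypothetical flat index into bank, row and column fields.

    Assumed layout, most to least significant: bank | row | column.
    """
    if not 0 <= flat < 1 << (BANK_BITS + ROW_BITS + COLUMN_BITS):
        raise ValueError("address out of range")
    column = flat & ((1 << COLUMN_BITS) - 1)
    row = (flat >> COLUMN_BITS) & ((1 << ROW_BITS) - 1)
    bank = flat >> (COLUMN_BITS + ROW_BITS)
    return Address(bank, row, column)
```

In this sketch the three returned values stand in for the contents latched in bank address register 44, row address register 42 and column select 62.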
The latched row address is provided to a row multiplexer 46 which provides a row address to predecoder 48 to be provided to bank row selects 52. In addition, bank address register 44 provides the latched bank address to bank control logic 54 which in turn provides a bank address to bank row selects 52. In response to the bank address and row address, bank row selects 52 activate the desired row of the desired memory bank for processing, to thereby activate the corresponding row of memory cells. Bank row selects 52 generally have a one-to-one relationship with bank memory arrays 22.
In the memory device 10 of FIGS. 1A and 1B, column select 62 activates 72 of the 128 × 72 (number of columns × bit width) lines provided to sense amplifiers and I/O gating circuit 66, the number of lines activated corresponding to the bit width of the device. The lines provided to sense amplifiers and I/O gating circuit 66 represent bidirectional data paths. As used herein, paths will generally describe transmission lines internal to a memory device, while links will be used to describe lines or ports generally designed for transmission between a memory device and an external device. Sense amplifiers associated with bank memory arrays 22 operate in a manner known in the art to sense the data stored in the memory cells addressed by the active bank row select line. Activating the column select lines effectively couples, via the I/O gating circuit 66, the selected memory cells to the data paths to/from the input/output pins DQ0-DQ17.
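The selection of 72 lines out of the full set can be sketched as an index calculation. The Python example below assumes, for illustration only, that the 72 lines belonging to each column are numbered contiguously; the physical arrangement of the lines in memory device 10 is not specified here.

```python
COLUMNS = 128
BIT_WIDTH = 72


def active_lines(column: int) -> range:
    """Return the indices of the lines coupled for one column address.

    Hypothetical numbering: column c owns lines c*72 through c*72 + 71.
    """
    if not 0 <= column < COLUMNS:
        raise ValueError("column out of range")
    start = column * BIT_WIDTH
    return range(start, start + BIT_WIDTH)
```

Under this numbering, every column address activates exactly 72 lines, and the 128 columns together account for all 128 × 72 lines.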
During page read command operations, data is provided to read latch 68 from I/O gating circuit 66 as a 72-bit data word across 72 bidirectional data paths. Multiplexer 70 in turn provides the selected 72 bits of data to read FIFO 72 as a burst of four 18-bit data words, through methods such as time division multiplexing. The four data words are then driven sequentially to input/output pins DQ0-DQ17 by drivers 74. Data into read FIFO 72 is controlled by the RCLK signal generated by clock dividers and delays 20. Data out of read FIFO 72 is controlled by the delayed RCLK signal generated by programmable delay 73. Circuitry thus described and provided between DQ0-DQ17 and I/O gating circuit 66 can collectively be referred to as output circuitry and facilitates unidirectional output data paths to the data links. During page write command operations, data is provided on input/output pins DQ0-DQ17 to receivers 76 as a burst of four 18-bit data words which are individually stored in input registers 78. The four 18-bit words of input write data are provided to write FIFO 80 as a 72-bit data word and latched in write latch and drivers 82. Data into write FIFO 80 is controlled by clock generation 79 in response to external signals DCLK0/DCLK0# and DCLK1/DCLK1#. Data out of write FIFO 80 is controlled by the WCLK signal generated by clock dividers and delays 20. Write latch and drivers 82 provide the 72-bit data word across 72 bidirectional data paths to the selected row of the selected bank memory array 22 with sense amplifiers and I/O gating circuit 66 in a manner known in the art based on the activated 72 lines corresponding to the current column address. Circuitry thus described and provided between DQ0-DQ17 and the I/O gating circuit 66 can collectively be referred to as input circuitry and facilitates unidirectional input data paths from the data links.
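The conversion between one 72-bit internal data word and a burst of four 18-bit words on DQ0-DQ17 can be sketched as follows. This Python example is a behavioral illustration only: the function names are hypothetical, and the ordering of the words within a burst (least significant 18 bits first) is an assumption, not something stated for memory device 10.

```python
from typing import List

WORD_BITS = 18       # width of the DQ0-DQ17 data link
BURST_LENGTH = 4     # words per burst
WORD_MASK = (1 << WORD_BITS) - 1


def to_burst(data72: int) -> List[int]:
    """Read direction: split a 72-bit word into four 18-bit words."""
    if not 0 <= data72 < 1 << (WORD_BITS * BURST_LENGTH):
        raise ValueError("value does not fit in 72 bits")
    # Assumed order: least significant 18 bits are driven first.
    return [(data72 >> (i * WORD_BITS)) & WORD_MASK
            for i in range(BURST_LENGTH)]


def from_burst(words: List[int]) -> int:
    """Write direction: assemble four 18-bit words into one 72-bit word."""
    if len(words) != BURST_LENGTH:
        raise ValueError("a burst must contain exactly four words")
    data72 = 0
    for i, word in enumerate(words):
        if not 0 <= word <= WORD_MASK:
            raise ValueError("word does not fit in 18 bits")
        data72 |= word << (i * WORD_BITS)
    return data72
```

Here `to_burst` plays the role of multiplexer 70 feeding read FIFO 72, and `from_burst` the role of input registers 78 feeding write FIFO 80. A value written with `from_burst` and read back with `to_burst` returns the original four words.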
It will be observed that data paths internal to a memory device are generally some multiple of the number of DQ pins, the multiplier increasing as the paths lead to the memory arrays.
A difficulty of the memory device 10 of FIGS. 1A and 1B occurs during a transition from a page write to a page read command operation. Memory device 10 requires a significant latency between page write requests and page read requests due to the bidirectional nature of input/output lines DQ0-DQ17 and I/O gating circuit 66.
FIG. 2 depicts a timing diagram of the memory device 10 of FIGS. 1A and 1B in response to various requests. FIG. 2 is based on a time t_0 representing the time of the first request and a scale representing the number of clock ticks from time t_0, where there are two clock ticks for each clock cycle. As shown in FIG. 2, although back-to-back page read requests and back-to-back page write requests can be accommodated, transitions between page write and page read commands require a latency. In the case of transitioning from a page read request to a page write request, this read-to-write latency (t_RWD) represents the time required to allow an external data bus to settle and, in most cases, is one cycle of CCLK or two ticks. For a 200 MHz clock, this represents about 5 ns. In the case of transitioning from a page write request to a page read request, additional latency is required to move the write data from DQ0-DQ17 to the bank memory arrays 22 before moving the read data from the bank memory arrays 22 to DQ0-DQ17. The write-to-read latency (t_WRD) can be significant and is expected to be about nine cycles of CCLK, or 18 ticks, for the memory device 10. For a 200 MHz clock, this represents about 45 ns.
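The two latency figures follow directly from the tick counts and the clock frequency. The short Python sketch below reproduces the arithmetic; the function and constant names are chosen here for illustration.

```python
TICKS_PER_CYCLE = 2   # two clock ticks per CCLK cycle, as stated above

READ_TO_WRITE_TICKS = 2    # read-to-write latency: one cycle
WRITE_TO_READ_TICKS = 18   # write-to-read latency: nine cycles


def ticks_to_ns(ticks: int, clock_mhz: float = 200.0) -> float:
    """Convert a number of clock ticks to nanoseconds."""
    cycle_ns = 1000.0 / clock_mhz          # 5 ns per cycle at 200 MHz
    return ticks / TICKS_PER_CYCLE * cycle_ns


print(ticks_to_ns(READ_TO_WRITE_TICKS))   # prints: 5.0
print(ticks_to_ns(WRITE_TO_READ_TICKS))   # prints: 45.0
```

At 200 MHz the write-to-read transition therefore costs nine times as long as the read-to-write transition.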
It has been proposed by others that t_WRD can be reduced by providing separate read and write data paths from the array to the input/output pins. While the improvement is desirable, it comes at a high price, namely a significant penalty in die real estate and cost due to the extensive duplication of circuitry and long metal runs.
For the reasons stated above, and for other reasons stated below which will become apparent to those skilled in the art upon reading and understanding the present specification, there is a need in the art for more efficient memory structures.