1. Field of the Invention
This invention relates generally to memory systems and memory interconnections in electronic systems. More particularly, the invention relates to high speed interconnection of daisy-chained memory chips.
2. Description of the Related Art
Modern computer systems typically are configured with a large amount of memory in order to provide data and instructions to one or more processors in the computer systems.
Historically, processor speeds have increased more rapidly than memory access times to large portions of memory, in particular, DRAM (Dynamic Random Access Memory). Memory hierarchies have been constructed to reduce the performance mismatches between processors and memory. For example, most modern processors are constructed having an L1 (Level 1) cache, constructed of SRAM (Static Random Access Memory) on a processor semiconductor chip. L1 cache is very fast, providing reads and writes in only one or several cycles of the processor. However, L1 caches, while very fast, are also quite small, perhaps 64 KB (kilobytes) to 256 KB. An L2 (Level 2) cache is often also implemented on the processor chip. L2 cache is typically also constructed of SRAM design, although some processors utilize DRAM design. The L2 cache is typically several times larger in number of bytes than the L1 cache, but is slower to read or write. Some modern processor chips also contain an L3 (Level 3) cache. L3 cache is capable of holding several times more data than the L2 cache. L3 cache is sometimes constructed with DRAM design. L3 cache in some computer systems is implemented on a separate chip or chips from the processor, and is coupled to the processor with wiring on a printed wiring board (PWB) or a multi-chip module (MCM). Main memory of the computer system is typically large, often many GB (gigabytes), and is typically implemented in DRAM.
Main memory is typically coupled to a processor with a memory controller. The memory controller receives load (read) commands and store (write) commands from the processor and services those commands, reading data from main memory or writing data to main memory. Typically, the memory controller has one or more queues (e.g., read queues and write queues). The read queues and write queues buffer information (e.g., commands, addresses, data) so that the processor can have multiple read and/or write requests in progress at a given time.
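By way of illustration only, the queueing behavior described above can be sketched in a few lines of Python. The class name `MemoryController`, its method names, the dictionary standing in for main memory, and the policy of servicing queued writes before queued reads are all assumptions made for this sketch; they are not taken from any particular memory controller design.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MemoryController:
    """Toy model: buffers processor requests in queues, then services them."""
    memory: dict = field(default_factory=dict)        # address -> data (stands in for main memory)
    read_queue: deque = field(default_factory=deque)
    write_queue: deque = field(default_factory=deque)

    def load(self, address):
        # A load (read) command from the processor is only queued here.
        self.read_queue.append(address)

    def store(self, address, data):
        # A store (write) command from the processor is only queued here.
        self.write_queue.append((address, data))

    def service(self):
        """Drain both queues and return the data for each queued read, in order."""
        # Assumed policy: writes are serviced first so reads observe earlier stores.
        while self.write_queue:
            address, data = self.write_queue.popleft()
            self.memory[address] = data
        results = []
        while self.read_queue:
            address = self.read_queue.popleft()
            results.append(self.memory.get(address))  # None if never written
        return results


controller = MemoryController()
controller.store(0x10, "A")
controller.store(0x20, "B")
controller.load(0x20)
controller.load(0x10)
controller.load(0x30)  # address that was never written
pending = len(controller.read_queue) + len(controller.write_queue)
results = controller.service()
```

In this sketch, five requests are in progress at once before any of them is serviced, which is the purpose of the queues.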
In various implementations, signaling between the memory controller and the memory chips comprises multidrop connections. That is, a pin on the memory controller connects directly to a plurality of memory chip pins (e.g., DRAM chip input or output or common I/O connection). It will be understood that typically one memory chip is placed on one module, so the connection to a particular memory chip includes a module pin plus the chip pin. Occasionally, several memory chips are placed on a single module, which creates multiple drops even on a single module.
Another approach uses point to point interconnections between the memory controller and a buffer chip. The buffer chip is associated with a number of memory chips and writes to or reads from those associated chips when the buffer chip receives, on the point to point interconnect from the memory controller, an address that addresses those chips. If the address received does not address the memory chips associated with the buffer chip, the buffer chip re-drives the command/address, and perhaps data, to another buffer chip.
FIG. 1 illustrates such a prior art memory structure. Memory controller 12 is coupled to a first point to point interconnection 18A, comprising “M” bits, to a first buffer chip 20A. First point to point interconnection 18A carries address and command information. Memory controller 12 is coupled to a second point to point interconnection 19A, comprising “N” bits, to the first buffer chip 20A. Buffer chip 20A is mounted on a carrier 16A. Also shown mounted on carrier 16A are eight memory chips 14. Buffer chip 20A, as described above, receives address and command information on first point to point interconnect 18A. If buffer chip 20A determines that the address received addresses data in an address space of carrier 16A, buffer chip 20A drives address and control information on multidrop interconnection 21A. Data is typically sent on multiple point to point interconnections between buffer chip 20A and memory chips 14, as shown on point to point connections 22 (four such point to point connections are referenced with numeral 22; for simplicity, others are not explicitly referenced). If, however, buffer chip 20A determines that the address received on first point to point interconnect 18A does not address the address space of carrier 16A, buffer chip 20A retransmits the address and command on point to point interconnect 18B to a second buffer chip 20B. Buffer chip 20B is mounted on carrier 16B and is coupled to memory chips 14 on carrier 16B. If buffer chip 20B determines that the address is not for the address space of carrier 16B, buffer chip 20B further re-drives the address and command on point to point interconnect 18C to a third buffer chip (not shown). If buffer chip 20B determines that the address is for the address space of carrier 16B, buffer chip 20B drives address and control information on multidrop interconnection 21B.
Data is sent, as described above, on point to point interconnections 22 between buffer chip 20B and memory chips 14 on carrier 16B (as before, four point to point connections 22 are shown referenced). Thus, the address and command data is “daisy-chained” from one buffer chip 20 to another, with the appropriate buffer chip reading or writing data from/onto point to point interconnects 19 (shown as 19A-19C in FIG. 1). A problem with this approach is that buffer chips are required. Buffer chip 20 takes up area on carrier 16, and dissipates power. In electronic packaging and system design, it is typically desirable to minimize area and power consumption. Buffer chips also add cost to a memory system. Yet another problem in this implementation is that a first period of time (one or more cycles) is used to drive the address and command to a buffer chip and a second period of time (one or more cycles) is then used to drive the address on a carrier (e.g., carrier 16). Driving signals on carrier interconnect, such as copper wiring on a printed wiring board (PWB), requires significant area on the buffer chip for the off chip driver and associated ESD (electrostatic discharge) circuitry. Ensuring that the chip-module-carrier-module-chip path is operational, and providing for diagnosis of faulty signaling paths, also often requires that some or all pins be driven by a common I/O circuit that can both drive and receive, thus increasing the size and complexity of the circuitry that drives (or receives).
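By way of illustration only, the daisy-chained address decode and re-drive behavior described for FIG. 1 can be modeled as follows. The class name `BufferChip`, the `base` and `size` fields used to represent a carrier's address space, the hop counter, and the example address ranges are hypothetical and are introduced solely for this sketch. The sketch models only which buffer chip services a command and how many times the command is re-driven; it does not model cycle timing or the separate data interconnects 19.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BufferChip:
    name: str
    base: int   # first address of this carrier's address space (hypothetical)
    size: int   # number of addresses on this carrier (hypothetical)
    next_chip: Optional["BufferChip"] = None
    cells: dict = field(default_factory=dict)  # stands in for the carrier's memory chips

    def owns(self, address):
        # True if the address falls within this carrier's address space.
        return self.base <= address < self.base + self.size

    def handle(self, command, address, data=None, hops=0):
        """Service the command locally, or re-drive it to the next buffer chip."""
        if self.owns(address):
            if command == "write":
                self.cells[address] = data
            # Return the servicing chip, the number of re-drives, and the stored data.
            return self.name, hops, self.cells.get(address)
        if self.next_chip is None:
            raise ValueError(f"address {address:#x} is not mapped on any carrier")
        # Address is not for this carrier: pass the command down the chain.
        return self.next_chip.handle(command, address, data, hops + 1)


# Two carriers in a chain, with assumed address ranges.
chip_b = BufferChip("20B", base=0x100, size=0x100)
chip_a = BufferChip("20A", base=0x000, size=0x100, next_chip=chip_b)

write_result = chip_a.handle("write", 0x180, data=0xAB)  # re-driven once to 20B
read_back = chip_a.handle("read", 0x180)
local_read = chip_a.handle("read", 0x010)                # serviced by 20A, no re-drive
```

Each re-drive counted by `hops` corresponds to an additional period of time spent forwarding the address and command before the addressed carrier is reached, which is one of the drawbacks noted above.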
Computer systems typically are designed for a workload mixture and are usually not optimized for a particular workload. For example, a particular computer system may mainly be tasked to run a numerically intensive workload whereas a second computer system of the same design may mainly be tasked to run a commercial database workload. Memory bandwidth requirements often differ between applications. For example, the numerically intensive workload tends to have approximately the same number of reads from memory as writes to memory. In contrast, many commercial database applications require many more reads from memory than writes to memory. Current memory chips have a fixed, bidirectional data bus interface and cannot be optimized to an actual workload on a particular computer system. Further, current memory systems in computer systems do not allow memory accesses on each memory chip to proceed as quickly as that memory chip allows, but, rather, are timed to a worst case memory chip.
Therefore, there is a need for further improvement in a fast and efficient memory system.