The present invention relates to accessing memory, and more particularly to reducing latency and increasing bandwidth while accessing memory.
Dynamic Random Access Memories (DRAMs) have long been a popular choice for use as main memory in computer systems, especially for low cost computer systems such as personal computers (PCs) and workstations. This is largely because DRAMs use a simple memory cell geometry that permits implementation of large memory arrays at minimum cost and power consumption on a single semiconductor chip.
However, as processor speeds increase beyond a certain point, DRAM technology has been found to have significant access time incompatibilities. This is because the switching speed within a conventional DRAM memory cell is not as fast as the switching speeds now common in central processing units (CPUs). As a result, when using high speed processors with conventional DRAMs, the processor must frequently wait for memory accesses to be completed.
In a DRAM, all of the cells in a given group of memory locations, or a so-called "row," are activated at the same time. Multiple read or write operations can thus be performed with various cells within the row, but only while it is active. If a new access is to be made to a different row, a precharge operation must be completed to close the presently active row, and then an activate operation must be performed to open the different row.
Therefore, a delay equal to the precharge time and activate time is experienced whenever a different row must be accessed on a subsequent transaction. However, the precharge operation is only necessary if the row address changes; if the row address does not change on the subsequent access, the precharge operation has been unnecessarily executed and the device unnecessarily placed in an idle state.
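By way of illustration only, the cost of a row change described above may be modeled as follows. This is a minimal sketch, not part of the described system: the cycle counts T_PRECHARGE, T_ACTIVATE and T_ACCESS and the function names are hypothetical values chosen for the example, and real figures would come from a device data sheet.

```python
# Hypothetical timing parameters in clock cycles (assumptions for illustration).
T_PRECHARGE = 3  # assumed cycles to close the presently active row
T_ACTIVATE = 3   # assumed cycles to open a new row
T_ACCESS = 2     # assumed cycles for the read or write itself


def access_cycles(open_row, target_row):
    """Return the cycles needed to access target_row in one bank.

    open_row is None when the bank is idle (already precharged).
    """
    if open_row == target_row:
        return T_ACCESS  # row already open: no preparation needed
    if open_row is None:
        return T_ACTIVATE + T_ACCESS  # bank idle: activate only
    return T_PRECHARGE + T_ACTIVATE + T_ACCESS  # different row open


def total_cycles(rows):
    """Sum the cost of a sequence of row accesses to one bank, starting idle."""
    open_row = None
    cycles = 0
    for row in rows:
        cycles += access_cycles(open_row, row)
        open_row = row
    return cycles
```

Under these assumed values, three accesses to the same row cost 9 cycles (`total_cycles([5, 5, 5])`), while alternating between two rows costs 21 cycles (`total_cycles([5, 6, 5])`), the difference being the repeated precharge and activate delay.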
A new type of DRAM, called a synchronous DRAM (SDRAM), is rapidly becoming a popular option for use as main memory. SDRAMs use the same memory cell technology as DRAMs, which is to say they use a single complementary metal-oxide-semiconductor (CMOS) transistor switch coupled to a storage capacitor. There are, however, several differences in the internal structure of an SDRAM that provide certain speed advantages.
The first such difference is that the operation of an SDRAM is synchronous. In particular, read/write access and refresh cycles occur synchronously with a master clock signal. Therefore, a computer system can be designed using SDRAMs, knowing the exact timing of events within the memory.
Second, being synchronous, SDRAM arrays can be split into two or more independent memory banks, and two or more rows can therefore be active simultaneously, with one open row per independent bank. If a computer system is designed to support interleaved accesses to multiple rows, SDRAMs make it possible to complete these accesses without intervening precharge and activate operations, provided that the rows to be accessed are all in separate SDRAM banks.
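The benefit of one open row per bank may be sketched as follows. The example is illustrative only: the function name count_row_switches and the mapping of a row to bank `row % num_banks` are assumptions made for the sketch, not a mapping taken from any particular SDRAM.

```python
def count_row_switches(accesses, num_banks):
    """Count row switches (precharge followed by activate) for a row sequence.

    Assumed mapping for illustration: row r lives in bank r % num_banks.
    """
    open_rows = {}  # bank -> row currently open in that bank
    switches = 0
    for row in accesses:
        bank = row % num_banks
        if bank in open_rows and open_rows[bank] != row:
            switches += 1  # a different row is open: precharge, then activate
        open_rows[bank] = row
    return switches
```

With the interleaved sequence `[0, 1, 0, 1, 0, 1]`, a single bank incurs five row switches, whereas two banks incur none after the initial activates, because rows 0 and 1 remain open in separate banks.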
In use, an SDRAM may be accessed by multiple components such as a central processing unit (CPU), display refresh module, graphics unit, etc. Different components are given varying levels of priority based on the effect of latency on the component. For example, a display refresh module may be given a higher priority in accessing the SDRAM since any latency may result in easily-noticed, detrimental visual effects.
FIG. 1A illustrates a prior art system 100 by which commands for the read/write, activate and precharge operations may be sent to the SDRAM. As shown in FIG. 1A, a first queue 101 is provided for queuing the read/write commands. As indicated earlier, such read/write commands may be associated with different banks. Also provided is a second queue 102 for queuing the activate and precharge commands. The outputs of the first queue 101 and the second queue 102 are then sent to a multiplexer 104 which, in turn, feeds the commands to an SDRAM 106 for carrying out the operations set forth hereinabove.
FIG. 1A-1 illustrates a timing diagram 108 associated with the read/write, activate and precharge commands that are sent to the SDRAM 106. In use, read/write commands may be queued serially for reading data from and writing data to various banks of the SDRAM 106. As shown, the precharge and activate commands for a first bank 110 are queued followed by precharge and activate commands for a second bank 112. It is important to note that the timing associated with the loading of the precharge and activate commands must be handled in a strict serial manner so that each of the appropriate banks is prepared for the corresponding read/write commands in the first queue 101 of FIG. 1A.
Because the precharge and activate commands are loaded from a single queue 102, the prior art system 100 must finish loading the precharge and activate commands for the first bank 110 before loading the precharge and activate commands for the second bank 112. This inherently increases the latency and reduces bandwidth associated with memory accesses to the SDRAM 106.
An example of such a problem will now be set forth. In conventional prior art computer systems, it is important for CPU traffic to have the lowest latency possible since it is typically stalled when waiting on the fulfillment of read commands. On the other hand, bandwidth, not latency, is important to graphics-related computer components. To efficiently use memory such as DRAMs and SDRAMs, it is important to have the target bank opened to the correct row prior to the read/write operation. If the bank is not open, it must be activated to the target row. If the bank is opened to a different row, it first must be precharged and subsequently activated to the target row.
As is observed in prior art system 100 of FIG. 1A, the read/write commands are delayed in a queue while the bank is being prepared using precharge and activate commands. In this way, read/write commands to a previous bank are executed while preparation of the next target bank takes place. Unfortunately, this adds latency to the CPU read access path because previous references from other requestors must be executed before any CPU request.
There is thus a need for a memory controller that exhibits lower latency and higher bandwidth.
A memory controller system is provided including a plurality of memory controller subsystems each coupled between memory and one of a plurality of computer components. Each memory controller subsystem includes at least one queue for managing pages in the memory. In use, each memory controller subsystem is capable of being loaded from the associated computer component independent of the state of the memory.
In one embodiment of the present invention, each memory controller subsystem includes at least one read/write queue with an input coupled to one of the computer components and an output coupled to the memory for queuing read commands and write commands to be sent to the memory. Next provided is at least one precharge/activate queue with an input coupled to one of the computer components and an output coupled to the memory for queuing precharge and activate commands to be sent to the memory.
In one aspect of the present embodiment, the precharge/activate queue may include a precharge queue with an input coupled to one of the computer components and an output coupled to the memory for queuing precharge commands to be sent to the memory, and an activate queue with an input coupled to one of the computer components and an output coupled to the memory for queuing activate commands to be sent to the memory.
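One possible arrangement of the queues within a single memory controller subsystem may be sketched as follows. This is a hedged illustration only: the class name Subsystem, the tuple layouts, and the choice to queue one precharge and one activate command per reference are assumptions made to keep the example short, not details given in the description.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Subsystem:
    """Hypothetical per-component subsystem with three separate queues."""
    name: str
    read_write: deque = field(default_factory=deque)
    precharge: deque = field(default_factory=deque)
    activate: deque = field(default_factory=deque)

    def load(self, op, bank, row, column):
        """Queue one reference from the component.

        No memory state is consulted here, so loading is independent of
        whether the target bank is currently open or precharged.
        """
        self.precharge.append(("PRE", bank))
        self.activate.append(("ACT", bank, row))
        self.read_write.append((op, bank, row, column))
```

For example, `Subsystem("cpu").load("RD", 0, 7, 3)` places one entry in each of the three queues of that subsystem without inspecting the memory.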
In another aspect of the present embodiment, the memory may include dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), and/or double data rate (DDR) memory. Further, the aforementioned computer components may include a central processing unit, a display refresh module, and/or a graphics unit.
In yet another aspect of the present embodiment, the commands may be loaded in at least one of the queues of each memory controller subsystem based on rows and banks of references in at least one of the queues. Further, the loading of the commands may be delayed based on the rows and banks of references. One optional rule associated with the present embodiment includes requiring each read/write queue to load commands for only a single row in each bank.
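The optional single-row-per-bank rule may be sketched as follows. The class ReadWriteQueue and its method names are hypothetical and serve only to illustrate how loading could be delayed when a queued reference already targets a different row of the same bank.

```python
from collections import deque


class ReadWriteQueue:
    """Hypothetical queue holding (op, bank, row) references."""

    def __init__(self):
        self.entries = deque()

    def can_load(self, bank, row):
        """True if no queued reference targets a different row of this bank."""
        return all(b != bank or r == row for (_, b, r) in self.entries)

    def try_load(self, op, bank, row):
        """Queue the reference if the rule allows it; otherwise signal a delay."""
        if not self.can_load(bank, row):
            return False  # caller retries once the conflicting entries drain
        self.entries.append((op, bank, row))
        return True

    def pop(self):
        """Remove and return the reference at the head of the queue."""
        return self.entries.popleft()
```

In this sketch, a reference to row 6 of bank 0 is refused while references to row 5 of bank 0 are still queued, but a reference to row 6 of bank 1 is accepted immediately.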
In still yet another aspect of the present embodiment, the precharge commands in the precharge queue and the activate commands in the activate queue may be capable of being restored to a row and a bank associated with the read commands and write commands at a head of the read/write queue.
In another embodiment of the present invention, a controller may be provided which is capable of receiving a plurality of read commands, write commands, precharge commands and activate commands from the queues. As mentioned earlier, such queues are capable of being loaded from a plurality of computer components independent of the state of the memory. In operation, the delivery of the read commands, write commands, precharge commands and activate commands from the queues to the memory may be arbitrated by the controller. Thereafter, the arbitrated read commands, write commands, precharge commands and activate commands may be delivered to the memory.
In one aspect of the present embodiment, the delivery of the read commands, write commands, precharge commands and activate commands from the queues to the memory may be arbitrated utilizing a timer. As an option, the timer may arbitrate the delivery of the commands to ensure that sequential commands are delivered sequentially.
In another aspect of the present embodiment, the delivery of the read commands, write commands, precharge commands and activate commands from the queues to the memory may be arbitrated based on a predetermined order. Such predetermined order may prioritize the computer components and/or the read commands, the write commands, the precharge commands and the activate commands.
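Arbitration based on a predetermined order may be sketched as follows. The particular order in PRIORITY, the component names, and the function arbitrate are assumptions for illustration; the description leaves the actual order open.

```python
from collections import deque

# Assumed priority order, highest first (illustrative only).
PRIORITY = ["display_refresh", "cpu", "graphics"]


def arbitrate(queues):
    """Pop and return (component, command) from the highest-priority
    non-empty queue, or None if every queue is empty.

    queues maps a component name to a deque of pending commands.
    """
    for component in PRIORITY:
        queue = queues.get(component)
        if queue:
            return component, queue.popleft()
    return None
```

Given pending commands from both the CPU and the graphics unit, this sketch delivers the CPU command first and then drains the graphics commands in order.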
In yet another aspect of the present embodiment, the delivery of the commands may be arbitrated based on a bank and a row at a head of the queues. Further, the delivery of the commands may be arbitrated based on the read commands and write commands. Also, the delivery of the commands may be arbitrated based on computer component bandwidth sharing.
As an option, the memory controller may be capable of determining if banks specified by the precharge queue and the activate queue are activated or precharged. As such, the precharge command queued in the precharge queue may be disregarded (i.e. unloaded) if the corresponding bank of the memory is determined to be precharged. Further, the activate command queued in the activate queue may be disregarded (i.e. unloaded) if the corresponding bank of the memory is determined to be activated to the corresponding row. Moreover, the read commands, write commands, precharge commands and activate commands may be arbitrated based on the state of the memory.
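The disregarding of unnecessary precharge and activate commands may be sketched as follows. The function filter_command, the tuple layouts, and the representation of bank state as a mapping from bank to open row are assumptions made for the example.

```python
def filter_command(command, bank_state):
    """Return the command if it is still needed, or None if it can be unloaded.

    command is ("PRE", bank) or ("ACT", bank, row).
    bank_state maps bank -> open row; None (or a missing entry) means the
    bank is already precharged.
    """
    kind, bank = command[0], command[1]
    open_row = bank_state.get(bank)
    if kind == "PRE" and open_row is None:
        return None  # bank already precharged: precharge is redundant
    if kind == "ACT" and open_row == command[2]:
        return None  # bank already activated to this row: activate is redundant
    return command
```

For instance, with bank 1 open to row 7, an activate command for row 7 of bank 1 is dropped, while an activate command for row 8 of bank 1 is retained.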
Since high bandwidth and low latency are conflicting requirements in high performance memory systems, the present invention separates references from various computer components into multiple command streams. Each stream thus can hide precharge and activate bank preparation commands within its own stream for maximum bandwidth. Were these streams to be mixed, a high priority request would be serialized behind outstanding low priority requests.
While separating the multiple command streams may create a problem managing bank state of the memory for look ahead precharge and activate preparation, a page context switch technique may be employed that allows instantaneous switching from one look ahead stream to another to allow low latency while preserving maximum bank state from the previous stream.