1. Field of the Invention
The present invention generally relates to a computer system that includes one or more random access memory (“RAM”) devices. More particularly, the invention relates to a computer system with a memory controller that reorders read and write requests for improved memory system performance. Still more particularly, the invention relates to a memory controller that receives read and write requests in an arbitrary order and streams groups of reads and groups of writes to the memory system.
2. Background of the Invention
Superscalar processors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. On the other hand, superpipelined processor designs divide instruction execution into a large number of subtasks which can be performed quickly, and assign pipeline stages to each subtask. By overlapping the execution of many instructions within the pipeline, superpipelined processors attempt to achieve high performance.
Superscalar processors demand low memory latency due to the number of instructions attempting concurrent execution and due to the increasing clock frequency (i.e., shortening clock cycle) employed by the processors. Many of the instructions include memory operations to fetch (“read”) and update (“write”) memory operands. The memory operands must be fetched from or conveyed to memory, and each instruction must originally be fetched from memory as well. Similarly, processors that are superpipelined demand low memory latency because of the high clock frequency employed by these processors and the attempt to begin execution of a new instruction each clock cycle. It is noted that a given processor design may employ both superscalar and superpipelined techniques in an attempt to achieve the highest possible performance characteristics.
Processors are often configured into computer systems that have a relatively large and slow main memory. Typically, multiple random access memory (“RAM”) modules comprise the main memory system. The RAM modules may be Dynamic Random Access Memory (“DRAM”) modules or RAMbus™ Inline Memory Modules (“RIMM”) that incorporate a DRAM core (see “RAMBUS Preliminary Information Direct RDRAM™”, Document DL0060 Version 1.01; “Direct Rambus™ RIMM™ Module Specification Version 1.0”, Document SL-0006-100; “Rambus© RIMM™ Module (with 128/144 Mb RDRAMs)”, Document DL00084 Version 1.1, all of which are incorporated by reference herein). The large main memory provides storage for a large number of instructions and/or a large amount of data for use by the processor, providing faster access to the instructions and/or data than may be achieved, for example, from disk storage. However, the access times of modern RAMs are significantly longer than the clock cycle length of modern processors. The memory access time for each set of bytes being transferred to the processor is therefore long. Accordingly, the main memory system is not a low-latency system, and processor performance may suffer due to high memory latency.
Many types of RAMs employ a “page mode” which allows memory latency to be decreased for transfers within the same “page”. Generally, RAMs comprise memory arranged into rows and columns of storage. A first portion of the address identifying the desired data/instructions is used to select one of the rows (the “row address”), and a second portion of the address is used to select one of the columns (the “column address”). One or more bytes residing at the selected row and column are provided as output of the RAM. Typically, the row address is provided to the RAM first, and the selected row is placed into a temporary sense amplifier buffer within the RAM. The row of data that is stored in the RAM's sense amplifier is referred to as a page. Thus, addresses having the same row address are said to be in the same page. After the selected row has been placed into the sense amplifier buffer, the column address is provided and the selected data is output from the RAM. A page hit occurs if the next address to access the RAM is within the same row stored in the sense amplifier buffer. In that case, the next access may be performed by providing only the column portion of the address, omitting the row address transmission. An access to a different column of the same row may therefore be performed with lower latency, saving the time required to transmit the row address because the page corresponding to the row has already been activated. The size of a page is dependent upon the number of columns within the row. The row, or page, stored in the sense amplifier within the RAM is referred to as an “open page”, since accesses within the open page can be performed by transmitting only the column portion of the address.
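The page-mode behavior described above can be sketched as a small Python model. It is illustrative only and not part of the described system: the 10-bit column field (`COLUMN_BITS`) and the names `split_address` and `Bank` are hypothetical, and timing is ignored. The model reports only whether an access could omit the row address.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical geometry: the low 10 address bits select the column
# (1024 columns per row); the remaining high bits select the row.
COLUMN_BITS = 10


def split_address(addr: int) -> Tuple[int, int]:
    """Split a flat address into its (row address, column address) portions."""
    return addr >> COLUMN_BITS, addr & ((1 << COLUMN_BITS) - 1)


@dataclass
class Bank:
    # Row currently held in the sense amplifier buffer (the open page), if any.
    open_row: Optional[int] = None

    def access(self, addr: int) -> bool:
        """Perform an access and return True on a page hit.

        On a hit only the column address would need to be sent; otherwise
        the row address is sent first and that row becomes the open page.
        """
        row, _column = split_address(addr)
        hit = self.open_row == row
        self.open_row = row
        return hit


bank = Bank()
print([bank.access(a) for a in (0x400, 0x404, 0x800)])  # [False, True, False]
```

Addresses 0x400 and 0x404 share row 1, so the second access is a page hit; 0x800 falls in row 2 and requires a new row address.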
Unfortunately, the first access to a given page generally does not occur to an open page, thereby incurring a higher memory latency. Even further, the first access may experience a page miss. A page miss can occur if the sense amplifier holds a different page, which must first be closed before the page containing the current access can be opened. A page miss can also occur if the sense amplifier is empty. Often, this first access is critical to maintaining performance in the processors within the computer system, as the data/instructions are immediately needed to satisfy a miss. Instruction execution may stall because of the page miss while the page containing the current access is being opened.
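The two page-miss cases and the page hit can be told apart by comparing the requested row with the row held in the sense amplifier. The following Python sketch is hedged: the `Outcome` names and the generic command names (`PRECHARGE` to close a page, `ACTIVATE` to open one, `COLUMN` for the column access) are illustrative labels, not commands defined in this document.

```python
from enum import Enum
from typing import List, Optional


class Outcome(Enum):
    PAGE_HIT = "page hit"                      # requested row is already open
    MISS_OTHER_OPEN = "miss, other page open"  # a different row must be closed first
    MISS_EMPTY = "miss, sense amp empty"       # no row open; requested row must be opened


def classify(open_row: Optional[int], row: int) -> Outcome:
    """Classify an access to `row` given the row in the sense amplifier (None if empty)."""
    if open_row is None:
        return Outcome.MISS_EMPTY
    if open_row == row:
        return Outcome.PAGE_HIT
    return Outcome.MISS_OTHER_OPEN


def commands_for(open_row: Optional[int], row: int) -> List[str]:
    """Illustrative command sequence needed to service the access."""
    outcome = classify(open_row, row)
    if outcome is Outcome.PAGE_HIT:
        return ["COLUMN"]
    if outcome is Outcome.MISS_EMPTY:
        return ["ACTIVATE", "COLUMN"]
    return ["PRECHARGE", "ACTIVATE", "COLUMN"]


print(commands_for(7, 3))  # ['PRECHARGE', 'ACTIVATE', 'COLUMN']
```

The longer command sequences for the two miss cases illustrate why the first access to a page tends to see higher latency than a page hit.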
A memory controller in the processor contains a page table that tracks open pages. Read and write requests to main memory from the processor access the page table to determine whether the page needed for the request is open. If the page is open, the read or write request is placed into a read/write queue and then issued to the main memory. Because read requests and write requests are interspersed randomly throughout the queue, the memory controller is unable to stream groups of read requests and groups of write requests to the main memory. Thus, the requests issued to main memory are typically an intermixed sequence of reads and writes.
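The effect of a single shared read/write queue can be illustrated with a small model. In this hedged Python sketch the page table is simplified to a set of open page addresses (`open_pages`), and requests whose page is not open are only set aside, since their handling is not described here; `Request` and `fill_unified_queue` are hypothetical names.

```python
from collections import deque
from typing import Deque, Iterable, List, NamedTuple, Set, Tuple


class Request(NamedTuple):
    kind: str  # "R" for a read, "W" for a write
    page: int  # address of the page (row) the request targets


def fill_unified_queue(
    requests: Iterable[Request], open_pages: Set[int]
) -> Tuple[Deque[Request], List[Request]]:
    """Queue page-hit requests, in arrival order, in one shared read/write FIFO.

    Returns (queue, not_open). Requests to pages that are not open are only
    collected; how they are serviced is outside this sketch.
    """
    queue: Deque[Request] = deque()
    not_open: List[Request] = []
    for req in requests:
        if req.page in open_pages:
            queue.append(req)
        else:
            not_open.append(req)
    return queue, not_open


arrivals = [Request("R", 5), Request("W", 5), Request("R", 7), Request("W", 9), Request("W", 7)]
queue, _ = fill_unified_queue(arrivals, open_pages={5, 7})
print("".join(r.kind for r in queue))  # RWRW
```

Because the FIFO issues entries in arrival order, reads and writes reach memory exactly as interleaved as they arrived.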
Intermixing read requests and write requests to main memory causes a clock cycle to be “lost” when a memory write is immediately followed by a memory read. This phenomenon, called bus turnaround, occurs because the structure of RAM requires an extra clock cycle to make sure that all of the data is written into the memory before a read operation can be performed. For example, if a write operation is followed by a read operation from the same address, a lost clock cycle is needed so that the “new” data will be written to the specified address before the read operation is performed on the data stored at that address. In systems where bus turnaround occurs frequently, the clock cycles lost to bus turnaround can significantly reduce the bandwidth of the system.
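The cost described above can be counted directly from an issue sequence. The following Python sketch assumes, as the paragraph states, one lost clock cycle each time a write is immediately followed by a read; the function name `lost_cycles` and its `penalty` parameter are hypothetical.

```python
def lost_cycles(issue_order: str, penalty: int = 1) -> int:
    """Count clock cycles lost to write-to-read bus turnaround.

    issue_order: requests in the order issued to memory, "R" = read, "W" = write.
    penalty: cycles lost per write immediately followed by a read (1 per the text).
    """
    turnarounds = sum(
        1 for prev, cur in zip(issue_order, issue_order[1:]) if prev == "W" and cur == "R"
    )
    return turnarounds * penalty


print(lost_cycles("WRWRWR"))  # 3: intermixed order
print(lost_cycles("WWWRRR"))  # 1: same requests, grouped
```

Grouping the same six requests reduces the count from three lost cycles to one, which is the motivation for streaming groups of reads and groups of writes.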
Lost clock cycles due to bus turnaround are common in memory controllers that use a unified read/write queue, because read requests and write requests issued from the unified queue to main memory are randomly intermixed. Thus, a system and method are needed that can stream groups of read requests and groups of write requests to main memory and thereby reduce the clock cycles lost to bus turnaround.
The problems noted above are solved in large part by a computer system that contains one or more processors, each processor including a memory controller containing a page table, the page table organized into a plurality of rows with each row able to store an address of an open memory page. The memory controller also contains a precharge queue, a Row-address-select (“RAS”) queue, a Column-address-select (“CAS”) Read queue, and a CAS Write queue. The outputs of the CAS Read queue and the CAS Write queue are connected to a 2-to-1 multiplexer. The 2-to-1 multiplexer streams groups of read requests and groups of write requests to main memory, resulting in fewer lost clock cycles caused by bus turnarounds. A RIMM module containing RDRAM devices is coupled to the processor, each RDRAM containing a plurality of memory banks.
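One way a 2-to-1 multiplexer could form groups is sketched below in Python. The document states only that the multiplexer streams groups of reads and groups of writes; the selection policy used here (issue up to `max_group` entries from one CAS queue, then switch to the other, skipping an empty queue) and the name `stream_groups` are assumptions for illustration.

```python
from collections import deque
from typing import Deque, Iterable, List


def stream_groups(reads: Iterable[str], writes: Iterable[str], max_group: int = 4) -> List[str]:
    """Merge the CAS Read and CAS Write queue outputs into one issue stream.

    Hypothetical policy: issue up to `max_group` consecutive entries from the
    selected queue, then select the other queue; an empty queue is skipped.
    """
    if max_group < 1:
        raise ValueError("max_group must be at least 1")
    read_q: Deque[str] = deque(reads)
    write_q: Deque[str] = deque(writes)
    issued: List[str] = []
    current = read_q  # arbitrary starting selection
    while read_q or write_q:
        if current:
            for _ in range(min(max_group, len(current))):
                issued.append(current.popleft())
        # Toggle the multiplexer select between the two queues.
        current = write_q if current is read_q else read_q
    return issued


print(stream_groups(["R0", "R1", "R2"], ["W0", "W1", "W2"], max_group=2))
# ['R0', 'R1', 'W0', 'W1', 'R2', 'W2']
```

Under this policy a larger `max_group` produces fewer switches between reads and writes, and therefore fewer write-to-read turnarounds, at the cost of making requests in the other queue wait longer.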
The memory controller places system memory read requests into the CAS Read queue and system memory write requests into the CAS Write queue. When a read or write request to system memory results in a page miss because the row of the page table associated with the request already holds the address of a different open memory page, the memory controller places the request in the precharge queue; each entry in the precharge queue closes the page, in the memory bank, referenced by the address stored in that page table row. When a read or write request to system memory results in a page miss because the row of the page table does not contain any open memory page address, the memory controller places the request in the RAS queue; the entry in the RAS queue activates the page in the memory bank that caused the page miss and stores the page address into that row of the page table to indicate that the page is open. Upon completion, an entry in the precharge queue is placed into the RAS queue, and upon completion, an entry in the RAS queue is placed into the CAS Read queue or the CAS Write queue.
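The queue flow described above can be modeled as follows. This Python sketch rests on stated assumptions: each request names the page-table row it maps to (`table_row`) directly, because the address-to-row mapping is not described here; clearing the page-table row when a precharge completes is inferred from the statement that the RAS stage fills a row holding no open page; and timing, ordering hazards between queued requests, and the multiplexer are not modeled. All class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, List, NamedTuple, Optional


class Request(NamedTuple):
    kind: str       # "R" = read, "W" = write
    table_row: int  # page-table row the request maps to (hypothetical mapping)
    page: int       # address of the memory page the request needs open


class MemoryController:
    def __init__(self, num_table_rows: int) -> None:
        # Each page-table row holds the address of one open page, or None.
        self.page_table: List[Optional[int]] = [None] * num_table_rows
        self.precharge_q: Deque[Request] = deque()
        self.ras_q: Deque[Request] = deque()
        self.cas_read_q: Deque[Request] = deque()
        self.cas_write_q: Deque[Request] = deque()

    def _cas_queue(self, req: Request) -> Deque[Request]:
        return self.cas_read_q if req.kind == "R" else self.cas_write_q

    def submit(self, req: Request) -> None:
        open_page = self.page_table[req.table_row]
        if open_page == req.page:
            self._cas_queue(req).append(req)  # page hit: straight to a CAS queue
        elif open_page is None:
            self.ras_q.append(req)  # miss, row empty: activate the page
        else:
            self.precharge_q.append(req)  # miss, other page open: close it first

    def complete_precharge(self) -> None:
        req = self.precharge_q.popleft()
        self.page_table[req.table_row] = None  # old page closed (assumed bookkeeping)
        self.ras_q.append(req)

    def complete_ras(self) -> None:
        req = self.ras_q.popleft()
        self.page_table[req.table_row] = req.page  # record the newly opened page
        self._cas_queue(req).append(req)


mc = MemoryController(num_table_rows=2)
mc.submit(Request("R", 0, 5))  # row 0 empty -> RAS queue
mc.complete_ras()              # page 5 opened -> CAS Read queue
mc.submit(Request("W", 0, 5))  # page hit -> CAS Write queue
mc.submit(Request("R", 0, 9))  # page 5 open in row 0 -> precharge queue
mc.complete_precharge()        # page 5 closed -> RAS queue
mc.complete_ras()              # page 9 opened -> CAS Read queue
print(mc.page_table, len(mc.cas_read_q), len(mc.cas_write_q))  # [9, None] 2 1
```

Keeping reads and writes in separate CAS queues is what allows a downstream multiplexer to issue them in groups rather than in arrival order.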