Main memory 10 (FIG. 1A) for a conventional computer is normally implemented by one or more dynamic random access memories (abbreviated as “DRAMs”) that are coupled by a memory bus 11 to an interface circuit 12 (implemented by a “north bridge chip”) that in turn is coupled to a central processing unit (CPU) 13. Interface circuit 12 is typically coupled to a system bus 14 (such as a PCI bus) that may be coupled to other devices (not shown).
Certain CPUs that require main memory to support a bandwidth of at least 500 Mbytes/s can use a specific type of DRAM called “Direct RDRAM.” A main memory 10, when implemented with a Direct RDRAM, requires interface circuit 12 to include a specific circuit called “Rambus Access Cell” (abbreviated as RAC) 15 (FIG. 1A) that supplies commands as well as row and column addresses to the Direct RDRAM. One example of a conventional Direct RDRAM includes sixteen memory banks 0-15 and seventeen sense amplifiers (abbreviated as “sense amps”) S00-S15 (FIG. 1B). Sense amplifiers S00-S15 temporarily hold the data to be transferred to/from banks 0-15. For example, a sense amp S01 that is shared between adjacent banks 0 and 1 holds data to/from either of banks 0 and 1. Due to such sharing of sense amps, two adjacent banks (e.g. bank 0 and bank 1) cannot be accessed simultaneously in the Direct RDRAM.
This limitation on the simultaneous access of adjacent banks is described in a data sheet entitled “Direct RDRAM™ 64/72-Mbit (256K×16/18×16d),” available from RAMBUS Inc., 2465 Latham Street, Mountain View, Calif., USA 94040, that is incorporated by reference herein in its entirety. In an example wherein the two transactions have the same device and bank addresses, but different row addresses, the data sheet states that “[t]ransaction b may not be started until transaction a has finished. However, transactions to other banks or other devices may be issued during transaction a.” The data sheet further states that the second transaction “must occur a time trc or more after” the first transaction. See the last paragraph in the second column of each of pages 20 and 21.
Conventional use of Direct RDRAMs in computers is described in an article entitled “DIRECT RAMBUS TECHNOLOGY: The New Main Memory Standard,” by Richard Crisp, IEEE Micro, November/December, 1997, pages 18-28, that is also incorporated by reference herein in its entirety. According to the just-described article, such “[d]irect RDRAMs avoid the empty time slots, or ‘bubbles,’ that frequently occur in single clocked SDRAM systems. Bubbles result from inadequate control bandwidth necessary to support page manipulation and scheduling while transferring data to and from random locations. Doubled data rate schemes only aggravate the bubble problem.” Id. at page 22.
The article further states that “[u]sers can schedule the data resulting from the row operation to appear immediately after the column operation completes. This highly interleaved condition greatly improves the efficiency of the channel. This interleaving can only happen when the requests target different banks in either the same Direct RDRAM or a different RDRAM on the channel. The more banks in a system, the better the chances are that any two requests are mapped to different banks. The more interleaving that is possible, the more the memory system performance improves. The Direct RDRAM's memory array is divided into banks. . . . all 64-Mbit Direct RDRAMs in development have 16 banks with a page size of 1 Kbyte.” Id. at page 23.
The article also states that “[b]ecause a Direct RDRAM spans the entire channel, the CPU accesses each RDRAM independently. So each RDRAM directly adds to the number of memory banks accessible to the memory controller. . . . Since an RDRAM system has more banks per megabyte than an SDRAM or a DDR system, RDRAM systems boast lower bank conflict rates . . . ” Id.
A scheduler (hereinafter “main memory scheduler”) in accordance with the invention issues requests to main memory in an order different from the order in which the requests are received, in order to minimize bank conflicts. Specifically, the main memory scheduler has a scheduler input port for receiving in a first order (also called “received order”) requests (also called “memory requests”) for accessing the main memory (such as a read request, a write request, or a refresh request), and a scheduler output port that is couplable (i.e. capable of being coupled) to the main memory. A main memory scheduler of one embodiment temporarily stores each received memory request (also called “pending memory request”) in a store (called “memory request store”), and issues the pending memory requests at the scheduler output port in an order (also called “second order”) that is different from the received order.
The main memory scheduler includes, in addition to the just-described memory request store, a multiplexer and a memory request selector that uses the multiplexer to select, for issue to main memory, a pending memory request that avoids a bank conflict. The pending memory requests in the memory request store are checked by the scheduler for bank conflicts with one or more requests that were previously issued and are currently being executed (also called “currently issued requests”). Specifically, the main memory scheduler implements a scheme (also called “bank conflict optimization” scheme) by issuing a second request to a second memory bank that is not coincident with (and preferably not adjacent to) a first memory bank (that is being currently accessed). Therefore, a main memory scheduler as described herein can be used to interleave later-received requests among previously-received requests to the same bank or to adjacent banks, wherein adjacent banks share sense amplifiers (such as banks in Direct RDRAMs of the type described above).
Interleaving of accesses to adjacent banks (as described herein) reduces the time period from the time the request is received to the time the request is fulfilled (also called “access latency”). Also, such interleaving of accesses reduces the number of unused cycles (also called “bubble cycles”) otherwise required to be inserted when accessing adjacent banks successively, thereby improving utilization of the memory bandwidth. Furthermore, interleaving of accesses as described herein allows the interleaved accesses to be issued in accordance with one or more schemes (such as the “read bypass of writes”) as described herein, thereby further reducing or eliminating the need for bubble cycles.
The memory request selector includes a bank conflict detector that compares at least a portion (e.g. n bank address bits, when there are a total of 2^n banks in the main memory) of a current address signal (i.e. an address signal generated by a currently issued request) with a corresponding portion of one or more (in one implementation all) to-be-issued memory address signals held in the memory request store, to select one or more next address signals that are ready to be issued to main memory. The bank conflict detector selects (via the multiplexer) a next address signal that identifies a memory bank that is not adjacent to and that is not coincident with the memory bank being identified by any current address signal, thereby to minimize bank conflicts. If a bank conflict cannot be avoided by issuing the pending requests in an order different from the received order, the main memory scheduler issues the pending requests in the order of receipt, and inserts bubble cycles in the normal manner.
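The comparison performed by the bank conflict detector can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the circuit itself: the names `bank_of`, `banks_conflict` and `select_next` are hypothetical, the position of the bank address bits within an address (`bank_shift`) is assumed, and all banks are assumed to belong to a single device in which only numerically neighboring banks share sense amps.

```python
from typing import List, Optional


def bank_of(address: int, n_bank_bits: int = 4, bank_shift: int = 0) -> int:
    """Extract the n bank address bits (2**n banks) from an address.

    The bit position (bank_shift) is an assumption made for illustration.
    """
    return (address >> bank_shift) & ((1 << n_bank_bits) - 1)


def banks_conflict(bank_a: int, bank_b: int) -> bool:
    """True when two banks are coincident or adjacent (shared sense amp)."""
    return abs(bank_a - bank_b) <= 1


def select_next(
    pending: List[int],
    current: List[int],
    n_bank_bits: int = 4,
    bank_shift: int = 0,
) -> Optional[int]:
    """Return the index (in received order) of the pending address to issue.

    The oldest pending address whose bank neither coincides with nor is
    adjacent to the bank of any currently issued address is preferred.
    If every pending address conflicts, index 0 is returned, i.e. the
    received order is kept and the caller is expected to insert bubbles.
    """
    if not pending:
        return None
    current_banks = [bank_of(a, n_bank_bits, bank_shift) for a in current]
    for index, address in enumerate(pending):
        bank = bank_of(address, n_bank_bits, bank_shift)
        if not any(banks_conflict(bank, c) for c in current_banks):
            return index
    return 0  # conflict unavoidable: fall back to order of receipt


# Bank 0 is being accessed; pending requests target banks 1, 0 and 5.
chosen = select_next(pending=[1, 0, 5], current=[0])
```

In this usage example the requests to banks 1 and 0 are skipped (adjacent and coincident, respectively), so the request to bank 5 is selected even though it was received last.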
In one embodiment, in addition to (or instead of, in another embodiment) the just-described bank conflict detector, the memory request selector includes an optimizer that issues read requests prior to issuance of write requests (thereby to give higher priority to read requests in a scheme called “read bypass of write”), unless a read request and a write request (also called “earlier-received” write request) that was received prior to the read request access the same location in main memory. When the just-described two requests access the same location, they are processed in the order of receipt to ensure consistency in the data being written and read. In one particular implementation, the memory request store includes, for each pending memory request, a wait storage element. The scheduler of this embodiment also includes a read interlock logic that stores an active signal in the wait storage element for a later-received read request when an earlier-received write request accesses the same location, thereby to indicate that the read request is to be performed after the write request. On completion of the earlier-received write request, the read interlock logic stores an inactive signal in the wait storage element for the later-received read request, thereby to indicate that the read request is ready to be issued. Use of wait storage elements and read interlock logic as described herein ensures data consistency when using a concurrent access scheme for issuing multiple requests to main memory (so that one or more of the requests are executed simultaneously).
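The interaction between the “read bypass of write” scheme and the wait storage element can be modeled by the following sketch. It is a simplified illustration under assumptions: the class and field names (`Request`, `RequestStore`, `wait`) are hypothetical, a write is treated as completed at the moment it is issued, and bank conflicts are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    kind: str           # "read" or "write"
    address: int
    wait: bool = False  # models the wait storage element (True = active)


class RequestStore:
    """Toy memory request store holding requests in received order."""

    def __init__(self) -> None:
        self.pending: List[Request] = []

    def _update_waits(self) -> None:
        # Read interlock: a read waits while any earlier-received write
        # to the same location is still pending.
        for i, req in enumerate(self.pending):
            if req.kind == "read":
                req.wait = any(
                    earlier.kind == "write" and earlier.address == req.address
                    for earlier in self.pending[:i]
                )

    def receive(self, req: Request) -> None:
        self.pending.append(req)
        self._update_waits()

    def issue(self) -> Optional[Request]:
        # Reads that are not waiting bypass writes; otherwise the oldest
        # write is issued. Issue is assumed to imply completion here.
        for wanted in ("read", "write"):
            for i, req in enumerate(self.pending):
                if req.kind == wanted and not req.wait:
                    issued = self.pending.pop(i)
                    self._update_waits()
                    return issued
        return None


store = RequestStore()
for kind, address in [("write", 0x10), ("read", 0x10),
                      ("read", 0x20), ("write", 0x30)]:
    store.receive(Request(kind, address))
issued_order = [store.issue() for _ in range(4)]
```

In this example the read of location 0x20 bypasses both writes, while the read of location 0x10 is held back until the earlier-received write to 0x10 has been issued.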
Moreover, in the above-described scheme, a device from which a read request is received (such as a CPU) is not normally stalled by an earlier-received write request, as would be the case in non-prioritized, first-in-first-out (FIFO) processing of read and write requests. When only write requests are pending, a main memory scheduler of the type described herein performs bank conflict optimization among the pending write requests. Note that stalling can occur even when using the above-described scheduler, e.g. when two requests access the same location as described above.
In four examples, the main memory scheduler performs FIFO processing when (1) there are two pending requests to access the same location: a write request and a read request, (2) the pending requests are related to configuration, e.g. accessing certain registers in the memory request selector (e.g. to change prioritization in the processing of pending requests), (3) the number of write requests that are pending is greater than a predetermined number, and (4) a write request has been pending for a predetermined time period. In the third and fourth examples, FIFO processing frees up storage units in the scheduler that hold read requests that have been processed in accordance with the “read bypass of writes” scheme, and that are located between storage units that hold write requests. Such freed storage units can be used for holding additional requests. FIFO processing can be performed in other situations as well, depending on the specific requirements of a given implementation as discussed herein.
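The four example conditions can be combined into a single check, as in the following sketch. All names (`Pending`, `must_use_fifo`) and the threshold values are hypothetical and chosen only for illustration; the first condition is simplified to any pending read and write that address the same location, regardless of which was received first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pending:
    kind: str         # "read", "write" or "config"
    address: int
    received_at: int  # time of receipt, in arbitrary cycle units


def must_use_fifo(
    pending: List[Pending],
    now: int,
    max_pending_writes: int = 4,  # assumed "predetermined number"
    max_write_age: int = 64,      # assumed "predetermined time period"
) -> bool:
    """Return True when any of the four example conditions holds."""
    writes = [p for p in pending if p.kind == "write"]
    write_addresses = {w.address for w in writes}

    # (1) a read and a write to the same location are both pending
    same_location = any(
        p.kind == "read" and p.address in write_addresses for p in pending
    )
    # (2) a configuration-related request is pending
    has_config = any(p.kind == "config" for p in pending)
    # (3) more writes are pending than the predetermined number
    too_many_writes = len(writes) > max_pending_writes
    # (4) some write has been pending for the predetermined period
    stale_write = any(now - w.received_at >= max_write_age for w in writes)

    return same_location or has_config or too_many_writes or stale_write


queue = [Pending("write", 0x40, received_at=0),
         Pending("read", 0x80, received_at=5)]
fifo_now = must_use_fifo(queue, now=10)     # no condition holds yet
fifo_later = must_use_fifo(queue, now=100)  # the write has aged out
```

In the usage example the same queue is processed out of order at time 10 but forces FIFO processing at time 100, because the write has then been pending for longer than the assumed period.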
Depending on the embodiment, the optimizer can implement one or more additional schemes for selecting a pending memory request for issue. In one request selection scheme (also called “display-controller” scheme), the memory request selector prioritizes read requests that originate from a predetermined device, such as a display controller, ahead of requests from other devices, thereby to ensure that the display controller is not stalled by earlier issued read requests (e.g. from the CPU). In another request selection scheme (also called “hardware request” scheme), the memory request selector selects, for issue to the main memory, a request that relates to hardware management (such as a refresh request for DRAM or a current control request) prior to selection of a read request or a write request even if such a hardware request was most recently received, thereby to prioritize the hardware request ahead of the read and write requests.
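One way the just-described selection schemes might be combined is sketched below. The combined ordering (hardware requests first, then display controller reads, then other reads, then writes) is an assumption made for illustration, since the schemes are described independently; the names `Req`, `priority` and `select` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Req:
    kind: str    # "read", "write", "refresh" or "current_control"
    source: str  # e.g. "cpu", "display", "hardware"
    seq: int     # position in the received order (lower = earlier)


# Request kinds treated as hardware management requests in this sketch.
HARDWARE_KINDS = {"refresh", "current_control"}


def priority(req: Req) -> int:
    """Smaller value = selected earlier (assumed combined ordering)."""
    if req.kind in HARDWARE_KINDS:
        return 0  # "hardware request" scheme
    if req.kind == "read" and req.source == "display":
        return 1  # "display-controller" scheme
    if req.kind == "read":
        return 2  # read bypass of write
    return 3      # remaining requests, i.e. writes


def select(pending: List[Req]) -> Req:
    """Pick the highest-priority request; ties keep the received order."""
    return min(pending, key=lambda r: (priority(r), r.seq))


pending = [
    Req("write", "cpu", seq=0),
    Req("read", "cpu", seq=1),
    Req("read", "display", seq=2),
    Req("refresh", "hardware", seq=3),
]
first = select(pending)
```

Here the refresh request is selected first even though it was received last, and the display controller read would be selected ahead of the earlier-received CPU read once the refresh has been issued.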