1. Field of the Invention
The present invention relates to memory system design; and, in particular, the present invention relates to cache memory system design.
2. Background of the Invention
Certain modular memory integrated circuits, which provide both random access memory (RAM) circuits and on-chip logic circuits for managing high-speed access to such RAM circuits, have become available. One example of such integrated circuits is the Rambus™ DRAM (RDRAM), available from Rambus Inc., Mountain View, Calif. 94040. FIG. 1a shows a configuration of memory system 100 using RDRAMs. As shown in FIG. 1a, a microprocessor 101 interfaces with memory system 100 through address bus 109, data bus 108 and control signal bus 110. Memory system 100 interfaces with microprocessor 101 through a Rambus access controller (RAC) 102. RAC 102 is available either in packaged integrated circuit form or as a standard cell to be incorporated into an application specific integrated circuit which requires access to RDRAMs. Of course, RAC 102 and microprocessor 101 can also be incorporated into a single integrated circuit.
RAC 102 controls a memory bus ("primary channel") 105, which is used for reading and writing a number of memory modules. As shown in FIG. 1a, bus 105 serves RDRAMs 104-1, 104-2, . . . , 104-n and R-Modules 103-1, 103-2, . . . , 103-m. R-Modules are expansion modules which provide a second level hierarchy in memory system 100. Each R-Module includes a number of RDRAMs on a single secondary channel, which is controlled by a Rambus Transceiver (RTransceiver). The structure and operation of an RTransceiver are known to those skilled in the art. Memory bus clock signal ("Rambus Clock") 106 and system clock ("CPU clock") 107 are provided to RAC 102 to perform necessary synchronization between memory system 100 and microprocessor 101.
FIG. 1b is a block diagram of an RDRAM 104, which includes three "layers" of circuitry: application, logic and physical. In the application layer, two banks 130a and 130b of dynamic random access memory (DRAM) are provided, each bank storing 256 kilobytes (9-bit bytes) of data. Banks 130a and 130b are provided with row sense amplifier latches 131a and 131b, respectively. Each row sense amplifier latch holds a row (1 kilobyte) of data, which is more than a row sense amplifier latch holds in a typical conventional page-mode DRAM. Row sense amplifier latches 131a and 131b serve, respectively, as caches for DRAM banks 130a and 130b. After a specified row is sensed and the data strobed into a row sense amplifier latch, a CAS ("column address strobe") operation can select a specified byte within the row to be read or written. Row sense amplifier latches 131a and 131b input data from and output data to an internal 72-bit data bus 139. Row sense amplifier latches 131a and 131b are write-back caches, the contents of which are written back before RDRAM 104 processes a read or write cache miss.
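By way of illustration only, the caching behavior of a row sense amplifier latch can be sketched as a one-line write-back cache. The class `Bank` and its method names are hypothetical and do not appear in the RDRAM itself; the sketch models 8-bit bytes (ignoring the ninth bit) and collapses miss handling into a single call, whereas the actual device reports the miss and is retried later.

```python
ROW_BYTES = 1024                          # one row held by the latch (1 kilobyte)
BANK_BYTES = 256 * 1024                   # capacity of one DRAM bank
ROWS_PER_BANK = BANK_BYTES // ROW_BYTES   # 256 rows per bank


class Bank:
    """Hypothetical model of one DRAM bank and its row sense amplifier latch."""

    def __init__(self):
        self.rows = [bytearray(ROW_BYTES) for _ in range(ROWS_PER_BANK)]
        self.latch = bytearray(ROW_BYTES)
        self.latched_row = None   # index of the row currently held in the latch
        self.dirty = False        # True if the latch holds data not yet written back

    def _fill(self, row):
        """Handle a miss: write back dirty data, then sense the requested row."""
        wrote_back = False
        if self.dirty:
            self.rows[self.latched_row][:] = self.latch
            wrote_back = True
        self.latch[:] = self.rows[row]
        self.latched_row = row
        self.dirty = False
        return wrote_back

    def access(self, row, col, value=None):
        """Read (value is None) or write one byte.

        Returns (hit, wrote_back, byte now at the addressed column).
        """
        hit = row == self.latched_row
        wrote_back = False if hit else self._fill(row)
        if value is not None:
            self.latch[col] = value
            self.dirty = True
        return hit, wrote_back, self.latch[col]
```

In this sketch, a write to a row followed by an access to a different row of the same bank returns `wrote_back` as true, which corresponds to the slower miss case.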
The logic layer circuit of RDRAM 104 provides the control operations of RDRAM 104. The logic layer circuit includes control circuit 133 and control registers 134. Registers 134 include a register for specifying configuration and size data of RDRAM 104, a base address register and an address mapping register for mapping RDRAM 104 to the address space of microprocessor 101, and a number of other registers for specifying a number of configuration parameters. Further, mask register 132 is provided for bit-masking operations upon bits on internal data bus 139.
The physical layer circuit of RDRAM 104 provides an interface to primary channel (or a secondary channel, in an R-Module) 105. Primary channel 105 includes a 9-bit data bus 138a, a 1-bit bus control line 138b, and a 1-bit bus enable line 138c. Receiver 135 receives data and control signals from primary channel 105 and places the received data and control signals on internal data bus 139. Likewise, transmitter 137 receives data and control signals from internal data bus 139 and places the data and control signals on primary channel 105. Clock signals 140 and 141 are provided for synchronizing transmitting and receiving operations to the clock signals of RAC 102.
FIG. 1c is a block diagram of RAC 102. As shown in FIG. 1c, RAC 102 includes a microprocessor interface logic circuit 160 for receiving data from and transmitting data to microprocessor 101. For data to be sent to microprocessor 101, microprocessor interface logic circuit 160 retrieves data from receive buffer 162. A multiplexer 163 passes either address or data from microprocessor interface logic circuit 160 to transmit buffer 161. Transmit and receive buffers 161 and 162 are provided for matching the data transfer rates between primary channel 105 and microprocessor interface logic circuit 160. Logic circuit 164 handles physical layer handshaking on primary channel 105, retrieves data from transmit buffer 161, packages the data thus retrieved into the packet format accepted on primary channel 105, and transmits the formatted data on primary channel 105. Logic circuit 165 handles physical layer handshaking on primary channel 105, receives data from primary channel 105, disassembles the packaged data received from primary channel 105, and stores the data thus obtained in receive buffer 162. Typically, Rambus clock 106 is provided at 4 times the clock rate of microprocessor interface logic circuit 160. Thus, a divide-by-4 circuit 166 is provided to step down Rambus clock 106 for timing use in RAC 102. RAC control logic 168 controls the operation of RAC 102. In addition, synchronization logic circuit 169 synchronizes operations with respect to Rambus clock 106 and CPU clock 107, which are asynchronous to each other.
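The rate matching performed by the buffers can be illustrated with a short calculation. The sketch assumes the 4 ns Rambus clock period used as an example in the discussion of FIG. 2a, with one 2 ns data period per clock edge; the constant and function names are hypothetical.

```python
RAMBUS_CLOCK_PERIOD_NS = 4   # example value from the discussion of FIG. 2a
DIVIDE_RATIO = 4             # divide-by-4 circuit 166
DATA_PERIODS_PER_CLOCK = 2   # assumed: one 2 ns data period per clock edge
BUS_BITS = 9                 # width of data bus 138a


def divided_clock_period_ns():
    """Period of the stepped-down clock used for timing inside RAC 102."""
    return RAMBUS_CLOCK_PERIOD_NS * DIVIDE_RATIO


def bits_per_divided_clock():
    """Data bits the channel can carry during one stepped-down clock period."""
    return DIVIDE_RATIO * DATA_PERIODS_PER_CLOCK * BUS_BITS
```

Under these assumptions the stepped-down clock has a 16 ns period, during which the 9-bit data bus can carry 72 bits.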
In a system such as that described in FIGS. 1a-1c, microprocessor 101 reads and writes memory using address bus 109, data bus 108 and control bus 110. In a conventional system, microprocessor 101 treats RAC 102 as a conventional memory controller. In turn, RAC 102 provides packetized data on primary channel 105. RAC 102 sends data packets to RDRAMs 104-1 to 104-n or R-Modules 103-1 to 103-m during write operations. Likewise, RAC 102 receives data packets from RDRAMs 104-1 to 104-n or R-Modules 103-1 to 103-m during read operations.
FIG. 2a shows a request packet 200 sent by microprocessor 101. In this example, clocks 140 and 141 on primary channel 105 have a four-nanosecond period, i.e., two 2-nanosecond data periods, each following an edge transition. Request packet 200 consists of six 10-bit words 205a-205f, sent over six data periods using 9-bit data bus 138a and control line 138b. In FIG. 2a, the 9-bit portion of each of request packet 200's 10-bit words sent over data bus 138a is indicated by reference numeral 200b, and the 1-bit portion of each of request packet 200's 10-bit words is indicated by reference numeral 200a. According to FIG. 2a, a 36-bit address is packed into request packet 200, using 2-bit field 203a (in 10-bit word 205f), 8-bit field 203b (in 10-bit word 205a), 8-bit field 203c (in 10-bit word 205b), 9-bit field 203d (in 10-bit word 205c) and 9-bit field 203e (in 10-bit word 205d). A 4-bit opcode, included in four 1-bit fields 202a-202d of 10-bit words 205a-205c, is provided to specify the memory access request. An 8-bit constant, specified in 2-bit field 204a of 10-bit word 205e and two 3-bit fields 204b and 204c of 10-bit words 205e and 205f, indicates the block size to be read or written.
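The division of the 36-bit address among fields 203a-203e can be sketched as follows. The field widths and the words carrying them are taken from the description above; which address bits each field carries is not stated here, so the ordering used (field 203a holding the most significant bits) is an assumption, and the function names are hypothetical.

```python
# (field, width in bits, 10-bit word carrying it), per the description of FIG. 2a
ADDRESS_FIELDS = [
    ("203a", 2, "205f"),
    ("203b", 8, "205a"),
    ("203c", 8, "205b"),
    ("203d", 9, "205c"),
    ("203e", 9, "205d"),
]


def split_address(addr):
    """Split a 36-bit address into fields (assumed order: 203a = top bits)."""
    if not 0 <= addr < 1 << 36:
        raise ValueError("address must fit in 36 bits")
    fields = {}
    shift = 36
    for name, width, _word in ADDRESS_FIELDS:
        shift -= width
        fields[name] = (addr >> shift) & ((1 << width) - 1)
    return fields


def join_address(fields):
    """Inverse of split_address under the same assumed bit ordering."""
    addr = 0
    for name, width, _word in ADDRESS_FIELDS:
        addr = (addr << width) | fields[name]
    return addr
```

The five widths sum to 36 bits. Together with the 4-bit opcode and the 8-bit block size constant, the described fields account for 48 of the 60 bits in the six 10-bit words.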
FIG. 2b shows an acknowledgement packet 220. Acknowledgement packet 220 is sent over two 2-nanosecond cycles as two 1-bit words on control line 138b (portion 220a). As shown in FIG. 2b, the acknowledgement message is provided in two 1-bit fields 222a and 222b over two 2-nanosecond cycles of bus clock 141. FIG. 2c shows a data packet 240. 36-bit data is sent to and received from RAC 102 over four 2-nanosecond cycles in four 9-bit words 242a-242d, which are provided consecutively over data bus 138a (portion 240b). As shown in FIG. 2c, the 36-bit data word received or transmitted is provided in four 9-bit fields 241a-241d of words 242a-242d.
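The time each packet type occupies on the channel follows directly from the word counts given for FIGS. 2a-2c and the 2 ns data period of this example. A minimal sketch, with hypothetical names:

```python
DATA_PERIOD_NS = 2  # one data period on the channel in this example

# Data periods occupied by each packet type, per the description of FIGS. 2a-2c
PERIODS = {"request": 6, "acknowledge": 2, "data": 4}


def duration_ns(kind):
    """Time a packet of the given kind occupies on the channel."""
    return PERIODS[kind] * DATA_PERIOD_NS
```

A request packet thus occupies 12 ns, an acknowledgement packet 4 ns, and a data packet carrying one 36-bit word 8 ns.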
As discussed above, each of the two DRAM banks of each RDRAM is cached in a row sense amplifier latch. The protocols utilized on primary channel 105 for a read cache hit, a read cache miss, a write cache hit and a write cache miss are illustrated by FIGS. 3a-3d, respectively. FIG. 3a illustrates a read cache hit. Initially, a read request packet 301 is sent by RAC 102 on data bus 138a and control line 138b. After an idle period 304 of 20 ns, an acknowledgement packet 302 from the addressed RDRAM is provided on control line 138b to indicate a cache hit. Thereafter, following a delay 305 (12 ns) subsequent to receiving acknowledgement packet 302, the addressed RDRAM returns a data packet 303 on data bus 138a. The data packet contains the block of data specified in request packet 301.
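The read cache hit of FIG. 3a can be laid out as a simple timeline. The sketch assumes that the phases follow one another without overlap, that the request packet occupies 12 ns (six 2 ns data periods) and that the acknowledgement packet occupies 4 ns; the function name is hypothetical.

```python
def read_hit_timeline(request_ns=12, idle_ns=20, ack_ns=4, delay_ns=12):
    """Return ([(phase, start_ns, end_ns), ...], time at which data packet 303 starts)."""
    phases = [
        ("request 301", request_ns),
        ("idle 304", idle_ns),
        ("acknowledge 302", ack_ns),
        ("delay 305", delay_ns),
    ]
    t = 0
    timeline = []
    for name, length in phases:
        timeline.append((name, t, t + length))
        t += length
    return timeline, t
```

With the default values, data packet 303 begins 48 ns after the first word of request packet 301 under the stated assumptions.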
FIG. 3b illustrates a read cache miss. Initially, request packet 310 is sent by RAC 102. After a period 311 (20 ns), acknowledgement packet 313 is received on control line 138b indicating a cache miss. As a result, a time-out period 312 is provided by RAC 102 to allow the addressed row to be accessed and cached in the appropriate one of row sense amplifier latches 131a and 131b. During time-out period 312, a different RDRAM can be accessed, beginning 4 ns after acknowledgement packet 313 is received. After time-out period 312 expires, RAC 102 sends a retry request packet 314 on data bus 138a and control line 138b. Thereafter, acknowledgement packet 316 and data packet 318 are provided in substantially the same manner as discussed above with respect to a read cache hit.
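The retry behavior of FIG. 3b can be sketched with a stand-in device that reports a miss until the requested row has been latched. `MockRdram` and `read` are hypothetical names; the length of time-out period 312 is not given above, so it is passed in as a parameter, and the time-out is assumed to begin when the acknowledgement packet ends.

```python
REQUEST_NS, ACK_WAIT_NS, ACK_NS, DATA_DELAY_NS = 12, 20, 4, 12


class MockRdram:
    """Hypothetical stand-in for one bank: hits only if the row is already latched."""

    def __init__(self):
        self.latched_row = None

    def request(self, row):
        hit = row == self.latched_row
        self.latched_row = row  # on a miss, the row is sensed during the time-out
        return hit


def read(device, row, timeout_ns):
    """Return (acknowledgement events, ns until the data packet starts)."""
    t, events = 0, []
    while True:
        t += REQUEST_NS + ACK_WAIT_NS + ACK_NS   # request, wait, acknowledgement
        if device.request(row):
            events.append(("ack hit", t))
            return events, t + DATA_DELAY_NS
        events.append(("ack miss", t))
        t += timeout_ns                          # time-out before the retry request
```

For example, with an arbitrarily chosen 100 ns time-out, a read of an unlatched row yields one miss acknowledgement followed by a hit on the retry, and a second read of the same row hits immediately.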
FIG. 3c illustrates a write hit. Initially, RAC 102 sends request packet 320 to the addressed RDRAM, indicating a write access. After a predetermined period 321 (4 ns) following the sending of request packet 320, RAC 102 begins to transmit on data bus 138a data packet 322, which includes the data to be written. An acknowledgement packet 323 is sent by the addressed RDRAM, in response to request packet 320, to RAC 102 after a predetermined delay (20 ns). In this instance, acknowledgement packet 323 indicates a cache hit. Consequently, RAC 102 completes transmission of data packet 322.
FIG. 3d shows a write miss. Initially, RAC 102 transmits a write request packet 330 to an addressed RDRAM on data bus 138a and control line 138b, and begins to transmit data packet 332 on data bus 138a for writing into the addressed RDRAM. However, in this instance, acknowledgement packet 323 received from the addressed RDRAM on control line 138b indicates that a write cache miss has occurred. Consequently, RAC 102 aborts transmission of data packet 332. A time-out period 334 is introduced, during which access to a different RDRAM is permitted after a predetermined period (4 ns) elapses. Subsequent to time-out period 334, RAC 102 submits a retry request packet 335. RAC 102 transmits data packet 337 following a predetermined period 336 (4 ns). Acknowledgement packet 338 indicates that a cache hit has occurred, thereby signalling RAC 102 that transmission of data packet 337 can proceed to completion.
Because a row sense amplifier latch in an RDRAM is a write-back cache, whenever a read cache miss or a write cache miss occurs, any updated data in the row sense amplifier latch is written back into its DRAM bank before the row sense amplifier latch is refilled. Thus, additional cache miss processing, and hence additional access time, is incurred in writing back "dirty" data. This additional cache miss processing is not incurred when the row sense amplifier latch holds no data awaiting write-back. The difference in cache miss processing times between the "dirty" data situation and the "clean" situation (i.e., no data to write back) can be substantial.
In the prior art, a retry request packet is sent after a delay period defined according to a "worst case" scenario. That is, each cache miss is assumed to require data in the row sense amplifier latch to be written back into the corresponding DRAM bank. Clearly, for a cache line which has not been updated, this approach results in unjustified degradation of performance.
Since optimum performance requires that a retry request packet be sent immediately after cache miss processing is completed, such performance is only achieved when the retry request packet is sent at the time the refilled data arrives at the row sense amplifier latch of the addressed DRAM bank. Such refilled data arrival time varies according to whether a write-back is required.
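The cost of a fixed worst-case retry delay can be estimated with a short calculation. The refill and write-back times are not given above, so they appear as parameters, and any values supplied for them are purely illustrative; the function name is hypothetical.

```python
def fixed_retry_penalty(refill_ns, writeback_ns, dirty_fraction):
    """Average ns per miss that a worst-case retry delay waits beyond what is needed.

    refill_ns:      assumed time to sense a row into the latch
    writeback_ns:   assumed extra time to write back dirty data first
    dirty_fraction: assumed fraction of misses that find dirty data in the latch
    """
    worst_case = refill_ns + writeback_ns
    average_needed = refill_ns + dirty_fraction * writeback_ns
    return worst_case - average_needed
```

For instance, if a refill and a write-back were each assumed to take 60 ns and one miss in four found dirty data, the fixed delay would wait an average of 45 ns longer per miss than necessary. The penalty falls to zero only when every miss requires a write-back.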